Mining Contents in Web Pages and Ranking of Web Pages Using Cosine Similarity
International Journal of Science and Research (IJSR)

www.ijsr.net | Open Access | Fully Refereed | Peer Reviewed International Journal

ISSN: 2319-7064

Dissertation Chapters | Computer Science & Engineering | India | Volume 3 Issue 4, April 2014

Divya C.

Nowadays the internet has become a part of daily life, and web pages have become a key communication and information medium for many organizations. Web pages typically contain a large amount of information that is not part of their main content, e.g., banner ads, navigation bars, copyright and privacy notices, and other advertisements unrelated to the main (relevant) content. In this paper, the system uses an HTML parser to construct a DOM (Document Object Model) tree, from which a Content Structure Tree (CST) is constructed; the CST makes it easy to separate the main content blocks from the other blocks. The paper also introduces a method for ranking web pages based on the content similarity between the web documents and the user query. When a user searches with a keyword, many web pages are usually retrieved, and the user may not know which of them are most relevant. To overcome this problem, the retrieved pages are ranked using Cosine Similarity and Jaccard Similarity, both implemented together with a stop word removal algorithm. Many experiments were conducted with both measures, and the results were compared to decide which works best. Cosine Similarity retrieved more relevant pages for the user than Jaccard Similarity.
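The abstract does not spell out how the CST separates the main content blocks from the other blocks. Purely as a rough illustration of the general idea (parse HTML into a tree, then discard non-content blocks), the following sketch builds a DOM-like tree with Python's standard-library `html.parser.HTMLParser` and prunes it with two assumed heuristics: tags such as `nav`, `header`, `footer`, and `aside` are treated as noise, and blocks whose text is mostly link text (link density above an assumed threshold of 0.5) are dropped as menus. This is not the paper's CST algorithm; the tag sets, the threshold, and names such as `extract_main_content` are hypothetical.

```python
from html.parser import HTMLParser

# Assumed heuristics for this sketch only (not the paper's CST rules).
NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}
BLOCK_TAGS = {"div", "section", "article", "main", "ul", "table", "p"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class Node:
    """A tree element; children holds Node objects and text strings in order."""

    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children = tag, parent, []


class TreeBuilder(HTMLParser):
    """Builds a minimal DOM-like tree from the parser's start/end/data events."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open element, tolerating unclosed children.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def char_counts(node, in_link=False):
    """Return (all_chars, link_chars) under node, ignoring noise subtrees."""
    if node.tag in NOISE_TAGS:
        return 0, 0
    in_link = in_link or node.tag == "a"
    total = links = 0
    for child in node.children:
        if isinstance(child, str):
            total += len(child)
            links += len(child) if in_link else 0
        else:
            t, l = char_counts(child, in_link)
            total, links = total + t, links + l
    return total, links


def main_text(node, max_link_density=0.5):
    """Collect text in document order, skipping noise and link-heavy blocks."""
    if node.tag in NOISE_TAGS:
        return []
    if node.tag in BLOCK_TAGS:
        total, links = char_counts(node)
        if total == 0 or links / total > max_link_density:
            return []
    parts = []
    for child in node.children:
        parts.extend([child] if isinstance(child, str) else main_text(child, max_link_density))
    return parts


def extract_main_content(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return " ".join(main_text(builder.root))


# Toy page, invented for illustration only.
sample = """<html><body>
<nav><a href="/">Home</a> <a href="/about">About</a></nav>
<div class="menu"><a href="/x">Sports</a> | <a href="/y">News</a></div>
<div class="post"><h1>Cosine similarity</h1>
<p>Pages are ranked by the similarity between the page text and the query.</p></div>
<footer>Copyright 2014</footer>
</body></html>"""

print(extract_main_content(sample))
```

On this toy page the navigation bar, the link-only menu, and the copyright footer are discarded, leaving only the heading and paragraph of the post. Real pages would need more robust rules, which is what a dedicated structure such as the CST is meant to provide.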

Keywords: Content mining, DOM tree, CST tree, TF-IDF, Cosine Similarity
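The ranking step summarized in the abstract (stop word removal, then Cosine Similarity and Jaccard Similarity between the user query and each page) can be illustrated with a minimal sketch. It assumes raw term-frequency vectors, a small hand-picked stop-word list, and no stemming; none of these details are given in the abstract, and a TF-IDF weighting could be substituted for raw counts. Cosine Similarity is computed as cos(Q, D) = (Q · D) / (|Q| |D|) over term-frequency vectors, and Jaccard Similarity as J(Q, D) = |Q ∩ D| / |Q ∪ D| over sets of distinct terms. The function names and the sample pages are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (assumption: the paper's actual list is not given).
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "in", "is",
              "it", "of", "on", "or", "the", "this", "that", "to", "with"}


def tokenize(text):
    """Lower-case, split on non-alphanumeric characters, drop stop words.

    No stemming is applied, so "page" and "pages" count as different terms.
    """
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def cosine_similarity(query_terms, doc_terms):
    """cos(Q, D) = (Q . D) / (|Q| |D|) over raw term-frequency vectors."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(count * d[term] for term, count in q.items())
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    norm_d = math.sqrt(sum(c * c for c in d.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def jaccard_similarity(query_terms, doc_terms):
    """J(Q, D) = |Q intersect D| / |Q union D| over sets of distinct terms."""
    q, d = set(query_terms), set(doc_terms)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def rank_pages(query, pages, measure):
    """Score every page against the query and sort best-first."""
    query_terms = tokenize(query)
    scored = [(page_id, measure(query_terms, tokenize(text)))
              for page_id, text in pages.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration only.
pages = {
    "p1": "Cosine similarity ranks web pages by content similarity.",
    "p2": "Navigation bars and banner ads are not main content.",
    "p3": "Web page ranking using cosine similarity and the user query.",
}
query = "cosine similarity web page ranking"

for name, measure in (("cosine", cosine_similarity), ("jaccard", jaccard_similarity)):
    print(name, rank_pages(query, pages, measure))
```

With these toy pages both measures rank `p3` first, `p1` second, and `p2` last. The difference is that cosine rewards `p1` for repeating "similarity" (term frequency matters), while Jaccard only counts distinct shared terms. The sample data is invented and says nothing about the paper's experimental comparison.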

Edition: Volume 3 Issue 4, April 2014

Pages: 178 - 184

How to Cite this Article?

Divya C., "Mining Contents in Web Pages and Ranking of Web Pages Using Cosine Similarity", International Journal of Science and Research (IJSR), https://www.ijsr.net/search_index_results_paperid.php?id=20131317, Volume 3 Issue 4, April 2014, 178 - 184

Similar Articles with Keyword 'Content mining'

Research Paper, Computer Science & Engineering, India, Volume 6 Issue 5, May 2017

Pages: 1467 - 1471

Improve an Enhanced-Ratio Rank Algorithm based on Reading Time

Jinal V. Patel, Rimi Gupta

Review Papers, Computer Science & Engineering, India, Volume 4 Issue 11, November 2015

Pages: 1990 - 1993

The Recommendation System for User Interest on Web

Priyanka D. Khawale, V. S. Nandedkar

Comparative Studies, Computer Science & Engineering, India, Volume 3 Issue 6, June 2014

Pages: 2099 - 2104

Comparative Study of Web Content Mining Techniques for HTML and XML Contents

Rupinder Kaur, Kamaljit Kaur

Research Paper, Computer Science & Engineering, India, Volume 3 Issue 7, July 2014

Pages: 60 - 66

Content and Usage Based Ranking for Enhancing Search Result Delivery

Shital C. Patil, R. R. Keole

Review Papers, Computer Science & Engineering, India, Volume 5 Issue 2, February 2016

Pages: 998 - 1002

Feature Selection for Global Redundancy Minimization Using Regularized Trees

Shweta Satish Shringarputale, P. R. Rathod

Similar Articles with Keyword 'DOM tree'

Review Papers, Computer Science & Engineering, India, Volume 4 Issue 4, April 2015

Pages: 2630 - 2634

A Review on Identifying the Main Content From Web Pages

Madhura R. Kaddu, Dr. R. B. Kulkarni

Dissertation Chapters, Computer Science & Engineering, India, Volume 3 Issue 4, April 2014

Pages: 178 - 184

Mining Contents in Web Pages and Ranking of Web Pages Using Cosine Similarity

Divya C.

Research Paper, Computer Science & Engineering, Sudan, Volume 6 Issue 9, September 2017

Pages: 337 - 342

Intrusion Detection System Using Weka Data Mining Tool

Asma Abbas Hassan, Alaa F. Sheta, Talaat M. Wahbi

Review Papers, Computer Science & Engineering, India, Volume 4 Issue 1, January 2015

Pages: 398 - 401

eDEW: Effective Data Extraction from Web

Shalaka Patil

Research Paper, Computer Science & Engineering, Egypt, Volume 6 Issue 10, October 2017

Pages: 126 - 131

Early Prediction of Student Success Using a Data Mining Classification Technique

Mohamed Hegazy Mohamed, Hoda Mohamed Waguih

Similar Articles with Keyword 'TF-IDF'

Research Paper, Computer Science & Engineering, India, Volume 5 Issue 5, May 2016

Pages: 1964 - 1967

Improving Performance of Hindi-English based Cross Language Information Retrieval using Selective Documents Technique and Query Expansion

Aditi Agrawal, Dr. A. J. Agrawal

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 5 Issue 7, July 2016

Pages: 1240 - 1244

Implementing K-Means Clustering Algorithm Using MapReduce Paradigm

Botcha Chandrasekhara Rao, Medara Rambabu

Review Papers, Computer Science & Engineering, India, Volume 4 Issue 4, April 2015

Pages: 981 - 984

Using SVM and Stopword removal method in Microblogging Classroom

Vidya Dhuttargaon, Amit R. Sarkar

Review Papers, Computer Science & Engineering, India, Volume 5 Issue 1, January 2016

Pages: 710 - 712

World Wide Web Metasearch Using TF-IDF Method

S. P. Phadtare, S. B. Magdum

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 5 Issue 6, June 2016

Pages: 2206 - 2210

Document Clustering using Improved K-means Algorithm

Anjali Vashist, Rajender Nath

Similar Articles with Keyword 'Cosine Similarity'

Research Paper, Computer Science & Engineering, India, Volume 5 Issue 5, May 2016

Pages: 1964 - 1967

Improving Performance of Hindi-English based Cross Language Information Retrieval using Selective Documents Technique and Query Expansion

Aditi Agrawal, Dr. A. J. Agrawal

Review Papers, Computer Science & Engineering, India, Volume 4 Issue 4, April 2015

Pages: 981 - 984

Using SVM and Stopword removal method in Microblogging Classroom

Vidya Dhuttargaon, Amit R. Sarkar

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 5 Issue 6, June 2016

Pages: 2206 - 2210

Document Clustering using Improved K-means Algorithm

Anjali Vashist, Rajender Nath

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 3 Issue 10, October 2014

Pages: 1566 - 1570

Design and Implementation of K-Means and Hierarchical Document Clustering on Hadoop

Y. K. Patil, Prof. V. S. Nandedkar

Dissertation Chapters, Computer Science & Engineering, India, Volume 3 Issue 4, April 2014

Pages: 178 - 184

Mining Contents in Web Pages and Ranking of Web Pages Using Cosine Similarity

Divya C.
