International Journal of Science and Research (IJSR)

ISSN: 2319-7064



Research Paper | Computer Science & Engineering | India | Volume 3 Issue 7, July 2014

Reducing Duplicate Content Using XXHASH Algorithm

Rahul Mahajan, Dr. S. K. Gupta, Rajeev Bedi

Search engines play a vital role in information retrieval on the World Wide Web. With the rapid growth of information and the explosion in the number of Web pages, it is becoming harder for search engines to retrieve the information relevant to a user. The performance of a web search is also greatly affected when results are flooded with redundant information. Removing redundant content is therefore an important data processing operation in search engines and other web applications. The existing architecture of the WWW uses URLs to identify web pages, and a large fraction of the URLs on the web contain duplicate (or near-duplicate) content. Web crawlers rely on URL normalization to identify equivalent URLs, that is, URLs that link to the same web page. Duplicate URLs cause serious trouble across the whole pipeline of a search engine, adversely affecting all of its principal functions: crawling, indexing, ranking, and presentation of results. De-duplicating URLs is thus an extremely important problem for search engines. In this paper, we propose a new technique for reducing duplicate content during web crawling and saving only unique pages in the database.
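The abstract does not spell out the proposed technique, so the following is only a minimal sketch of the general idea it describes: normalize each URL, fingerprint the page body with a fast hash, and store a page only if neither has been seen before. Every name here (`normalize_url`, `content_fingerprint`, `PageStore`) is hypothetical and not taken from the paper. The sketch uses `hashlib.blake2b` from the Python standard library as a stand-in digest so that it runs without extra dependencies; a crawler following the paper's title would presumably use an xxHash implementation instead.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be dropped without changing the resource.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Apply a few conservative normalization rules (illustrative, not exhaustive)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # hostname is case-insensitive
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    # Note: any userinfo ("user:pass@") is dropped in this sketch.
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"  # an empty path is equivalent to "/"
    # The fragment is never sent to the server, so it is removed.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def content_fingerprint(body: bytes) -> str:
    """Return a short digest of the page body.

    Assumption: blake2b is a stand-in; it is not the xxHash algorithm.
    """
    return hashlib.blake2b(body, digest_size=8).hexdigest()


class PageStore:
    """Keeps only pages whose normalized URL and content are both new."""

    def __init__(self) -> None:
        self.seen_urls: set[str] = set()
        self.pages: dict[str, tuple[str, bytes]] = {}  # fingerprint -> (url, body)

    def add(self, url: str, body: bytes) -> bool:
        """Store the page and return True, or return False if it is a duplicate."""
        key = normalize_url(url)
        if key in self.seen_urls:
            return False  # equivalent URL already crawled
        self.seen_urls.add(key)
        fingerprint = content_fingerprint(body)
        if fingerprint in self.pages:
            return False  # different URL, identical content
        self.pages[fingerprint] = (key, body)
        return True
```

With this sketch, `store.add("HTTP://Example.com:80/a#x", body)` followed by `store.add("http://example.com/a", body)` stores the page once, because both URLs normalize to the same string. An exact-match digest like this catches only byte-identical pages; near-duplicate content would need a different kind of fingerprint.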

Keywords: Duplicate, Duplicate Content, Normalization, Web, Web Crawler

Edition: Volume 3 Issue 7, July 2014

Pages: 610 - 612


How to Cite this Article?

Rahul Mahajan, Dr. S. K. Gupta, Rajeev Bedi, "Reducing Duplicate Content Using XXHASH Algorithm", International Journal of Science and Research (IJSR), https://www.ijsr.net/search_index_results_paperid.php?id=2014951, Volume 3 Issue 7, July 2014, 610 - 612



Similar Articles with Keyword 'Duplicate'

Research Paper, Computer Science & Engineering, India, Volume 4 Issue 6, June 2015

Pages: 2676 - 2680

Effective and Efficient XML Duplicate Detection Using Levenshtein Distance Algorithm

Shital Gaikwad, Nagaraju Bogiri

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 4 Issue 8, August 2015

Pages: 1806 - 1810

A Hybrid Cloud Approach for Secure Authorized Deduplication

Jagadish, Dr.Suvarna Nandyal

Survey Paper, Computer Science & Engineering, India, Volume 4 Issue 12, December 2015

Pages: 2223 - 2225

Survey on Fragmentation for Deduplication in Backup Storage

Reshma A. Fegade, R. D. Bharti

Survey Paper, Computer Science & Engineering, India, Volume 5 Issue 2, February 2016

Pages: 2217 - 2219

A Hybrid Cloud Approach for Secure Authorized Deduplication

Sunita S. Velapure, S. S. Barde

Survey Paper, Computer Science & Engineering, India, Volume 4 Issue 4, April 2015

Pages: 1925 - 1927

Survey on Secured Association Rule Mining in Partitioned Database

Gurpreet Kaur Bhatti, Prof. Ravi Patki



Similar Articles with Keyword 'Normalization'

Research Paper, Computer Science & Engineering, India, Volume 5 Issue 3, March 2016

Pages: 1968 - 1973

Face Recognition Revisited on Pose, Alignment, Color, Illumination and Expression-PyTen

Mugdha Tripathi

Review Papers, Computer Science & Engineering, India, Volume 5 Issue 6, June 2016

Pages: 1211 - 1215

A Review on Data Mining, Its Applications and Approaches

Anu Verma, Jyoti Arora

Survey Paper, Computer Science & Engineering, India, Volume 3 Issue 11, November 2014

Pages: 1155 - 1156

A Survey Paper on F-SIFT for Object and Copy Detection

Tanvi Gadgi, Amrit Priyadarshi

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 5 Issue 5, May 2016

Pages: 1636 - 1639

Keyword Based Emotion Word Ontology Approach for Detecting Emotion Class from Text

Ashish V C, Somashekar R, Dr. Sundeep Kumar K

Research Paper, Computer Science & Engineering, India, Volume 3 Issue 7, July 2014

Pages: 610 - 612

Reducing Duplicate Content Using XXHASH Algorithm

Rahul Mahajan, Dr. S. K. Gupta, Rajeev Bedi



Similar Articles with Keyword 'Web'

Research Paper, Computer Science & Engineering, India, Volume 9 Issue 6, June 2020

Pages: 1151 - 1153

Search Engine Optimization (SEO) Techniques

Kanika Arora

Research Paper, Computer Science & Engineering, India, Volume 9 Issue 6, June 2020

Pages: 901 - 905

Low Latency Placement for Effective Fog based Infrastructure

Dr. Varsha Jotwani

Research Paper, Computer Science & Engineering, India, Volume 5 Issue 6, June 2016

Pages: 1595 - 1596

User Privacy Protection in Personalised Web Search

Siva B, Merlin Shoerio

Review Papers, Computer Science & Engineering, India, Volume 5 Issue 6, June 2016

Pages: 1949 - 1953

A Review: Data Security and Privacy Advancement Approach on Webos with Desktop-As-A-Service

Pooja Saini, Kanchan Narula

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 5 Issue 7, July 2016

Pages: 1741 - 1743

Crowd Learn (Learning Next Hotskills using the Crowd Source Method)

Girish Kadakol, Pawan Hegde, Manjunath Jeernalli



Similar Articles with Keyword 'Web Crawler'

Research Paper, Computer Science & Engineering, India, Volume 4 Issue 4, April 2015

Pages: 1471 - 1474

Web Crawler: Essential Component of Search Engine

Akshada K. Dhakade, Deepak C. Dhanwani

Research Paper, Computer Science & Engineering, India, Volume 4 Issue 5, May 2015

Pages: 857 - 859

Result Grabbing and its Analysis using Data Analytics

Puru Agrawal, Rajesh Deshmukh, Monika Gehi

Review Papers, Computer Science & Engineering, India, Volume 3 Issue 11, November 2014

Pages: 1191 - 1194

Web Data Extraction by Using Trinity

Sayali Khodade, Nilav Mukharjee

Survey Paper, Computer Science & Engineering, India, Volume 4 Issue 11, November 2015

Pages: 2108 - 2112

A Survey on User Search Goal Inferring and Re-Ranking System

Hemlata Gaikwad, P. B. Kumbharkar

Research Paper, Computer Science & Engineering, India, Volume 4 Issue 6, June 2015

Pages: 2463 - 2465

Self-Adaptive Focused Crawler Using Ontology

Pallavi Wadibhasme, Nitin Shivale
