Reducing Duplicate Content Using XXHASH Algorithm
International Journal of Science and Research (IJSR)

www.ijsr.net | Open Access | Fully Refereed | Peer Reviewed International Journal

ISSN: 2319-7064



Research Paper | Computer Science & Engineering | India | Volume 3 Issue 7, July 2014

Rahul Mahajan, Dr. S. K. Gupta, Rajeev Bedi

Users of the World Wide Web rely on search engines to retrieve information. As the number of web pages grows rapidly, it becomes harder for search engines to return the information relevant to a user, and search quality suffers when results are flooded with redundant content. Removing redundant content is therefore an important data processing operation in search engines and other web applications. The existing architecture of the WWW uses URLs to identify web pages, and a large fraction of the URLs on the web point to duplicate (or near-duplicate) content. Web crawlers rely on URL normalization to identify equivalent URLs, that is, URLs that link to the same web page. Duplicate URLs adversely affect every principal function of a search engine: crawling, indexing, ranking, and result presentation. De-duplicating URLs is therefore an extremely important problem for search engines. In this paper we propose a new technique that reduces duplicate content during web crawling and saves only unique pages in the database.
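The abstract does not spell out the pipeline, so the following is only a minimal sketch of how content-based de-duplication during crawling might look, assuming the 32-bit xxHash variant (XXH32) is used as a page fingerprint. The names `xxh32`, `normalize_url`, and `UniquePageStore`, the whitespace-collapsing step, and the in-memory dictionary standing in for the database are all illustrative assumptions, not the authors' implementation.

```python
from urllib.parse import urlsplit, urlunsplit

MASK = 0xFFFFFFFF
# XXH32 prime constants.
P1, P2, P3, P4, P5 = 0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F, 0x165667B1


def _rotl(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def _round(acc, lane):
    """One XXH32 accumulator round over a 4-byte little-endian lane."""
    acc = (acc + lane * P2) & MASK
    return (_rotl(acc, 13) * P1) & MASK


def _read32(data, i):
    return int.from_bytes(data[i:i + 4], "little")


def xxh32(data: bytes, seed: int = 0) -> int:
    """Pure-Python XXH32; a real crawler would likely use a native library."""
    n, i = len(data), 0
    if n >= 16:
        v = [(seed + P1 + P2) & MASK, (seed + P2) & MASK,
             seed & MASK, (seed - P1) & MASK]
        while i + 16 <= n:  # consume 16-byte stripes, 4 lanes each
            for k in range(4):
                v[k] = _round(v[k], _read32(data, i + 4 * k))
            i += 16
        h = (_rotl(v[0], 1) + _rotl(v[1], 7)
             + _rotl(v[2], 12) + _rotl(v[3], 18)) & MASK
    else:
        h = (seed + P5) & MASK
    h = (h + n) & MASK
    while i + 4 <= n:  # remaining 4-byte words
        h = (h + _read32(data, i) * P3) & MASK
        h = (_rotl(h, 17) * P4) & MASK
        i += 4
    while i < n:  # remaining single bytes
        h = (h + data[i] * P5) & MASK
        h = (_rotl(h, 11) * P1) & MASK
        i += 1
    h ^= h >> 15  # final avalanche
    h = (h * P2) & MASK
    h ^= h >> 13
    h = (h * P3) & MASK
    h ^= h >> 16
    return h


def normalize_url(url: str) -> str:
    """Assumed normalization: lowercase host, drop default port and fragment."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    default_port = {"http": 80, "https": 443}.get(scheme)
    if parts.port and parts.port != default_port:
        host = f"{host}:{parts.port}"
    return urlunsplit((scheme, host, parts.path or "/", parts.query, ""))


class UniquePageStore:
    """In-memory stand-in for the page database (hypothetical)."""

    def __init__(self):
        self.pages = {}  # fingerprint -> list of (normalized URL, body)

    def add(self, url: str, html: str) -> bool:
        """Store the page only if its content is new; return True if stored."""
        body = " ".join(html.split()).encode("utf-8")  # collapse whitespace
        bucket = self.pages.setdefault(xxh32(body), [])
        # A 32-bit hash can collide, so confirm equality before discarding.
        if any(stored == body for _, stored in bucket):
            return False
        bucket.append((normalize_url(url), body))
        return True


store = UniquePageStore()
first = store.add("HTTP://Example.com:80/a#top", "<p>Hello   world</p>")
second = store.add("http://mirror.example.org/b", "<p>Hello world</p>\n")
third = store.add("http://example.com/c", "<p>Different page</p>")
```

In this sketch the second page is rejected because its whitespace-normalized content matches the first page even though the URLs differ, which is the case URL normalization alone cannot catch. Comparing the stored bytes on a fingerprint match guards against treating two distinct pages as duplicates when their 32-bit hashes happen to collide.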

Keywords: Duplicate, Duplicate Content, Normalization, Web, Web Crawler

Pages: 610 - 612

How to Cite this Article?

Rahul Mahajan, Dr. S. K. Gupta, Rajeev Bedi, "Reducing Duplicate Content Using XXHASH Algorithm", International Journal of Science and Research (IJSR), https://www.ijsr.net/search_index_results_paperid.php?id=2014951, Volume 3 Issue 7, July 2014, 610 - 612






