Research Paper | Computer Science & Engineering | India | Volume 3 Issue 7, July 2014
Reducing Duplicate Content Using XXHASH Algorithm
Rahul Mahajan | Dr. S. K. Gupta | Rajeev Bedi
Abstract: Users of the World Wide Web rely on search engines to retrieve information. As the number of Web pages grows rapidly, it becomes harder for search engines to return the information relevant to a user, and search performance suffers when results are flooded with redundant content. Removing redundant content is therefore an important data processing operation in search engines and other web applications. The existing architecture of the WWW uses URLs to identify web pages, and a large fraction of URLs on the web point to duplicate (or near-duplicate) content. Web crawlers rely on URL normalization to identify equivalent URLs, that is, URLs that link to the same web page. Duplicate URLs degrade every principal function of a search engine, including crawling, indexing, ranking, and the presentation of results, so de-duplicating URLs is an extremely important problem. In this paper, we propose a new technique that reduces duplicate content during web crawling and saves only unique pages in the database.
Keywords: Duplicate, Duplicate Content, Normalization, Web, Web Crawler
Edition: Volume 3 Issue 7, July 2014
Pages: 610 - 612
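The abstract describes two ideas: normalizing URLs so that equivalent URLs compare equal, and storing only unique pages during crawling. The following is a minimal Python sketch of how those two ideas could fit together. It is not the authors' implementation. The names `normalize_url`, `content_digest`, and `PageStore` are hypothetical, and the normalization rules shown (lowercasing scheme and host, dropping default ports and fragments, sorting query parameters) are a common subset chosen for illustration. The sketch uses the third-party `xxhash` package if it is installed and otherwise falls back to a hash from the standard library, which is not xxHash. It detects exact duplicates only, not near-duplicates.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

try:
    import xxhash  # third-party package, optional in this sketch

    def content_digest(data: bytes) -> str:
        """Return a 64-bit xxHash digest of the page body as hex."""
        return xxhash.xxh64(data).hexdigest()

except ImportError:

    def content_digest(data: bytes) -> str:
        """Fallback when xxhash is not installed (this is NOT xxHash)."""
        return hashlib.blake2b(data, digest_size=8).hexdigest()


DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Map equivalent URLs to one canonical string (illustrative rules only)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # user info is dropped in this sketch
    port = parts.port
    if port is not None and DEFAULT_PORTS.get(scheme) != port:
        host = f"{host}:{port}"  # keep only non-default ports
    path = parts.path or "/"  # an empty path is treated as "/"
    # Sort query parameters so that their order does not matter.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment is discarded because it is not sent to the server.
    return urlunsplit((scheme, host, path, query, ""))


class PageStore:
    """Keeps one copy of each distinct page body seen during a crawl."""

    def __init__(self) -> None:
        self.seen_urls: set[str] = set()
        self.pages: dict[str, tuple[str, bytes]] = {}  # digest -> (url, body)

    def add(self, url: str, body: bytes) -> bool:
        """Store the page if it is new; return True when it was stored."""
        key = normalize_url(url)
        if key in self.seen_urls:
            return False  # equivalent URL already crawled
        self.seen_urls.add(key)
        digest = content_digest(body)
        if digest in self.pages:
            return False  # same content already stored under another URL
        self.pages[digest] = (key, body)
        return True
```

A crawler would call `store.add(url, body)` for each fetched page and keep only the pages for which it returns `True`. A 64-bit digest can collide, so a production system might compare the stored bodies as well before discarding a page.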