International Journal of Science and Research (IJSR)

Open Access | Fully Refereed | Peer Reviewed

ISSN: 2319-7064


Downloads: 106

Review Papers | Computer Science & Engineering | India | Volume 4 Issue 12, December 2015


Focused and Adaptive Crawling for Topic Specific and Hidden Web Entries

Vrutuja Pande, Pratap Singh


Abstract: In this paper we describe new adaptive crawling strategies that efficiently locate the entry points to hidden-Web sources, and we describe a hypertext resource discovery system called a Focused Crawler. Hidden-Web sources are very sparsely distributed, which makes locating them especially challenging. We address this problem by using the contents of pages to focus the crawl on a topic, by prioritizing promising links within the topic, and by also following links that may not lead to immediate benefit. We propose a new framework in which crawlers automatically learn patterns of promising links and adapt their focus as the crawl progresses, greatly reducing the amount of manual setup and tuning required. The goal of a focused crawler is to selectively seek out pages that are relevant to a pre-defined set of topics. The topics are specified not using keywords, but using exemplary documents. Rather than collecting and indexing all accessible Web documents to be able to answer all possible ad-hoc queries, a focused crawler analyzes its crawl boundary to find the links that are likely to be most relevant for the crawl, and avoids irrelevant regions of the Web. This leads to significant savings in hardware and network resources, and helps keep the crawl more up-to-date. We designed two hypertext mining programs that guide our crawler: a classifier that evaluates the relevance of a hypertext document with respect to the focus topics, and a distiller that identifies hypertext nodes that are great access points to many relevant pages within a few links. Our experiments over real Web pages in a representative set of domains indicate that online learning leads to significant gains in harvest rates: the adaptive crawlers retrieve up to three times as many forms as crawlers that use a fixed focus strategy.
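
The crawl loop summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical in-memory link graph (TOY_WEB), a toy term-overlap scorer standing in for the classifier, and a simple running-average update standing in for the online learning of promising link patterns. The distiller is not modeled. All names below are illustrative.

```python
import heapq
from collections import defaultdict

# Hypothetical toy web: URL -> (page text, [(anchor text, target URL), ...]).
TOY_WEB = {
    "seed": ("portal with links to many sites",
             [("used cars search", "cars"), ("celebrity news", "gossip")]),
    "cars": ("used cars search form for car listings",
             [("advanced car search", "cars_form"), ("sports news", "sports")]),
    "gossip": ("celebrity news and photos", [("more news", "sports")]),
    "cars_form": ("search form car make model price", []),
    "sports": ("sports scores and news", []),
}
TOPIC = {"car", "search", "form"}  # assumed topic description


def relevance(text, topic_terms):
    """Toy stand-in for the classifier: fraction of topic terms on the page."""
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def focused_crawl(seed, web, topic_terms, budget=10, threshold=0.5, rate=0.5):
    """Best-first crawl that adapts anchor-word weights as pages are fetched."""
    weights = defaultdict(float)   # learned usefulness of each anchor word
    frontier = [(0.0, seed, ())]   # min-heap keyed on negated priority
    seen = {seed}
    order, relevant = [], []
    while frontier and len(order) < budget:
        _, url, anchor_words = heapq.heappop(frontier)
        text, links = web[url]
        order.append(url)
        score = relevance(text, topic_terms)
        if score >= threshold:
            relevant.append(url)
        # Online update: move the weight of each anchor word that led here
        # toward the relevance actually observed on the fetched page.
        for word in anchor_words:
            weights[word] += rate * (score - weights[word])
        for anchor, target in links:
            if target in seen or target not in web:
                continue
            seen.add(target)
            words = tuple(anchor.lower().split())
            # Priority = relevance of the linking page + learned anchor weights.
            # Low-priority links stay queued, so they can still be followed.
            priority = score + sum(weights[w] for w in words)
            heapq.heappush(frontier, (-priority, target, words))
    return order, relevant
```

With a budget of three fetches on this toy graph, the crawl visits the seed and then the two car pages before either off-topic page. Priorities of links already in the queue are not recomputed after a weight update, which keeps the sketch short at the cost of slower adaptation.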


Keywords: Web resource discovery, Classification, Categorization, Web crawling strategies


Edition: Volume 4 Issue 12, December 2015


Pages: 2212 - 2215


How to Cite this Article?

Vrutuja Pande, Pratap Singh, "Focused and Adaptive Crawling for Topic Specific and Hidden Web Entries", International Journal of Science and Research (IJSR), https://www.ijsr.net/get_abstract.php?paper_id=NOV152532, Volume 4 Issue 12, December 2015, 2212 - 2215


Similar Articles with Keyword 'Classification'

Downloads: 1

Research Paper, Computer Science & Engineering, India, Volume 10 Issue 7, July 2021

Pages: 421 - 424

Comparative Analysis of AI Techniques in the Prediction of Heart Disease

Irtiqa Dhar

Downloads: 2

Research Paper, Computer Science & Engineering, India, Volume 10 Issue 6, June 2021

Pages: 863 - 865

Detection of Malicious URLs using Classification Algorithm

Muskan V. Jaiswal, Dr. Anjali B. Raut

Similar Articles with Keyword 'Categorization'

Downloads: 100

Review Papers, Computer Science & Engineering, India, https://www.ijsr.net/issue.php?edition=Volume%204%20Issue%202,%20February%20201

Pages: 2461 - 2466

A Review of Text Mining Techniques Associated with Various Application Areas

Dr. Shilpa Dang, Peerzada Hamid Ahmad

Downloads: 100

Research Paper, Computer Science & Engineering, India, Volume 4 Issue 7, July 2015

Pages: 988 - 991

Filtering Unwanted Post from Online Social Networking (OSN) Sites

Sachin P. Vidhate, Syed Akhter

Similar Articles with Keyword 'Web'

Downloads: 1

Student Project, Computer Science & Engineering, India, Volume 10 Issue 6, June 2021

Pages: 1717 - 1724

Krashi Prabhandak (Agricultural Manager)

Prafful Mundra, A V Pavan Krishna, Swarnalatha P, Venkata Sumanth Kakollu

Downloads: 1

Research Paper, Computer Science & Engineering, India, Volume 10 Issue 7, July 2021

Pages: 915 - 920

Calibration Software: Performance Analysis

Prasad Rajendra Kumbhar, Anil R. Surve, Shailender Shekhawat

Similar Articles with Keyword 'resource'

Downloads: 156

Research Paper, Computer Science & Engineering, India, Volume 6 Issue 11, November 2017

Pages: 338 - 384

Managing Uncertainty in Supply Chain Operating Cost Using Genetic Algorithm

Dr. Niju P. Joseph, Dr. Priyanka Surendran

Downloads: 180

Survey Paper, Computer Science & Engineering, India, Volume 7 Issue 1, January 2018

Pages: 81 - 84

Novel Approach to Virtual Machine Migration In Cloud Computing Environment - A Survey

Priyanka H, Dr. Mary Cherian

Similar Articles with Keyword 'discovery'

Downloads: 40

Research Paper, Computer Science & Engineering, China, Volume 9 Issue 4, April 2020

Pages: 1544 - 1554

Recent Developments on Probabilistic Graphical Model Applied in Data Analysis

Kan'Sam Nadjak, Guisheng Yin

Downloads: 102

Survey Paper, Computer Science & Engineering, India, Volume 3 Issue 12, December 2014

Pages: 2734 - 2838

Efficient Techniques for Mining High Utility Itemsets from Transactional Databases: A Survey

Ganesh Sawant, Bhawana Kanawde
