Focused and Adaptive Crawling for Topic Specific and Hidden Web Entries
International Journal of Science and Research (IJSR)

www.ijsr.net | Open Access | Fully Refereed | Peer Reviewed International Journal

ISSN: 2319-7064

Review Papers | Computer Science & Engineering | India | Volume 4 Issue 12, December 2015

Vrutuja Pande, Pratap Singh

In this paper we describe new adaptive crawling strategies for efficiently locating the entry points to hidden-Web sources, and a new hypertext resource discovery system called a Focused Crawler. Because hidden-Web sources are very sparsely distributed, locating them is especially challenging. We address this problem by using the contents of pages to focus the crawl on a topic, by prioritizing promising links within the topic, and by also following links that may not lead to immediate benefit. We propose a new framework in which crawlers automatically learn patterns of promising links and adapt their focus as the crawl progresses, greatly reducing the amount of manual setup and tuning required. The goal of a focused crawler is to selectively seek out pages that are relevant to a pre-defined set of topics. The topics are specified not using keywords, but using exemplary documents. Rather than collecting and indexing all accessible Web documents to be able to answer all possible ad-hoc queries, a focused crawler analyzes its crawl boundary to find the links that are likely to be most relevant for the crawl, and avoids irrelevant regions of the Web. This leads to significant savings in hardware and network resources, and helps keep the crawl more up-to-date. To guide our crawler, we designed two hypertext mining programs: a classifier that evaluates the relevance of a hypertext document with respect to the focus topics, and a distiller that identifies hypertext nodes that are great access points to many relevant pages within a few links. Our experiments over real Web pages in a representative set of domains indicate that online learning leads to significant gains in harvest rates: the adaptive crawlers retrieve up to three times as many forms as crawlers that use a fixed focus strategy.
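The classifier-guided, best-first strategy described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the in-memory `web` dictionary, the `focused_crawl` function, its `budget` and `threshold` parameters, and the use of cosine similarity to the centroid of the exemplary documents (standing in for a trained classifier) are all assumptions made for illustration. Each fetched page is scored against the topic, its outlinks inherit that score as their frontier priority, links found on off-topic pages are still queued at low priority rather than dropped, and the harvest rate is the fraction of fetched pages scored as relevant.

```python
import heapq
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def topic_centroid(exemplars):
    """Sum the term counts of the exemplary documents that define the topic."""
    centroid = Counter()
    for doc in exemplars:
        centroid.update(tokens(doc))
    return centroid


def focused_crawl(web, seeds, exemplars, budget=10, threshold=0.2):
    """Best-first crawl over a hypothetical in-memory web {url: (text, outlinks)}.

    Returns (urls in fetch order, harvest rate)."""
    centroid = topic_centroid(exemplars)
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited, relevant = [], 0
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        if url not in web:
            continue  # dead link in the toy graph
        text, outlinks = web[url]
        score = cosine(Counter(tokens(text)), centroid)  # stand-in "classifier"
        visited.append(url)
        if score >= threshold:
            relevant += 1
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                # Children inherit the parent's relevance as priority; links on
                # off-topic pages stay in the frontier at low priority.
                heapq.heappush(frontier, (-score, link))
    harvest = relevant / len(visited) if visited else 0.0
    return visited, harvest


# Usage on a toy graph (all pages and texts are made up).
web = {
    "seed": ("hidden web crawler forms search", ["a", "b"]),
    "a": ("crawler for hidden web search forms databases", ["c"]),
    "b": ("cooking recipes pasta", ["d"]),
    "c": ("search form entry points hidden web", []),
    "d": ("pasta sauce", []),
}
exemplars = ["hidden web search forms", "crawler discovers search forms"]
visited, harvest = focused_crawl(web, ["seed"], exemplars, budget=4)
# visited == ['seed', 'a', 'b', 'c'], harvest == 0.75
```

With a budget of four fetches, the off-topic page "d" is never fetched because it was queued from an irrelevant parent; a real system would replace the centroid scorer with a trained classifier and, for the adaptive variant, update link priorities online as the crawl progresses.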

Keywords: Web resource discovery, Classification, Categorization, Web crawling strategies

Edition: Volume 4 Issue 12, December 2015

Pages: 2212 - 2215

How to Cite this Article?

Vrutuja Pande, Pratap Singh, "Focused and Adaptive Crawling for Topic Specific and Hidden Web Entries", International Journal of Science and Research (IJSR), https://www.ijsr.net/search_index_results_paperid.php?id=NOV152532, Volume 4 Issue 12, December 2015, 2212 - 2215

Similar Articles with Keyword 'Web resource discovery'

Survey Paper, Computer Science & Engineering, India, Volume 4 Issue 10, October 2015

Pages: 1331 - 1334

Survey on Crawler for Deep-Web Interfaces

Devendra Hapase, Prof. M. D. Ingle

Similar Articles with Keyword 'Classification'

Review Papers, Computer Science & Engineering, India, Volume 3 Issue 11, November 2014

Pages: 1936 - 1938

A Mining Method to Predict Patient's DOSH

Ruchi Rathor, Pankaj Agarkar

Review Papers, Computer Science & Engineering, India, Volume 9 Issue 11, November 2020

Pages: 16 - 20

A Comprehensive Review on Intents, Intention Mining and Intention Classification

Saroj S. Date

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 9 Issue 7, July 2020

Pages: 1888 - 1890

Spammer Detection and Identification on Social Network Using Machine Learning

Dr. Shameem Akhter, Noorain Saba

Review Papers, Computer Science & Engineering, India, Volume 3 Issue 11, November 2014

Pages: 1642 - 1643

A Survey Paper on Various Encryption & Data Hiding Methods for Video Streams

Shrutika S. Giradkar, Antara Bhattacharya

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 4 Issue 9, September 2015

Pages: 1233 - 1237

Novel Class Detection for Feature Evolving Data Streams

Harshada Wagaskar, Prof. Gayatri Bhandari

Similar Articles with Keyword 'Categorization'

M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 3 Issue 11, November 2014

Pages: 826 - 829

Neural Systems Approach for Ammography Finding by Utilizing Wavelet Features

A. Mallareddy, A. Priyanka

Review Papers, Computer Science & Engineering, India, Volume 4 Issue 2, February 2015

Pages: 2461 - 2466

A Review of Text Mining Techniques Associated with Various Application Areas

Dr. Shilpa Dang, Peerzada Hamid Ahmad

Survey Paper, Computer Science & Engineering, India, Volume 4 Issue 11, November 2015

Pages: 1812 - 1815

A Survey on Domain Name Categorization Using Artificial Neural Networks

Akshay S. Dhomble, Disha Deotale

Survey Paper, Computer Science & Engineering, India, Volume 3 Issue 9, September 2014

Pages: 1121 - 1123

Survey on Categorization and Detection of Adaptive Novel Class of Feature Evolving Data Streams

Chaitrali T. Chavan, Prof. Vinod S. Wadne

Research Paper, Computer Science & Engineering, India, Volume 4 Issue 6, June 2015

Pages: 1573 - 1575

A Service Provider Level SMS Spamming Detection System

Bhavana Alam, Fazeel Zama
