International Journal of Science and Research (IJSR)

Call for Papers | Fully Refereed | Open Access | Double Blind Peer Reviewed

ISSN: 2319-7064



Research Paper | Software Engineering | India | Volume 3 Issue 2, February 2014


An Ensemble Framework for Web Content Extraction to User Query Obfuscations

Umarani. P. M | Sumathi. P


Abstract: With the rapid growth of digital content generated on the Web, users of Web search engines are often forced to sift through long ranked lists of results returned for obfuscated queries. Data stream classification poses many challenges to the web mining community, including infinite length, concept drift, concept evolution, feature evolution, and data semantics. Since a data stream is theoretically infinite in length, it is impractical to store and use all the historical data for training. Most existing data stream classification techniques fail to classify data with low entropy. The proposed framework includes two additional components: 1) a multi-correlation extraction model that performs query prediction based on annotation similarity; it also checks the similarity of data records and detects the correct data region with higher precision using the semantic properties of these records; 2) user-specific preference modeling, which maps query relevance and user preference into the same user-specific cluster space. The advantages of this method are that it can extract any type of data record and provides options for aligning iterative and disjunctive data items. Experimental results show that the proposed system achieves high precision and outperforms existing state-of-the-art data extraction methods.
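The abstract does not spell out how record similarity is used to locate the data region. The following is a minimal sketch of one common approach, under stated assumptions: each candidate record is reduced to a list of text tokens, similarity is the cosine of bag-of-words counts, and the data region is taken as the longest run of adjacent records whose pairwise similarity meets a threshold. The function names `record_similarity` and `detect_data_region`, the tokenisation, and the 0.5 threshold are hypothetical illustrations, not the authors' multi-correlation extraction model, which also draws on annotation and semantic properties that are not modelled here.

```python
import math
from collections import Counter


def record_similarity(a: list[str], b: list[str]) -> float:
    """Cosine similarity between the bag-of-words vectors of two records.

    Hypothetical helper: the paper's model also uses semantic properties,
    which this plain token-count comparison ignores.
    """
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def detect_data_region(records: list[list[str]], threshold: float = 0.5) -> tuple[int, int]:
    """Return (start, end), end exclusive, of the longest run of consecutive
    records whose neighbouring similarity is >= threshold (assumed cut-off)."""
    if not records:
        return (0, 0)
    best_start, best_end = 0, 1
    start = 0
    for i in range(1, len(records)):
        # A dissimilar neighbour breaks the current run of look-alike records.
        if record_similarity(records[i - 1], records[i]) < threshold:
            start = i
        if i + 1 - start > best_end - best_start:
            best_start, best_end = start, i + 1
    return (best_start, best_end)


# Toy page: a navigation bar, three product records, and a footer.
records = [
    ["home", "about", "contact"],
    ["laptop", "price", "499", "brand", "acme"],
    ["laptop", "price", "649", "brand", "zeta"],
    ["laptop", "price", "899", "brand", "acme"],
    ["copyright", "2014"],
]

print(detect_data_region(records))  # (1, 4): the three product records
```

Comparing only adjacent records keeps the scan linear in the number of records; a fuller treatment would compare all pairs or align record structures, which is closer to the iterative and disjunctive alignment the abstract mentions.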


Keywords: Information Retrieval, Data clustering, Data Prediction, Web Data extraction


Edition: Volume 3 Issue 2, February 2014


Pages: 431 - 435

