International Journal of Science and Research (IJSR)

Call for Papers | Fully Refereed | Open Access | Double Blind Peer Reviewed

ISSN: 2319-7064



Comparative Studies | Computer Science & Engineering | India | Volume 3 Issue 6, June 2014


Comparative Study of Web Content Mining Techniques for HTML and XML Contents

Rupinder Kaur | Kamaljit Kaur


Abstract: The World Wide Web is a rapidly growing source of information. Data on the web is available in structured, unstructured, and semi-structured forms, and its volume grows daily, which makes it increasingly difficult for users to find relevant data. Data mining is the area of computer science concerned with extracting useful information from very large amounts of data. Web mining is an application of data mining that applies data mining techniques to retrieve relevant information from the web. Web developers have started to build web pages on emerging web technologies such as XML and Flash. XML was designed to describe data and to focus on what the data is. It also serves as a meta-language that allows authors to create customized markup languages for different types of documents, making it a standard data format for online data exchange. To date, well-known algorithms such as Apriori and FP-Growth have been used to fetch web data from XML contents, and numerous techniques have been proposed for HTML contents. This paper uses a hybrid approach to fetch both HTML and XML contents from a web page: the Apriori algorithm removes unimportant information from the contents, and a decision tree fetches the contents from the web page. The hybrid approach is compared with a previous technique that implements the FP-Growth algorithm for HTML and XML contents. The results are presented as graphs.
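
The abstract names Apriori as the step that filters out unimportant information, but this page does not describe how the authors apply it. As a rough illustration only, the following is a minimal sketch of generic Apriori frequent-itemset mining, assuming each page has already been reduced to a set of terms. The function name `apriori`, the `min_support` threshold, and the sample `pages` data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    Generic textbook Apriori, not the authors' implementation.
    `min_support` is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must already be frequent.
            if all(frozenset(s) in current for s in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: keep only candidates that meet the support threshold.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        k += 1

    return frequent


# Hypothetical term sets, one per page fragment.
pages = [
    {"xml", "data", "schema"},
    {"xml", "data", "login"},
    {"xml", "data", "schema", "advert"},
    {"html", "data", "advert"},
]
frequent_terms = apriori(pages, min_support=3)
```

In this sketch, terms and term combinations that fall below the support threshold are simply not returned, which is one plausible way to treat them as unimportant. The decision-tree step is not shown because the page gives no detail about its features or labels.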


Keywords: Web Mining, XML, Apriori, Decision Tree, FP-Growth algorithm


Edition: Volume 3 Issue 6, June 2014


Pages: 2099 - 2104




How to Cite this Article?

Rupinder Kaur, Kamaljit Kaur, "Comparative Study of Web Content Mining Techniques for HTML and XML Contents", International Journal of Science and Research (IJSR), Volume 3 Issue 6, June 2014, pp. 2099-2104, https://www.ijsr.net/get_abstract.php?paper_id=2014704

Similar Articles with Keyword 'Web Mining'


Research Paper, Computer Science & Engineering, India, Volume 4 Issue 8, August 2015

Pages: 1640 - 1647

Privacy Preservation Protection for Personalized Web User by k-Anonymity with Profile Construction for Web Search Engines

Uma Maheswari.T | Dr.V. Kavitha



Survey Paper, Computer Science & Engineering, India, Volume 4 Issue 11, November 2015

Pages: 1812 - 1815

A Survey on Domain Name Categorization Using Artificial Neural Networks

Akshay S. Dhomble | Disha Deotale
