Design and Implementation of K-Means and Hierarchical Document Clustering on Hadoop
International Journal of Science and Research (IJSR)

www.ijsr.net | Open Access | Fully Refereed | Peer Reviewed International Journal

ISSN: 2319-7064

M.Tech / M.E / PhD Thesis | Computer Science & Engineering | India | Volume 3 Issue 10, October 2014


Y. K. Patil, Prof. V. S. Nandedkar

Document clustering is one of the important areas of data mining. Companies such as Yahoo, Google, Facebook and Twitter use Hadoop to build real-world applications. Emails, social media blogs, movie review comments and books are typical inputs for document clustering. This paper focuses on document clustering using Hadoop, a framework for processing large document collections in parallel. The authors report that clustering documents on Hadoop takes less computing time than a stand-alone Java implementation. In this paper, the authors propose the design and implementation of Tf-Idf weighting and of the K-means and hierarchical clustering algorithms on Hadoop.
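The abstract names Tf-Idf as the weighting stage that precedes clustering. The sketch below is an illustration under stated assumptions, not the authors' code: it mimics two MapReduce rounds in plain Python, one counting term frequencies per (term, document) key and one counting document frequencies per term, and then weights each term as tf × log(N / df), which is one common Tf-Idf variant (the exact formula used in the paper is not given here). The corpus `DOCS` and the helpers `map_term_counts`, `reduce_sum` and `tf_idf` are hypothetical; on a real cluster each round would be a Hadoop Mapper/Reducer pair. A `cosine` helper is included because cosine similarity is the measure listed in the keywords.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus (document id -> text), not data from the paper.
DOCS = {
    "d1": "hadoop clusters documents",
    "d2": "kmeans clusters documents",
    "d3": "hadoop mapreduce jobs",
}


def map_term_counts(doc_id, text):
    """Map step: emit ((term, doc_id), 1) for every token in the document."""
    for term in text.lower().split():
        yield (term, doc_id), 1


def reduce_sum(pairs):
    """Reduce step: sum the values of identical keys (the dict plays the shuffle)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def tf_idf(docs):
    """Return {doc_id: {term: tf * log(N / df)}} computed in two map/reduce rounds."""
    n_docs = len(docs)
    # Round 1: term frequency for each (term, doc_id) key.
    tf = reduce_sum(pair for doc_id, text in docs.items()
                    for pair in map_term_counts(doc_id, text))
    # Round 2: document frequency; each distinct (term, doc_id) key adds 1 to its term.
    df = reduce_sum((term, 1) for term, _doc_id in tf)
    vectors = defaultdict(dict)
    for (term, doc_id), count in tf.items():
        vectors[doc_id][term] = count * math.log(n_docs / df[term])
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = tf_idf(DOCS)
print(round(cosine(vectors["d1"], vectors["d2"]), 3))  # 0.378
print(round(cosine(vectors["d1"], vectors["d3"]), 3))  # 0.146
```

A term that occurs in every document gets weight log(1) = 0 under this variant, so it contributes nothing to the cosine similarity.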

Keywords: Hadoop, Tf-Idf, Cosine Similarity, K-means and Hierarchical clustering
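The two clustering steps named in the keywords can be sketched over Tf-Idf vectors in the same map/reduce style. In this hypothetical Python sketch, `kmeans_map` keys each document by the centroid with the highest cosine similarity, `kmeans_reduce` averages a cluster's members into a new centroid, and a driver loop repeats the round until the assignments stop changing, which is the usual way K-means is expressed as iterated MapReduce jobs. `agglomerative` is a sequential average-link hierarchical clustering over the same similarity; how the paper distributes the hierarchical step on Hadoop is not described in this abstract, so that part is an assumption. The vectors in `DOCS` and the initial centroids are made up for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def kmeans_map(doc_id, vec, centroids):
    """Map step (hypothetical): key each document by its most similar centroid."""
    best = max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))
    return best, (doc_id, vec)


def kmeans_reduce(members):
    """Reduce step (hypothetical): average the member vectors into a new centroid."""
    total = defaultdict(float)
    for _doc_id, vec in members:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(members) for term, weight in total.items()}


def kmeans(vectors, centroids, max_iter=10):
    """Driver: one map/reduce round per iteration until assignments are stable."""
    assignment = {}
    for _ in range(max_iter):
        groups = defaultdict(list)  # stands in for Hadoop's shuffle/sort by key
        new_assignment = {}
        for doc_id, vec in vectors.items():
            cluster, value = kmeans_map(doc_id, vec, centroids)
            groups[cluster].append(value)
            new_assignment[doc_id] = cluster
        # An empty cluster keeps its previous centroid.
        centroids = [kmeans_reduce(groups[i]) if groups[i] else centroids[i]
                     for i in range(len(centroids))]
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment, centroids


def agglomerative(vectors, k):
    """Average-link agglomerative clustering down to k >= 1 clusters (sequential)."""
    clusters = [[doc_id] for doc_id in sorted(vectors)]

    def avg_sim(a, b):
        return sum(cosine(vectors[x], vectors[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > k:
        n = len(clusters)
        pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]
        # Merge the pair of clusters with the highest average similarity.
        i, j = max(pairs, key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy Tf-Idf vectors (made up for illustration, not data from the paper).
DOCS = {
    "d1": {"hadoop": 1.0, "mapreduce": 1.0},
    "d2": {"hadoop": 1.0, "hdfs": 1.0},
    "d3": {"kmeans": 1.0, "cluster": 1.0},
    "d4": {"kmeans": 1.0, "centroid": 1.0},
}

assignment, centroids = kmeans(DOCS, [DOCS["d1"], DOCS["d3"]])
clusters = agglomerative(DOCS, 2)
print(assignment)  # {'d1': 0, 'd2': 0, 'd3': 1, 'd4': 1}
print(clusters)    # [['d1', 'd2'], ['d3', 'd4']]
```

Keeping an empty cluster's previous centroid is only one policy; re-seeding it from a distant document is a common alternative.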

Edition: Volume 3 Issue 10, October 2014

Pages: 1566 - 1570


How to Cite this Article?

Y. K. Patil, Prof. V. S. Nandedkar, "Design and Implementation of K-Means and Hierarchical Document Clustering on Hadoop", International Journal of Science and Research (IJSR), https://www.ijsr.net/search_index_results_paperid.php?id=OCT14526, Volume 3 Issue 10, October 2014, 1566 - 1570




Similar Articles with Keyword 'Hadoop'

Dissertation Chapters, Computer Science & Engineering, India, Volume 4 Issue 7, July 2015

Pages: 1721 - 1725

Secured Load Rebalancing for Distributed Files System in Cloud

Jayesh D. Kamble, Y. B. Gurav


M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 3 Issue 12, December 2014

Pages: 1103 - 1108

Design of a High Performing Cloud Using Load Rebalancing Technique in Distributed File System

Y. Steeven, C. Prakasha Rao


Research Paper, Computer Science & Engineering, India, Volume 4 Issue 7, July 2015

Pages: 1096 - 1101

Parallel Data Shuffling for Hadoop Acceleration with Network Levitated Merge and RDMA for Interconnectivity

Kishorkumar Shinde, Venkatesan N.


Survey Paper, Computer Science & Engineering, India, Volume 4 Issue 1, January 2015

Pages: 1690 - 1693

Extended Best Peer: A Peer-to-Peer Based System by Corporate Network for Data Sharing

Chandre P.R, Bhavsar Harshada


Research Paper, Computer Science & Engineering, India, Volume 4 Issue 10, October 2015

Pages: 1646 - 1650

Performance Analysis of Multi-Node Hadoop Clusters using Amazon EC2 Instances

Ruchi Mittal, Ruhi Bagga


Similar Articles with Keyword 'Tf-Idf'

Research Paper, Computer Science & Engineering, India, Volume 5 Issue 5, May 2016

Pages: 1964 - 1967

Improving Performance of Hindi-English based Cross Language Information Retrieval using Selective Documents Technique and Query Expansion

Aditi Agrawal, Dr. A. J. Agrawal


M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 5 Issue 7, July 2016

Pages: 1240 - 1244

Implementing K-Means Clustering Algorithm Using MapReduce Paradigm

Botcha Chandrasekhara Rao, Medara Rambabu


Review Papers, Computer Science & Engineering, India, Volume 4 Issue 4, April 2015

Pages: 981 - 984

Using SVM and Stopword removal method in Microblogging Classroom

Vidya Dhuttargaon, Amit R. Sarkar


M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 3 Issue 10, October 2014

Pages: 1566 - 1570

Design and Implementation of K-Means and Hierarchical Document Clustering on Hadoop

Y. K. Patil, Prof. V. S. Nandedkar


Review Papers, Computer Science & Engineering, India, Volume 5 Issue 1, January 2016

Pages: 710 - 712

World Wide Web Metasearch Using TF-IDF Method

S. P. Phadtare, S. B. Magdum


Similar Articles with Keyword 'Cosine Similarity'

Research Paper, Computer Science & Engineering, India, Volume 5 Issue 5, May 2016

Pages: 1964 - 1967

Improving Performance of Hindi-English based Cross Language Information Retrieval using Selective Documents Technique and Query Expansion

Aditi Agrawal, Dr. A. J. Agrawal


Review Papers, Computer Science & Engineering, India, Volume 4 Issue 4, April 2015

Pages: 981 - 984

Using SVM and Stopword removal method in Microblogging Classroom

Vidya Dhuttargaon, Amit R. Sarkar


M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 3 Issue 10, October 2014

Pages: 1566 - 1570

Design and Implementation of K-Means and Hierarchical Document Clustering on Hadoop

Y. K. Patil, Prof. V. S. Nandedkar


M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 5 Issue 6, June 2016

Pages: 2206 - 2210

Document Clustering using Improved K-means Algorithm

Anjali Vashist, Rajender Nath


Dissertation Chapters, Computer Science & Engineering, India, Volume 3 Issue 4, April 2014

Pages: 178 - 184

Mining Contents in Web Pages and Ranking of Web Pages Using Cosine Similarity

Divya C.
