International Journal of Science and Research (IJSR)


ISSN: 2319-7064



Research Paper | Computer Science & Engineering | India | Volume 4 Issue 8, August 2015


Study and Analysis on Document Clustering Based on MapReduce in Hadoop using K-Mean Algorithm

Yashika Verma | Sumit Kumari


Abstract: Document clustering is an effective tool for managing information overload. Grouping similar documents together lets a human observer browse large document collections quickly, makes the distinct topics and subtopics in them easy to grasp, and allows search engines to query large collections efficiently, among many other applications. Hence, it has been widely studied as part of the broad literature on data clustering. MapReduce is a simplified programming model for distributed parallel computing. It is an important Google technology and is commonly used for data-intensive distributed parallel computing. In this paper, we describe how document clustering for large collections can be implemented efficiently with MapReduce. The Hadoop implementation provides a convenient and flexible framework for distributed computing on a cluster of commodity machines. The design and implementation of the direct K-Means and distributed K-Means algorithms on MapReduce are presented.
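
As a rough illustration of how K-Means can be phrased in MapReduce terms, the sketch below runs the map and reduce steps in a single Python process. It is not the authors' implementation and does not use the Hadoop API. The function names (`map_phase`, `reduce_phase`, `kmeans`), the use of Euclidean distance, and the small 2-D tuples standing in for document term vectors are all assumptions made for the example.

```python
import math
from collections import defaultdict


def map_phase(doc_vectors, centroids):
    """Map step: emit (index of nearest centroid, vector) for each document."""
    for vec in doc_vectors:
        nearest = min(
            range(len(centroids)),
            key=lambda i: math.dist(vec, centroids[i]),  # Euclidean distance (assumed)
        )
        yield nearest, vec


def reduce_phase(pairs, centroids):
    """Reduce step: average the vectors assigned to each centroid index."""
    groups = defaultdict(list)
    for idx, vec in pairs:
        groups[idx].append(vec)
    updated = list(centroids)  # centroids with no assigned vectors stay unchanged
    for idx, vecs in groups.items():
        count = len(vecs)
        updated[idx] = tuple(sum(column) / count for column in zip(*vecs))
    return updated


def kmeans(doc_vectors, centroids, max_iter=20):
    """Repeat map and reduce until the centroids stop changing."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(doc_vectors, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    # Toy data: two well-separated groups of 2-D points.
    docs = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(docs, [(0, 0), (10, 10)]))
```

In an actual Hadoop job, each iteration would typically be one MapReduce pass with the current centroids shared with every mapper; here the loop in `kmeans` plays that role.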


Keywords: Hadoop, MapReduce, Document Clustering, Direct K-Means, Distributed K-Means, Large Dataset


Edition: Volume 4 Issue 8, August 2015


Pages: 176 - 180




How to Cite this Article?

Yashika Verma, Sumit Kumari, "Study and Analysis on Document Clustering Based on MapReduce in Hadoop using K-Mean Algorithm", International Journal of Science and Research (IJSR), Volume 4 Issue 8, August 2015, pp. 176-180, https://www.ijsr.net/get_abstract.php?paper_id=SUB157223
