Research Paper | Computer Science & Engineering | India | Volume 4 Issue 8, August 2015
Study and Analysis on Document Clustering Based on MapReduce in Hadoop using K-Mean Algorithm
Yashika Verma, Sumit Kumari
Abstract: Document clustering is an effective tool for managing information overload. Grouping similar documents together lets a human observer browse large document collections quickly and grasp their distinct topics and subtopics, and lets search engines query large collections efficiently, among many other applications. Hence, it has been widely studied as part of the broad literature on data clustering. MapReduce is a simplified programming model for distributed parallel computing. Introduced by Google, it is commonly used for data-intensive distributed parallel computing. In this paper, we describe how document clustering for large collections can be implemented efficiently with MapReduce. The Hadoop implementation provides a convenient and flexible framework for distributed computing on clusters of commodity machines. The design and implementation of the direct K-Means and distributed K-Means algorithms on MapReduce are presented.
Keywords: Hadoop, MapReduce, Document Clustering, Direct K-Means, Distributed K-Means, Large Dataset
Edition: Volume 4 Issue 8, August 2015
Pages: 176 - 180
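As a rough illustration of the idea in the abstract, one K-Means iteration can be phrased as a map step (assign each document vector to its nearest centroid) and a reduce step (average the vectors assigned to each centroid). The sketch below is not the paper's implementation: it emulates both phases in a single Python process rather than on Hadoop, assumes documents are already converted to numeric vectors, and uses hypothetical function names (kmeans_map, kmeans_reduce, kmeans_iteration).

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centroids):
    """Map phase: emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def kmeans_reduce(points):
    """Reduce phase: average all points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """Run one map/shuffle/reduce round and return the updated centroids."""
    groups = defaultdict(list)
    for point in points:
        key, value = kmeans_map(point, centroids)
        groups[key].append(value)  # stands in for the shuffle step
    # Assumption: a centroid that receives no points is kept unchanged.
    return [
        kmeans_reduce(groups[i]) if groups[i] else tuple(centroids[i])
        for i in range(len(centroids))
    ]


if __name__ == "__main__":
    docs = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans_iteration(docs, [(0, 1), (9, 9)]))
```

In a real Hadoop job the map and reduce functions would run on separate nodes, and a driver would repeat the iteration until the centroids stop changing.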