International Journal of Science and Research (IJSR)


ISSN: 2319-7064



M.Tech / M.E / PhD Thesis | Computer Science & Engineering | India | Volume 4 Issue 2, February 2015


Efficient Way of Determining the Number of Clusters Using Hadoop Architecture

Siri H. P. | Shashikala.B


Abstract: Data mining is the process of extracting information from a data set and transforming it into an understandable structure. Clustering plays an important role in many areas, such as exploratory data analysis, pattern recognition, computer vision, and information retrieval. The key idea is to view clustering as a supervised classification problem in which the true class labels are estimated. Determining the valid number of clusters is difficult. Several well-known methods have been proposed to find the correct number of clusters, e.g., the gap statistic, path-based clustering, and the Figure of Merit (FOM), but these methods do not solve the problem efficiently. This paper uses the Average Intracluster Distance (IC-av) index to validate the estimated number of arbitrarily shaped clusters. The proposed technique, implemented on Hadoop, is based on the local relations between patterns and their cluster labels; it uses a Minimum Spanning Tree (MST) algorithm, based on the multiplicity property of the MST, to obtain accurate results efficiently.
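The abstract does not spell out the algorithm, so the following is only a minimal, single-machine sketch of the general MST-based clustering idea it refers to, not the authors' Hadoop implementation. It builds the MST of the points with Prim's algorithm, cuts the k-1 longest MST edges to obtain k clusters, and scores each candidate k with an average intracluster distance. The function names are hypothetical, the IC-av definition used here (mean over clusters of the mean pairwise distance inside each cluster) is an assumption, and neither the "multiplicity property" nor the MapReduce distribution is modeled.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length point tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prim_mst(points):
    """Return MST edges (weight, u, v) of the complete Euclidean graph (Prim, O(n^2))."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        # Relax distances from u to every vertex still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points, k):
    """Cut the k-1 longest MST edges and label the resulting components 0..k-1."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    edges = sorted(prim_mst(points))
    kept = edges[: len(edges) - (k - 1)]  # drop the k-1 heaviest edges
    root = list(range(n))  # union-find forest over the kept edges

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, u, v in kept:
        root[find(u)] = find(v)
    labels, ids = [], {}
    for i in range(n):
        r = find(i)
        ids.setdefault(r, len(ids))  # number components in order of first appearance
        labels.append(ids[r])
    return labels


def average_intracluster_distance(points, labels):
    """Assumed IC-av: mean over clusters of the mean pairwise distance inside each cluster.

    Singleton clusters contribute 0.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    per_cluster = []
    for members in clusters.values():
        pairs = list(combinations(members, 2))
        per_cluster.append(sum(euclidean(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0)
    return sum(per_cluster) / len(per_cluster)


if __name__ == "__main__":
    # Two well-separated hypothetical groups of points.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    for k in range(1, 5):
        labels = mst_clusters(data, k)
        print(k, labels, round(average_intracluster_distance(data, labels), 3))
```

Cutting MST edges works for arbitrarily shaped clusters because removing any k-1 edges from a tree always leaves exactly k connected components, and the longest edges tend to bridge separate groups. Under this assumed definition IC-av falls to 0 when every point is its own cluster, so it is read across a range of k rather than simply minimized; how the paper turns these values into a final k is not stated in the abstract.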


Keywords: Minimum Spanning Tree (MST), Gap statistic, IC-av


Edition: Volume 4 Issue 2, February 2015


Pages: 633 - 638

