Kavya S.A, M.V.Panduranga Rao, S.Basavaraj Patil
Abstract: The data generated by conventional categorical data clustering is incomplete because the information provided is also incomplete. This project presents a new link-based approach, which improves the categorical clustering by discovering unknown entries through similarity between clusters in an ensemble. A graph partitioning technique is applied to a weighted bipartite graph to obtain the final clustering result. So the link-based approach outperforms both conventional clustering algorithms for categorical data and well-known cluster ensemble technique. Data clustering is one of the fundamental tools we have for understanding the structure of a data set. It plays a crucial, foundation role in machine learning, data mining, information retrieval and pattern recognition. The experimental results on multiple real data sets suggest that the proposed link-based method almost always outperforms both conventional clustering algorithms for categorical data and well-known cluster ensemble technique. This paper proposes an Algorithm called Weighted Triple-Quality (WTQ), which also uses k-means algorithm for basic clustering. Once using does the basic clustering consensus functions we can get cluster ensembles of categorical data. This categorical data is converted to refined matrix.
Keywords: Clustering, categorical data, cluster ensembles, link-based similarity, data mining