International Journal of Science and Research (IJSR)

International Journal of Science and Research (IJSR) | Open Access | Fully Refereed | Peer Reviewed International Journal

ISSN: 2319-7064

Evaluation of Similarities Measure in Document Clustering

Hemalatha Immandhi

Abstract: Clustering is a technique for partitioning data into subsets such that similar instances are grouped together while dissimilar instances fall into different groups. The instances are thereby organized into an efficient representation that characterizes the population being sampled. Clustering of entities is as old as the human need to describe the salient characteristics of people and objects and to identify them with a type. Consequently, it spans a range of scientific disciplines, from mathematics and statistics to biology and genetics, each of which uses different terms to describe the topologies formed by this analysis. From biological taxonomies to medical syndromes, and from genetic genotypes to manufacturing group technology, the problem is the same: forming groups. Here the task is to cluster text documents, which are sparse, high-dimensional data objects. We then derive new clustering criterion functions and the corresponding clustering algorithms. Divisive algorithms start with a single cluster that contains all the sample data. That cluster is then split into two or more clusters with high dissimilarity between them, and splitting continues until the number of clusters equals the number of samples or a number specified by the user. The main goal is to develop a novel hierarchical algorithm for document clustering that aims at high efficiency and performance. The work focuses on studying and exploiting the cluster-overlapping phenomenon to design cluster-merging criteria, and on proposing a new method for computing the overlap rate in order to improve time efficiency and accuracy. Multi-view learning algorithms typically assume a complete bipartite mapping between the different views in order to exchange information during the learning process. The remainder of this paper is organized as follows.

Keywords: technology, clustering, algorithm, data, analysis
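
The divisive procedure described in the abstract (start from one cluster holding every document, then keep splitting until a requested number of clusters is reached) can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names (tf_vector, cosine, split, divisive), the use of raw term frequencies, the choice of cosine similarity, the rule of always splitting the largest cluster, and the seeding of each split with the least similar pair of documents are all assumptions made here for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Sparse term-frequency vector (word -> count); assumed representation."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def split(cluster, vecs):
    """Split a cluster (list of document indices, size >= 2) in two.

    Assumed heuristic: the least similar pair of documents become seeds,
    and every other document joins the seed it is more similar to.
    """
    pairs = [
        (cosine(vecs[i], vecs[j]), i, j)
        for pos, i in enumerate(cluster)
        for j in cluster[pos + 1:]
    ]
    _, s1, s2 = min(pairs)
    left, right = [s1], [s2]
    for d in cluster:
        if d in (s1, s2):
            continue
        if cosine(vecs[d], vecs[s1]) >= cosine(vecs[d], vecs[s2]):
            left.append(d)
        else:
            right.append(d)
    return left, right


def divisive(docs, k):
    """Top-down clustering: begin with one cluster, split until k clusters."""
    vecs = [tf_vector(d) for d in docs]
    clusters = [list(range(len(docs)))]
    k = min(k, len(docs))  # cannot have more clusters than documents
    while len(clusters) < k:
        target = max(clusters, key=len)  # assumed rule: split the largest
        clusters.remove(target)
        clusters.extend(split(target, vecs))
    return clusters


docs = [
    "apple banana fruit",
    "banana fruit salad",
    "engine car wheel",
    "car wheel road",
]
print(divisive(docs, 2))
```

On these four toy documents the sketch prints [[0, 1], [2, 3]], separating the fruit texts from the car texts. Setting k to the number of documents continues splitting until every document sits in its own cluster, which matches the stopping condition mentioned in the abstract.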