International Journal of Science and Research (IJSR)

Call for Papers | Fully Refereed | Open Access | Double Blind Peer Reviewed

ISSN: 2319-7064



Research Paper | Computer Science & Engineering | India | Volume 2 Issue 3, March 2013


An Efficient Approach for High Dimensional Data Clustering of Gene Expression using Dynamic Error Threshold Estimation Model

K. Arun Prabha | A. Amutha


Abstract: Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets. Data clustering is a common technique for statistical data analysis and is used in many fields, including machine learning, data mining, pattern recognition and bioinformatics. Gene expression data are high dimensional, which motivates the development of clustering algorithms suited to them. Existing systems rely on popular algorithms such as k-means and CAST, but applying these algorithms to a large genome-scale gene expression data set is difficult in practice. This paper introduces a novel method for clustering large gene data sets. Existing work uses the TCLUST algorithm, in which a Correlation Coefficient Graph (CCG) is constructed to hold the gene expression data values and a Tanimoto Coefficient Graph (TCG) is used to measure the similarity between gene expression data. The proposed work uses an enhanced TCLUST algorithm, called E-TCLUST. It implements an enhanced Tanimoto clustering method that exploits co-connectedness to cluster large, sparse expression data efficiently. A dynamic error threshold estimation model supplies threshold values, and data below the given threshold value are filtered out. In the proposed work, a tree structure is constructed to represent the input samples, and graphs are used to identify the variations. A graph re-arrangement mechanism is applied, which effectively reduces the number of iterations and also reduces the processing time. Extensive evaluation of this method shows optimized performance, which is depicted as a graph. The algorithm is applied to a genome-scale gene expression data set, and gene set enrichment analysis is used to obtain highly significant biological clusters. Both the TCLUST and E-TCLUST algorithms have been implemented, and their performance has been tested using three different data sets. The data sets are real gene expression data from yeast samples generated using micro-array technology.
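
The abstract gives no formulas, so the following is only a minimal sketch of the general idea of a thresholded correlation graph followed by a Tanimoto similarity based on co-connectedness. It assumes that edges are kept when the Pearson correlation of two expression profiles meets a fixed threshold, and that the Tanimoto coefficient is taken over the closed neighborhoods of two genes in that graph. The function names, the toy profiles and the threshold value of 0.9 are hypothetical, and the dynamic error threshold estimation described in the paper is not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a constant profile has no defined correlation
    return cov / (dev_x * dev_y)


def correlation_graph(profiles, threshold):
    """Adjacency sets: keep an edge when correlation >= threshold.

    The fixed threshold is an assumption of this sketch; the paper
    estimates its threshold dynamically.
    """
    genes = sorted(profiles)
    adjacency = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(profiles[g], profiles[h]) >= threshold:
                adjacency[g].add(h)
                adjacency[h].add(g)
    return adjacency


def tanimoto_neighbors(adjacency, g, h):
    """Set-based Tanimoto coefficient of two closed neighborhoods."""
    n_g = adjacency[g] | {g}
    n_h = adjacency[h] | {h}
    return len(n_g & n_h) / len(n_g | n_h)


# Hypothetical toy profiles, not data from the paper.
toy_profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [1.0, 2.0, 3.0, 5.0],
    "g4": [4.0, 3.0, 2.0, 1.0],
}
toy_graph = correlation_graph(toy_profiles, threshold=0.9)
```

In this toy example, g1, g2 and g3 rise together and end up mutually connected, while the anti-correlated g4 stays isolated. Genes that share most of their neighbors receive a Tanimoto value close to 1, which is the kind of co-connectedness signal a clustering step could then group on.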


Keywords: Clustering, Gene Expression, Micro-array, Bio-informatics, Data mining


Edition: Volume 2 Issue 3, March 2013


Pages: 194 - 196

