Research Paper | Computer Science & Engineering | India | Volume 2 Issue 3, March 2013

An Efficient Approach for High Dimensional Data Clustering of Gene Expression using Dynamic Error Threshold Estimation Model

Abstract: Clustering is the classification of objects into different groups or, more precisely, the partitioning of a data set into subsets. It is a common technique for statistical data analysis and is used in many fields, including machine learning, data mining, pattern recognition and bioinformatics. Gene expression data is high dimensional, which motivates the development of clustering algorithms suited to it. Existing systems rely on popular algorithms such as k-means and CAST, but applying these algorithms to a large genome-scale gene expression data set is challenging in practice. This paper introduces a novel method for clustering large gene data sets. Existing work uses the TCLUST algorithm, in which a Correlation Coefficient Graph (CCG) is constructed to maintain the gene expression data values and a Tanimoto Coefficient Graph (TCG) is used to measure the similarity between gene expression data. The proposed work uses an enhanced TCLUST algorithm, called E-TCLUST. This enhanced Tanimoto clustering method exploits co-connectedness to cluster large, sparse expression data efficiently. A dynamic error threshold estimation model sets threshold values and filters out data that fall below the given threshold. A tree structure is constructed to represent the input samples, and variations are identified using graphs. A graph re-arrangement mechanism effectively reduces the number of iterations. The processing time is also reduced. Extensive evaluation of the method shows optimized performance, which is depicted as a graph. The algorithm is applied to a genome-scale gene expression data set, and gene set enrichment analysis is used to obtain highly significant biological clusters. Both TCLUST and E-TCLUST were implemented, and their performance was tested on three different data sets. The data sets are real gene expression data from yeast samples generated using micro-array technology.
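
To make the idea of similarity-threshold clustering concrete, the following is a minimal Python sketch, not the authors' TCLUST or E-TCLUST implementation. It assumes the continuous form of the Tanimoto coefficient, T(x, y) = x·y / (|x|² + |y|² − x·y), a fixed threshold in place of the dynamic error threshold estimation model (whose details the abstract does not give), and connected components of the thresholded graph as a stand-in for the co-connectedness criterion. The function names `tanimoto`, `threshold_graph` and `connected_components` are hypothetical.

```python
import numpy as np


def tanimoto(x, y):
    """Continuous Tanimoto coefficient: x.y / (|x|^2 + |y|^2 - x.y)."""
    dot = float(np.dot(x, y))
    denom = float(np.dot(x, x) + np.dot(y, y) - dot)
    # The denominator is zero only when both vectors are all zeros.
    return dot / denom if denom != 0 else 0.0


def threshold_graph(expr, threshold):
    """Link every pair of genes whose similarity is at least `threshold`.

    `expr` is a 2-D array with one row per gene. A fixed threshold is an
    assumption of this sketch; the paper estimates it dynamically.
    """
    n = len(expr)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(expr[i], expr[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj):
    """Return the connected components of the graph as sorted index lists."""
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


# Toy data (made up for illustration): four genes, three samples each.
expr = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.0, 2.9],
    [-3.0, 0.5, -1.0],
    [-2.9, 0.4, -1.2],
])
clusters = connected_components(threshold_graph(expr, threshold=0.9))
print(clusters)
```

The pairwise loop is quadratic in the number of genes, so a genome-scale data set would need a sparser or vectorized construction; the sketch only illustrates how a similarity threshold turns expression profiles into a graph whose components can be read as clusters.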

Keywords: Clustering, Gene Expression, Micro-array, Bio-informatics, Data mining

Edition: Volume 2 Issue 3, March 2013

Pages: 194 - 196

Microclustering with Outlier Detection for DADC

Aswathy Priya M.

Analysis Study Research Paper, Computer Science & Engineering, India, Volume 12 Issue 11, November 2023

Pages: 1840 - 1846

Analysis of Placement for Electronics and Communication Engineering Students using Multiple Clustering

Dr. Dola Sanjay S
