Research Paper | Computer Science & Engineering | India | Volume 3 Issue 6, June 2014
Text Studies Classification of Database of Genotypes and Phenotypes using K-Nearest Neighbor Algorithm
Kolekar Suresh S, Kumbhar Satish S
The database of Genotypes and Phenotypes (dbGaP) is a new database for storing and distributing data from genome-wide association studies. dbGaP was launched by the National Library of Medicine (NLM), which is part of the National Institutes of Health (NIH). Searching for relevant studies of particular interest accurately and completely is a challenging task because of the keyword-based search method of the dbGaP Entrez system. For a given query, the dbGaP retrieval system returns several unrelated studies, and it is very difficult to determine how particular studies were retrieved and why they appear in a particular order. Users therefore have to evaluate every study description carefully to find relevant studies, which is a time-consuming task. Text mining is an emerging research field that enables users to extract useful information from text documents; it draws on retrieval, classification, clustering, and machine learning techniques to classify text documents. In this research, an empirical approach is proposed and implemented with the K-nearest neighbor (KNN) machine learning algorithm to classify dbGaP study text into heart, lung, and blood studies. The results indicate that this text-based classification outperforms the conventional keyword-based search of the document retrieval system provided by dbGaP.
Keywords: Bioinformatics, Data Mining, Text Mining, database of Genotypes and Phenotypes
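The KNN text classification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the feature weighting, distance measure, or value of K, so TF-IDF weighting, cosine similarity, and K = 3 are assumptions made here, and the training sentences are hypothetical stand-ins for dbGaP study descriptions.

```python
import math
from collections import Counter

# Hypothetical toy training set; real dbGaP study descriptions are far longer.
TRAIN = [
    ("coronary artery disease and myocardial infarction cohort", "heart"),
    ("cardiac arrhythmia and heart failure genome wide association", "heart"),
    ("asthma and chronic obstructive pulmonary disease cohort", "lung"),
    ("lung function and pulmonary fibrosis genome wide association", "lung"),
    ("sickle cell anemia and hemoglobin levels cohort", "blood"),
    ("platelet count and blood clotting genome wide association", "blood"),
]

def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]

def build_idf(docs_tokens):
    """Smoothed inverse document frequency for every training term."""
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def knn_classify(query, train, k=3):
    """Label a query by majority vote among its k most similar training texts."""
    docs_tokens = [tokenize(text) for text, _ in train]
    idf = build_idf(docs_tokens)
    vectors = [tfidf(tokens, idf) for tokens in docs_tokens]
    labels = [label for _, label in train]
    q = tfidf(tokenize(query), idf)
    ranked = sorted(zip(vectors, labels),
                    key=lambda pair: cosine(q, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    print(knn_classify("pulmonary fibrosis and asthma in adults", TRAIN, k=3))
```

With this toy data the query shares its distinctive terms with the two lung descriptions, so they dominate the vote. A realistic setup would add stop-word removal, a much larger labeled corpus, and a held-out evaluation set.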
Edition: Volume 3 Issue 6, June 2014
Pages: 1146 - 1149
How to Cite this Article?
Kolekar Suresh S, Kumbhar Satish S, "Text Studies Classification of Database of Genotypes and Phenotypes using K-Nearest Neighbor Algorithm", International Journal of Science and Research (IJSR), https://www.ijsr.net/search_index_results_paperid.php?id=2014436, Volume 3 Issue 6, June 2014, 1146 - 1149
Similar Articles with Keyword 'Bioinformatics'
Performance Analysis of Clustal W Algorithm on Linux Cluster
Swati Jasrotia, Salam Din
Survey on Matrix Factorization Using Information Fusion
Rutuja Mane, A. N. Bandal
Similar Articles with Keyword 'Data Mining'
Predicting the Course Knowledge Level of Students using Data Mining Techniques
Thapaswini P S
Random Forest Based Heart Disease Prediction
Adeen, Preeti Sondhi
Similar Articles with Keyword 'Text Mining'
A Survey of Generating Multi-Document Summarizations
Patil Ajita S., P. M. Mane
A Review of Text Mining Techniques Associated with Various Application Areas
Dr. Shilpa Dang, Peerzada Hamid Ahmad