Research Paper | Information Technology | India | Volume 5 Issue 1, January 2016
Bootstrapping in Text Mining Applications
C. K. Chandrasekhar, M. R. Srinivasan, B. Ramesh Babu
Text mining involves analyzing large corpora of documents that contain thousands of words and a high level of noise. Dimensionality reduction, noise mitigation, and accurate, stable cluster formation are the principal challenges of upstream analytics. This paper proposes a methodology for both dimensionality and noise reduction using k-fold rotation estimation. Principal Component Analysis (PCA) enables selection of a reduced set of dimensions (words). The resulting noise-reduced data set is the input to clustering algorithms. Experiments using benchmark data sets from the Brown corpus [5] and real-life feedback data from a service provider show that our approach delivers improved results on the well-known performance measures recall, precision, and F-measure [14]. We used a combination of the projective transform known as PCA and visual scree plot techniques [8, 6, 12] for dimensionality reduction, and a k-fold rotation sampling technique [1] for noise elimination and formation of stable clusters. Experimental results with corpora of different sizes demonstrate that the approach delivers better clustering accuracy than the standard k-means clustering algorithm [2].
Keywords: k-Fold Rotation Estimation, Clustering, k-Means, Principal Component Analysis, Dimensionality Reduction, Precision, Recall, F-Score, Scree Plot
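The abstract reports results in terms of recall, precision, and F-measure but does not say how these are computed for clusters. The sketch below assumes one common variant, pair counting, in which every pair of documents is checked for agreement between its reference category and its assigned cluster. The function name `pairwise_scores` and the toy labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pairwise_scores(true_labels, cluster_ids):
    """Pair-counting precision, recall, and F-measure for a clustering.

    Assumption: a "positive" is a pair of documents placed in the same
    cluster; it is correct when both documents share a reference category.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have the same length")

    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_class = true_labels[i] == true_labels[j]
        same_cluster = cluster_ids[i] == cluster_ids[j]
        if same_cluster and same_class:
            tp += 1          # pair correctly grouped together
        elif same_cluster:
            fp += 1          # pair grouped together but categories differ
        elif same_class:
            fn += 1          # pair split apart but categories match

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Toy example with made-up labels: five documents, two reference categories.
reference = ["news", "news", "news", "fiction", "fiction"]
assigned = [0, 0, 1, 1, 1]
print(pairwise_scores(reference, assigned))
```

Other definitions, such as matching each category to its best cluster before scoring, are also in use and can give different numbers on the same clustering.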
Pages: 337 - 344
How to Cite this Article?
C. K. Chandrasekhar, M. R. Srinivasan, B. Ramesh Babu, "Bootstrapping in Text Mining Applications", International Journal of Science and Research (IJSR), https://www.ijsr.net/search_index_results_paperid.php?id=NOV152700, Volume 5 Issue 1, January 2016, 337 - 344