Research Paper | Information Technology | India | Volume 5 Issue 1, January 2016
Bootstrapping in Text Mining Applications
C. K. Chandrasekhar, M. R. Srinivasan, B. Ramesh Babu
Text mining involves analyzing large corpora of documents that contain thousands of words and a high level of noise. Dimensionality reduction, noise mitigation, and accurate, stable cluster formation are the principal challenges of upstream analytics. This paper proposes a methodology for both dimensionality and noise reduction using k-fold rotation estimation. Principal Component Analysis (PCA) enables the selection of a reduced set of dimensions (words). The resulting noise-reduced data set is the input to clustering algorithms. Experiments using benchmark data sets from the Brown corpus and real-life feedback data from a service provider show that our approach delivers improved results as measured by the well-known performance measures recall, precision, and F-measure. We used a combination of the projective transform known as PCA and visual scree plot techniques [8, 6, 12] for dimensionality reduction, and a k-fold rotation sampling technique for noise elimination and the formation of stable clusters. Experimental results with corpora of different sizes demonstrate that the approach delivers better clustering accuracy than the standard k-means clustering algorithm.
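The abstract names k-fold rotation sampling and the precision, recall, and F-measure scores but does not spell out either procedure. The following is a minimal sketch of both ideas using only the Python standard library. The function names `k_fold_rotations` and `precision_recall_f` are hypothetical, the interleaved fold assignment is an assumption, and the scores follow the standard set-overlap definitions rather than any formula confirmed by the paper.

```python
def k_fold_rotations(n_items, k):
    """Return k (retained, held_out) index pairs over range(n_items).

    Assumption: items are assigned to folds in interleaved order;
    the paper may partition its documents differently.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must satisfy 2 <= k <= n_items")
    folds = [list(range(i, n_items, k)) for i in range(k)]
    rotations = []
    for i, held_out in enumerate(folds):
        # Every index that is not in the held-out fold is retained.
        retained = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        rotations.append((retained, held_out))
    return rotations


def precision_recall_f(cluster, reference):
    """Score one cluster against one reference class (both sets of doc ids)."""
    overlap = len(cluster & reference)
    precision = overlap / len(cluster) if cluster else 0.0
    recall = overlap / len(reference) if reference else 0.0
    total = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    for retained, held_out in k_fold_rotations(6, 3):
        print("retained:", retained, "held out:", held_out)
    print(precision_recall_f({1, 2, 3, 4}, {2, 3, 4, 5, 6}))
```

In a pipeline of the kind the abstract describes, each rotation would presumably be reduced with PCA and clustered separately, and the resulting clusters compared across rotations to judge their stability. That orchestration is not shown here because the abstract does not specify it.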
Keywords: k-Fold Rotation Estimation, Clustering, k-Means, Principal Component Analysis, Dimensionality Reduction, Precision, Recall, F-Score, Scree Plot
Pages: 337 - 344
How to Cite this Article?
C. K. Chandrasekhar, M. R. Srinivasan, B. Ramesh Babu, "Bootstrapping in Text Mining Applications", International Journal of Science and Research (IJSR), https://www.ijsr.net/search_index_results_paperid.php?id=NOV152700, Volume 5 Issue 1, January 2016, 337 - 344