International Journal of Science and Research (IJSR)

Fully Refereed | Open Access | Double Blind Peer Reviewed

ISSN: 2319-7064


Research Paper | Information Technology | India | Volume 5 Issue 1, January 2016


Bootstrapping in Text Mining Applications

C. K. Chandrasekhar, M. R. Srinivasan, B. Ramesh Babu


Abstract: Text mining involves analyzing large corpora of documents that contain thousands of words and a high level of noise. Dimensionality reduction, noise mitigation, and accurate, stable cluster formation are the principal challenges of upstream analytics. This paper proposes a methodology for both dimensionality and noise reduction using k-fold rotation estimation. Principal Component Analysis enables the selection of a reduced set of dimensions (words). The resulting noise-reduced data set is the input to clustering algorithms. Experiments using benchmark data sets from the Brown corpus [5] and real-life feedback data from a service provider show that our approach delivers improved results on the well-known performance measures recall, precision, and F-measure [14]. We used a combination of the projective transform known as principal component analysis (PCA) and visual scree plot techniques [8, 6, 12] for dimensionality reduction, and a k-fold rotation sampling technique [1] for noise elimination and the formation of stable clusters. Experimental results with corpora of different sizes demonstrate that the approach delivers better clustering accuracy than the standard k-means clustering algorithm [2].
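
The abstract names recall, precision, and F-measure as its evaluation measures but does not say how clusters are matched to reference classes. The following is a minimal sketch of the standard definitions, assuming one cluster is scored against one reference class; the function name `cluster_scores` and the toy labels are hypothetical and are not taken from the paper.

```python
def cluster_scores(cluster_labels, class_labels, cluster_id, class_id):
    """Precision, recall and F-measure of one cluster against one reference class.

    Assumption: documents are scored by membership counts only; the paper's
    exact cluster-to-class matching protocol is not given in the abstract.
    """
    pairs = list(zip(cluster_labels, class_labels))
    in_cluster = sum(1 for c, _ in pairs if c == cluster_id)   # documents assigned to the cluster
    in_class = sum(1 for _, k in pairs if k == class_id)       # documents truly in the class
    overlap = sum(1 for c, k in pairs if c == cluster_id and k == class_id)

    precision = overlap / in_cluster if in_cluster else 0.0
    recall = overlap / in_class if in_class else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Toy example with hypothetical labels: six documents, two clusters, two classes.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["news", "news", "fiction", "fiction", "fiction", "news"]
p, r, f = cluster_scores(clusters, classes, cluster_id=0, class_id="news")
print(p, r, f)
```

In the toy data, cluster 0 holds three documents, two of which are "news", and the "news" class has three documents in total, so precision, recall, and F-measure are each 2/3.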


Keywords: k-Fold Rotation Estimation, Clustering, k-Means, Principal Component Analysis, Dimensionality Reduction, Precision, Recall, F-Score, Scree Plot


Edition: Volume 5 Issue 1, January 2016


Pages: 337 - 344


