Informative Article | Computer Science and Information Technology | India | Volume 10 Issue 9, September 2021
A Survey of Text Clustering Techniques: Algorithms, Applications, and Challenges
Abstract: Text clustering is a fundamental task in natural language processing (NLP) that groups similar documents together based on their content, enabling efficient organization and analysis of textual data. In this paper, we provide a comprehensive survey of text clustering techniques, their applications, challenges, and future directions. We begin by discussing the fundamentals of text clustering, including key concepts such as similarity measures, text feature representations, and clustering algorithms. We then explore popular text clustering algorithms: K-means, hierarchical clustering, density-based clustering, spectral clustering, affinity propagation, and Latent Dirichlet Allocation (LDA), which is widely used for topic modelling. For each algorithm, we discuss its methodology, strengths, limitations, and parameter tuning considerations. We also examine real-world applications of text clustering across diverse domains, including document organization, information retrieval, text summarization, sentiment analysis, and recommendation systems, and highlight their effectiveness through case studies and examples. We further identify several challenges and open research questions in text clustering, such as scalability, handling high-dimensional data, incorporating domain-specific knowledge into clustering, evaluation metrics, and integration with other NLP tasks such as named entity recognition (NER) and classification. Finally, we propose potential directions for future research that address these challenges and advance the field. Text clustering remains an active area of research with substantial potential for applications across many domains, and it continues to drive innovation in natural language processing.
Keywords: Text clustering, natural language processing, clustering algorithms, document organization, sentiment analysis, scalability
Edition: Volume 10 Issue 9, September 2021
Pages: 1749 - 1752
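The abstract names three building blocks of text clustering: a text feature representation, a similarity measure, and a clustering algorithm. As an illustration only (not the survey's own method), the sketch below wires them together in dependency-free Python under stated assumptions: TF-IDF vectors with a smoothed IDF (one common variant), cosine similarity, and K-means whose centroids are simply initialized from the first k documents (a simplification; practical implementations usually use k-means++ or several random restarts). The function names and the four-document corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on any run of non-alphanumeric characters (deliberately simple).
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (dict: term -> weight) per document, L2-normalized.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: f * idf[t] for t, f in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    # Cosine similarity of two sparse vectors: dot product divided by both norms.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_vector(vectors):
    # Average a non-empty list of sparse vectors into a centroid.
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, k, max_iter=20):
    # K-means with cosine similarity; assumes k <= len(vectors).
    # Hypothetical simplification: centroids start as the first k documents.
    centroids = [dict(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [max(range(k), key=lambda c: cosine(vec, centroids[c]))
                      for vec in vectors]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its member documents.
        for c in range(k):
            members = [vec for vec, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


# Hypothetical toy corpus: two "pets" documents and two "finance" documents.
docs = [
    "the cat sat on the mat",
    "stock markets fell as investors sold shares",
    "a cat and a dog played on the mat",
    "investors bought shares as stock prices rose",
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)  # → [0, 1, 0, 1]
```

Because the vectors and the similarity function are kept separate from the clustering loop, the same representation could in principle be fed to other algorithms the survey covers, such as hierarchical or density-based clustering; storing only non-zero terms also hints at why high-dimensional, sparse text features are listed among the field's challenges.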