Experimental Research Paper | Computer Engineering | Volume 15 Issue 5, May 2026 | Pages: 940 - 953 | India
Hybrid PCA and K-means (DBSCAN) for Addressing Imbalanced Data: A Framework for Enhancing Machine Learning Performance
Abstract: This research evaluates a new hybrid preprocessing method across several machine learning models: Logistic Regression, Gradient Boosting, Random Forest, and a hybrid Deep Neural Network (DNN). It examines how imbalanced datasets affect these models' performance. The proposed preprocessing approach combines PCA, K-means clustering, and DBSCAN to improve model performance, reaching 99.96% accuracy, 99.92% precision, 100% recall, and a 99.96% F1-score. After preprocessing, the hybrid model, which combines a Deep Neural Network (DNN) and a Convolutional Neural Network (CNN), classifies with high accuracy, and misclassification errors drop significantly on all models. The results show that the proposed strategy improves classification accuracy for the minority class and highlight the importance of preprocessing in machine learning pipelines. Based on the results, the proposed hybrid preprocessing strategy outperforms earlier methodologies, providing a strong foundation for improving machine learning prediction performance.
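The reported metrics can be cross-checked against each other, since the F1-score is the harmonic mean of precision and recall. The following is a minimal sketch of that check using only the figures quoted in the abstract; the function name `f1_from_precision_recall` is illustrative and does not come from the paper.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract, expressed as fractions.
reported_precision = 0.9992
reported_recall = 1.0

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 = {f1:.2%}")  # rounds to 99.96%
```

Under this definition, the quoted precision and recall are consistent with the quoted 99.96% F1-score.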
Keywords: Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Imbalanced Data, Machine Learning (ML), Principal Component Analysis (PCA), K-means, Synthetic Minority Over-sampling Technique (SMOTE)
How to Cite?: Rajesh Pandey, Dr. Mamta Bansal, Yogesh Awasthi, "Hybrid PCA and K-means (DBSCAN) for Addressing Imbalanced Data: A Framework for Enhancing Machine Learning Performance", Volume 15 Issue 5, May 2026, International Journal of Science and Research (IJSR), Pages: 940-953, https://www.ijsr.net/getabstract.php?paperid=SR26514163038, DOI: https://dx.doi.org/10.21275/SR26514163038