Pachunoori Naresh, Garine Bindu Madhavi
Abstract: Anomaly detection has been an important research topic in data mining and machine learning. Many real-world applications, such as intrusion detection and credit card fraud detection, require an effective and efficient framework to identify deviated data instances. However, most anomaly detection methods are implemented in batch mode and thus cannot be easily extended to large-scale problems without sacrificing computation and memory requirements. In this article, we propose an online over-sampling principal component analysis (osPCA) algorithm to address this problem, and we aim at detecting the presence of outliers in a large amount of data via an online updating technique. Unlike prior PCA-based approaches, we do not store the entire data matrix or covariance matrix, so our approach is of particular interest for online or large-scale problems. By over-sampling the target instance and extracting the principal direction of the data, the proposed osPCA allows us to determine the anomaly of the target instance according to the variation of the resulting dominant eigenvector. Since our osPCA need not perform eigen analysis explicitly, the proposed framework is well suited to online applications that have computation or memory limitations. We compare the proposed method with the well-known power method for PCA and with other popular anomaly detection algorithms.
Keywords: Anomaly detection, online updating, least squares, over-sampling, principal component analysis
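The over-sampling idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the online least-squares update proposed in this article: for clarity it keeps the whole data matrix and calls an explicit eigendecomposition, which is exactly what osPCA is designed to avoid. The function names `dominant_eigvec` and `ospca_scores`, the over-sampling ratio `r`, and the score defined as one minus the absolute cosine similarity between the two principal directions are assumptions made for illustration only.

```python
import numpy as np

def dominant_eigvec(cov):
    # np.linalg.eigh returns eigenvalues in ascending order,
    # so the last column is the dominant eigenvector.
    _, vecs = np.linalg.eigh(cov)
    return vecs[:, -1]

def ospca_scores(X, r=0.1):
    """Batch illustration of over-sampling PCA (assumed form, not the online update).

    Each instance is treated in turn as the target and duplicated n*r times;
    the score measures how far the dominant eigenvector rotates as a result.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mu = X.mean(axis=0)
    Q = X.T @ X / n                          # second-moment matrix of the data
    u = dominant_eigvec(Q - np.outer(mu, mu))
    scores = np.empty(n)
    for t, x in enumerate(X):
        # Mean and covariance after adding n*r copies of x, in closed form.
        mu_t = (mu + r * x) / (1.0 + r)
        cov_t = Q / (1.0 + r) + (r / (1.0 + r)) * np.outer(x, x) - np.outer(mu_t, mu_t)
        u_t = dominant_eigvec(cov_t)
        # The sign of an eigenvector is arbitrary, hence the absolute value.
        scores[t] = 1.0 - abs(u @ u_t)
    return scores

# Toy data (hypothetical): points spread along the first axis, plus one
# deviated instance placed far away along the second axis.
rng = np.random.default_rng(0)
normal = np.column_stack([rng.normal(0.0, 1.0, 200), rng.normal(0.0, 0.05, 200)])
X = np.vstack([normal, [0.0, 5.0]])
scores = ospca_scores(X, r=0.1)
print("index of highest score:", int(np.argmax(scores)))
```

In this sketch a normal instance barely changes the principal direction when duplicated, whereas a deviated instance pulls the dominant eigenvector toward itself, so a larger score suggests a more anomalous instance.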