Review Papers | Computer Science & Engineering | India | Volume 3 Issue 12, December 2014
Anomaly Detection of Online Data using Oversampling Principal Component Analysis
Supriya A. Bagane | J. L. Chaudhari 
Abstract: Anomaly detection is an important topic in data mining and machine learning. It is useful in many real-world applications, such as intrusion detection, credit card fraud detection, fault detection in safety-critical systems, and military surveillance of enemy activity. Anomaly detection finds patterns in data that do not conform to expected behavior. Such patterns are called anomalies, outliers, discordant observations, exceptions, or aberrations, depending on the application domain; of these terms, anomalies and outliers are used interchangeably. Outlier detection methods can also be used to handle extremely unbalanced data distributions. Most anomaly detection methods are implemented in batch mode, so they do not extend easily to large-scale problems; extending them incurs prohibitive computation and memory costs. To address this problem, we propose an oversampling Principal Component Analysis (osPCA) scheme, which aims to detect outliers in large amounts of data. Previously proposed PCA-based methods must store the entire data matrix or covariance matrix; osPCA does not, so it can be extended to large-scale or online problems. PCA finds the principal direction of the data, and the oversampling step duplicates the target instance multiple times to amplify the effect of an outlier. By oversampling the target instance and extracting the principal direction of the data, osPCA determines how anomalous the target instance is from the variation in the resulting dominant eigenvector. The online updating technique computes the dominant eigenvector efficiently, without eigen-analysis or storing the entire covariance matrix. Compared with other anomaly detection methods, the computational cost and memory requirements are significantly reduced.
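The oversampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: for clarity it recomputes the dominant eigenvector with a batch eigendecomposition rather than the online update the abstract refers to. The function names, the `ratio` parameter, the synthetic data, and the use of one minus the absolute cosine similarity as the score are all assumptions made for this illustration.

```python
import numpy as np

def dominant_direction(X):
    """Return the unit-norm first principal direction of the rows of X."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = Xc.T @ Xc / X.shape[0]            # sample covariance matrix
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    return eigvecs[:, -1]                   # eigenvector of the largest eigenvalue

def oversampling_score(X, t, ratio=0.1):
    """Score instance t by how much duplicating it rotates the dominant direction.

    `ratio` (hypothetical parameter) sets the number of duplicates
    as a fraction of the data set size.
    """
    copies = max(1, int(round(ratio * X.shape[0])))
    X_over = np.vstack([X, np.tile(X[t], (copies, 1))])  # oversample the target
    u = dominant_direction(X)
    u_t = dominant_direction(X_over)
    # Eigenvectors are defined up to sign, so compare with |cosine similarity|.
    return 1.0 - abs(float(u @ u_t))

# Synthetic example: inliers spread along the x-axis, one outlier far along y.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(200, 2)) * np.array([3.0, 0.3])
outlier = np.array([[0.0, 20.0]])
X = np.vstack([inliers, outlier])

scores = np.array([oversampling_score(X, t) for t in range(X.shape[0])])
print("most anomalous index:", int(np.argmax(scores)))
```

Duplicating an inlier barely changes the dominant direction, so its score stays near zero, whereas duplicating the outlier pulls the direction toward it and yields a large score. Because this sketch repeats a full eigendecomposition for every instance, it does not have the reduced computation and memory cost that the abstract attributes to the online update.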
Keywords: Anomaly detection, Principal Component Analysis, outlier, oversampling
Edition: Volume 3 Issue 12, December 2014
Pages: 687 - 690