Abhishek B. Mankar, Namrata Ghuse
Abstract: Outlier detection in data streams has gained broad importance because of the increasing cases of fraud in data stream applications and because of its use in data cleaning, network monitoring, invasive species monitoring, stock market analysis, and the detection of outlying cases in medical data. Finding outliers in a collection of patterns is a well-known problem in the data mining field. An outlier is a pattern that is dissimilar to the rest of the patterns in the dataset. The proposed method for outlier detection uses a hybrid approach in two stages. The first stage applies a clustering algorithm, k-means, which partitions the dataset into a number of clusters. The second stage applies a distance-based method to each resulting cluster in order to find the objects that lie far from their cluster centroid. Whether an object is reported as an outlier depends on a threshold, which is set by the user. Combining the two techniques allows outliers to be found efficiently. Experimental results on a real dataset demonstrate that the proposed method has a lower computational cost and performs better than the distance-based method alone. The proposed algorithm efficiently prunes the safe cells (inliers) and thereby avoids a large number of extra calculations.
Keywords: Outlier, Inliers, Cluster-based, Distance-based
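
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `kmeans` and `find_outliers`, the use of Euclidean distance, the random choice of initial centroids, and the fixed iteration count are all assumptions made for illustration. The sketch also omits the pruning of safe cells mentioned in the abstract.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means (illustrative). Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points drawn at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def find_outliers(points, k, threshold, seed=0):
    """Stage 1: cluster with k-means. Stage 2: flag points whose distance
    to their own cluster centroid exceeds the user-set threshold."""
    centroids, labels = kmeans(points, k, seed=seed)
    return [
        i
        for i, (p, lab) in enumerate(zip(points, labels))
        if math.dist(p, centroids[lab]) > threshold
    ]
```

For example, given two tight groups of four points around (0.5, 0.5) and (10.5, 10.5) plus one extra point at (10.5, 15), calling `find_outliers(points, k=2, threshold=3.0)` flags only the extra point, since it lies about 3.6 units from its cluster centroid while every other point lies within 1.5 units of its own.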