Research Paper | Computer Science & Engineering | India | Volume 3 Issue 11, November 2014
An Ensemble Classification Framework to Evolving Data Streams
Naga Chithra Devi. R
Abstract: Data stream classification poses many challenges to the data mining community. In this thesis, we address four such major challenges, namely, infinite length, concept-drift, concept-evolution, and feature-evolution. Since a data stream is theoretically infinite in length, it is impractical to store and use all the historical data for training. Concept-drift is a common phenomenon in data streams, which occurs as a result of changes in the underlying concepts. Concept-evolution occurs as a result of new classes evolving in the stream. Feature-evolution is a frequently occurring process in many streams, such as text streams, in which new features (i.e., words or phrases) appear as the stream progresses. Most existing data stream classification techniques address only the first two challenges and ignore the latter two. In this thesis, we propose an ensemble classification framework, in which each classifier is equipped with a novel class detector, to address concept-drift and concept-evolution. To address feature-evolution, we propose a feature set homogenization technique. We also enhance the novel class detection module using principal component analysis, making it more adaptive to the evolving stream and enabling it to detect more than one novel class at a time, with a heterogeneous technique for novel data. Comparison with state-of-the-art data stream classification techniques establishes the effectiveness of the proposed approach.
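The abstract does not say how the feature set homogenization is carried out. As a rough illustration only, the sketch below assumes one simple option: take the union of the feature sets seen by an older model and a newer data chunk, and fill features absent from either side with zero. The function names `homogenize` and `project` and the sample word counts are hypothetical and are not taken from the paper.

```python
def homogenize(old_features, new_features):
    """Return the union of two feature sets in a stable (sorted) order.

    Assumption: union-based homogenization; the paper may use a different rule.
    """
    return sorted(set(old_features) | set(new_features))


def project(vector, feature_space):
    """Map a sparse {feature: value} vector onto the shared feature space.

    Features missing from the vector are filled with 0.0.
    """
    return [vector.get(feature, 0.0) for feature in feature_space]


# Hypothetical word counts from two chunks of a text stream.
old_chunk = {"stream": 2.0, "drift": 1.0}
new_chunk = {"stream": 1.0, "novel": 3.0}  # "novel" is a newly appearing feature

space = homogenize(old_chunk, new_chunk)
old_vec = project(old_chunk, space)
new_vec = project(new_chunk, space)
```

After this step both vectors have the same length and column order, so a classifier trained on the older chunk and an instance from the newer chunk can be compared in one feature space.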
Keywords: Information Retrieval, Data Classification, Outlier Detection, Novel Data extraction
Edition: Volume 3 Issue 11, November 2014
Pages: 10 - 14