Punam D. Dhande, Dr. A. M. Dixit
Abstract: Data stream classification poses a number of difficulties for the data mining community. Four of them are major: 1. Infinite length. 2. Concept-drift. 3. Concept-evolution. 4. Feature-evolution. Since a data stream is theoretically infinite in length, it is impractical to store and use all the past data for training. Concept-drift is a common occurrence in data streams and takes place when the underlying concepts change. Concept-evolution takes place when new classes emerge in the stream. Feature-evolution occurs frequently in many streams, for example text streams, in which new features (words or phrases) appear as the stream progresses. Most existing data stream classification methods address only the first two difficulties and disregard the last two. To address concept-drift and concept-evolution, an ensemble classification technique can be used in which every classifier is equipped with a novel class detector. A feature set homogenization method can be used to address feature-evolution. In addition, the novel class detection module can be improved by making it more adaptive to the evolving stream and by enabling it to detect more than one novel class at a time.
Keywords: Data stream classification, Infinite Length, Concept-Drift, Concept-Evolution, Feature-Evolution
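As an illustration of the feature-evolution problem described in the abstract, the following minimal sketch shows one possible way to homogenize feature sets: an incoming instance is mapped onto the union of the features already known to a model and the features newly seen in the stream, with absent features filled with zero. The function name `homogenize`, the dictionary representation of an instance, and the zero-fill choice are assumptions made for this example and are not taken from the method itself.

```python
from typing import Dict, List, Tuple


def homogenize(
    known_features: List[str], instance: Dict[str, float]
) -> Tuple[List[str], List[float]]:
    """Map a sparse instance onto the union of known and newly seen features.

    Hypothetical helper: assumes features are named by strings and that a
    missing feature can be represented by 0.0.
    """
    known = set(known_features)
    # Features that appear in the instance but not in the model's feature set.
    new_features = sorted(f for f in instance if f not in known)
    # Keep the existing feature order and append the new features at the end.
    union = list(known_features) + new_features
    # Fill every position of the union; absent features get 0.0.
    vector = [instance.get(f, 0.0) for f in union]
    return union, vector


features, vector = homogenize(["stream", "drift"], {"drift": 2.0, "ensemble": 1.0})
print(features)  # ['stream', 'drift', 'ensemble']
print(vector)    # [0.0, 2.0, 1.0]
```

In this sketch the word "ensemble" is a new feature that appears as the stream progresses; it is appended to the feature set rather than discarded, so no information from the incoming instance is lost.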