TY - GEN
T1 - Detecting recurring and novel classes in concept-drifting data streams
AU - Masud, Mohammad M.
AU - Al-Khateeb, Tahseen M.
AU - Khan, Latifur
AU - Aggarwal, Charu
AU - Gao, Jing
AU - Han, Jiawei
AU - Thuraisingham, Bhavani
PY - 2011
Y1 - 2011
N2 - Concept-evolution is one of the major challenges in data stream classification, which occurs when a new class evolves in the stream. This problem remains unaddressed by most state-of-the-art techniques. A recurring class is a special case of concept-evolution. This special case takes place when a class appears in the stream, then disappears for a long time, and again appears. Existing data stream classification techniques that address the concept-evolution problem, wrongly detect the recurring classes as novel class. This creates two main problems. First, much resource is wasted in detecting a recurring class as novel class, because novel class detection is much more computationally- and memory-intensive, as compared to simply recognizing an existing class. Second, when a novel class is identified, human experts are involved in collecting and labeling the instances of that class for future modeling. If a recurrent class is reported as novel class, it will be only a waste of human effort to find out whether it is really a novel class. In this paper, we address the recurring issue, and propose a more realistic novel class detection technique, which remembers a class and identifies it as "not novel" when it reappears after a long disappearance. Our approach has shown significant reduction in classification error over state-of-the-art stream classification techniques on several benchmark data streams.
AB - Concept-evolution is one of the major challenges in data stream classification, which occurs when a new class evolves in the stream. This problem remains unaddressed by most state-of-the-art techniques. A recurring class is a special case of concept-evolution. This special case takes place when a class appears in the stream, then disappears for a long time, and again appears. Existing data stream classification techniques that address the concept-evolution problem, wrongly detect the recurring classes as novel class. This creates two main problems. First, much resource is wasted in detecting a recurring class as novel class, because novel class detection is much more computationally- and memory-intensive, as compared to simply recognizing an existing class. Second, when a novel class is identified, human experts are involved in collecting and labeling the instances of that class for future modeling. If a recurrent class is reported as novel class, it will be only a waste of human effort to find out whether it is really a novel class. In this paper, we address the recurring issue, and propose a more realistic novel class detection technique, which remembers a class and identifies it as "not novel" when it reappears after a long disappearance. Our approach has shown significant reduction in classification error over state-of-the-art stream classification techniques on several benchmark data streams.
KW - Novel class
KW - Recurring class
KW - Stream classification
UR - http://www.scopus.com/inward/record.url?scp=84857157219&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84857157219&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2011.49
DO - 10.1109/ICDM.2011.49
M3 - Conference contribution
AN - SCOPUS:84857157219
SN - 9780769544083
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 1176
EP - 1181
BT - Proceedings - 11th IEEE International Conference on Data Mining, ICDM 2011
T2 - 11th IEEE International Conference on Data Mining, ICDM 2011
Y2 - 11 December 2011 through 14 December 2011
ER -