TY - GEN
T1 - Integrating novel class detection with classification for concept-drifting data streams
AU - Masud, Mohammad M.
AU - Gao, Jing
AU - Khan, Latifur
AU - Han, Jiawei
AU - Thuraisingham, Bhavani
PY - 2009
Y1 - 2009
N2 - In a typical data stream classification task, it is assumed that the total number of classes is fixed. This assumption may not be valid in a real streaming environment, where new classes may evolve. Traditional data stream classification techniques cannot recognize novel class instances until the appearance of the novel class is manually identified and labeled instances of that class are presented to the learning algorithm for training. The problem becomes more challenging in the presence of concept-drift, when the underlying data distribution changes over time. We propose a novel and efficient technique that can automatically detect the emergence of a novel class in the presence of concept-drift by quantifying cohesion among unlabeled test instances and separation of the test instances from training instances. Our approach is non-parametric, meaning that it does not assume any underlying distribution of the data. Comparison with state-of-the-art stream classification techniques proves the superiority of our approach.
AB - In a typical data stream classification task, it is assumed that the total number of classes is fixed. This assumption may not be valid in a real streaming environment, where new classes may evolve. Traditional data stream classification techniques cannot recognize novel class instances until the appearance of the novel class is manually identified and labeled instances of that class are presented to the learning algorithm for training. The problem becomes more challenging in the presence of concept-drift, when the underlying data distribution changes over time. We propose a novel and efficient technique that can automatically detect the emergence of a novel class in the presence of concept-drift by quantifying cohesion among unlabeled test instances and separation of the test instances from training instances. Our approach is non-parametric, meaning that it does not assume any underlying distribution of the data. Comparison with state-of-the-art stream classification techniques proves the superiority of our approach.
UR - http://www.scopus.com/inward/record.url?scp=70349952316&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349952316&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-04174-7_6
DO - 10.1007/978-3-642-04174-7_6
M3 - Conference contribution
AN - SCOPUS:70349952316
SN - 3642041736
SN - 9783642041730
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 79
EP - 94
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2009, Proceedings
PB - Springer Verlag
T2 - European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2009
Y2 - 7 September 2009 through 11 September 2009
ER -