TY - GEN
T1 - Classification and novel class detection in data streams with active mining
AU - Masud, Mohammad M.
AU - Gao, Jing
AU - Khan, Latifur
AU - Han, Jiawei
AU - Thuraisingham, Bhavani
PY - 2010
Y1 - 2010
N2 - We present ActMiner, which addresses four major challenges to data stream classification, namely, infinite length, concept-drift, concept-evolution, and limited labeled data. Most of the existing data stream classification techniques address only the infinite length and concept-drift problems. Our previous work, MineClass, addresses the concept-evolution problem in addition to addressing the infinite length and concept-drift problems. Concept-evolution occurs in the stream when novel classes arrive. However, most of the existing data stream classification techniques, including MineClass, require that all the instances in a data stream be labeled by human experts and become available for training. This assumption is impractical, since data labeling is both time consuming and costly. Therefore, it is impossible to label a majority of the data points in a high-speed data stream. This scarcity of labeled data naturally leads to poorly trained classifiers. ActMiner actively selects only those data points for labeling for which the expected classification error is high. Therefore, ActMiner extends MineClass, and addresses the limited labeled data problem in addition to addressing the other three problems. It outperforms the state-of-the-art data stream classification techniques that use ten times or more labeled data than ActMiner.
AB - We present ActMiner, which addresses four major challenges to data stream classification, namely, infinite length, concept-drift, concept-evolution, and limited labeled data. Most of the existing data stream classification techniques address only the infinite length and concept-drift problems. Our previous work, MineClass, addresses the concept-evolution problem in addition to addressing the infinite length and concept-drift problems. Concept-evolution occurs in the stream when novel classes arrive. However, most of the existing data stream classification techniques, including MineClass, require that all the instances in a data stream be labeled by human experts and become available for training. This assumption is impractical, since data labeling is both time consuming and costly. Therefore, it is impossible to label a majority of the data points in a high-speed data stream. This scarcity of labeled data naturally leads to poorly trained classifiers. ActMiner actively selects only those data points for labeling for which the expected classification error is high. Therefore, ActMiner extends MineClass, and addresses the limited labeled data problem in addition to addressing the other three problems. It outperforms the state-of-the-art data stream classification techniques that use ten times or more labeled data than ActMiner.
UR - http://www.scopus.com/inward/record.url?scp=79956336459&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79956336459&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-13672-6_31
DO - 10.1007/978-3-642-13672-6_31
M3 - Conference contribution
AN - SCOPUS:79956336459
SN - 3642136710
SN - 9783642136719
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 311
EP - 324
BT - Advances in Knowledge Discovery and Data Mining - 14th Pacific-Asia Conference, PAKDD 2010, Proceedings
T2 - 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2010
Y2 - 21 June 2010 through 24 June 2010
ER -