TY - GEN
T1 - A multi-partition multi-chunk ensemble technique to classify concept-drifting data streams
AU - Masud, Mohammad M.
AU - Gao, Jing
AU - Khan, Latifur
AU - Han, Jiawei
AU - Thuraisingham, Bhavani
PY - 2009
Y1 - 2009
N2 - We propose a multi-partition, multi-chunk ensemble classifier-based data mining technique to classify concept-drifting data streams. Existing ensemble techniques in classifying concept-drifting data streams follow a single-partition, single-chunk approach, in which a single data chunk is used to train one classifier. In our approach, we train a collection of v classifiers from r consecutive data chunks using v-fold partitioning of the data, and build an ensemble of such classifiers. By introducing this multi-partition, multi-chunk ensemble technique, we significantly reduce classification error compared to the single-partition, single-chunk ensemble approaches. We have theoretically justified the usefulness of our algorithm, and empirically proved its effectiveness over other state-of-the-art stream classification techniques on synthetic data and real botnet traffic.
AB - We propose a multi-partition, multi-chunk ensemble classifier-based data mining technique to classify concept-drifting data streams. Existing ensemble techniques in classifying concept-drifting data streams follow a single-partition, single-chunk approach, in which a single data chunk is used to train one classifier. In our approach, we train a collection of v classifiers from r consecutive data chunks using v-fold partitioning of the data, and build an ensemble of such classifiers. By introducing this multi-partition, multi-chunk ensemble technique, we significantly reduce classification error compared to the single-partition, single-chunk ensemble approaches. We have theoretically justified the usefulness of our algorithm, and empirically proved its effectiveness over other state-of-the-art stream classification techniques on synthetic data and real botnet traffic.
UR - http://www.scopus.com/inward/record.url?scp=67650668430&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67650668430&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-01307-2_34
DO - 10.1007/978-3-642-01307-2_34
M3 - Conference contribution
AN - SCOPUS:67650668430
SN - 3642013066
SN - 9783642013065
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 363
EP - 375
BT - 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
T2 - 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
Y2 - 27 April 2009 through 30 April 2009
ER -