A multi-partition multi-chunk ensemble technique to classify concept-drifting data streams

Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, Bhavani Thuraisingham

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

36 Citations (Scopus)

Abstract

We propose a multi-partition, multi-chunk ensemble classifier-based data mining technique to classify concept-drifting data streams. Existing ensemble techniques for classifying concept-drifting data streams follow a single-partition, single-chunk approach, in which a single data chunk is used to train one classifier. In our approach, we train a collection of v classifiers from r consecutive data chunks using v-fold partitioning of the data, and build an ensemble of such classifiers. By introducing this multi-partition, multi-chunk ensemble technique, we significantly reduce classification error compared to the single-partition, single-chunk ensemble approaches. We have theoretically justified the usefulness of our algorithm, and empirically demonstrated its effectiveness over other state-of-the-art stream classification techniques on synthetic data and real botnet traffic.
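
The training scheme described in the abstract (pool r consecutive chunks, split them into v partitions, train v classifiers, combine them in an ensemble) can be illustrated with a minimal Python sketch. Several details here are assumptions made only for illustration, since the abstract does not specify them: the `NearestCentroid` base learner, the function names, training each classifier on the v - 1 partitions that are not held out, and unweighted majority voting. How the ensemble is refreshed as new chunks arrive is not covered.

```python
import random
from collections import Counter


class NearestCentroid:
    """Tiny stand-in base learner (an assumption, not the paper's choice)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, value in enumerate(xi):
                acc[j] += value
        # One mean vector per class label.
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


def train_v_classifiers(chunks, v, rng):
    """Pool the given chunks, split them into v random partitions,
    and train one classifier per fold."""
    data = [pair for chunk in chunks for pair in chunk]
    rng.shuffle(data)
    parts = [data[i::v] for i in range(v)]
    models = []
    for held_out in range(v):
        # Assumption: each model trains on the v - 1 partitions not held out.
        train = [p for i, part in enumerate(parts) if i != held_out for p in part]
        X, y = zip(*train)
        models.append(NearestCentroid().fit(X, y))
    return models


def ensemble_predict(models, x):
    """Unweighted majority vote over the ensemble (an assumption)."""
    votes = Counter(m.predict_one(x) for m in models)
    return votes.most_common(1)[0][0]
```

Here `chunks` is a list of r chunks, each a list of `(features, label)` pairs, and `rng` is a `random.Random` instance so that the partitioning is reproducible. In a streaming setting, one would presumably call `train_v_classifiers` on the r most recent chunks each time a new labeled chunk arrives.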

Original language: English
Title of host publication: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
Pages: 363-375
Number of pages: 13
Publication status: Published - 2009
Externally published: Yes
Event: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009 - Bangkok, Thailand
Duration: Apr 27 2009 to Apr 30 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5476 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
Country/Territory: Thailand
City: Bangkok
Period: 4/27/09 to 4/30/09

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
