TY - GEN
T1 - Cloud guided stream classification using class-based ensemble
AU - Al-Khateeb, Tahseen M.
AU - Masud, Mohammad M.
AU - Khan, Latifur
AU - Thuraisingham, Bhavani
PY - 2012/10/2
Y1 - 2012/10/2
N2 - We propose a novel class-based micro-classifier ensemble classification technique (MCE) for classifying data streams. Traditional ensemble-based data stream classification techniques build a classification model from each data chunk and keep an ensemble of such models. Due to the fixed length of the ensemble, when a new model is trained, one existing model is discarded. This creates several problems. First, if a class disappears from the stream and reappears after a long time, it would be misclassified if a majority of the classifiers in the ensemble do not contain any model of that class. Second, discarding a model means discarding the corresponding data chunk completely. However, knowledge obtained from some classes might still be useful, and if it is discarded, the overall error rate would increase. To address these problems, we propose an ensemble model where the information for each class is stored separately. From each data chunk, we train a model for each class of data. We call each such model a micro-classifier. This approach is more robust than existing chunk-based ensembles in handling dynamic changes in the data stream. To the best of our knowledge, this is the first attempt to classify data streams using the class-based ensemble approach. When the number of classes in the stream grows, class-based ensembles may degrade in performance (speed). Hence, we sketch a cloud-based solution of our class-based ensembles to handle a large number of classes effectively. We compare our technique with several state-of-the-art data stream classification techniques on both synthetic and benchmark data streams, and obtain much higher accuracy.
AB - We propose a novel class-based micro-classifier ensemble classification technique (MCE) for classifying data streams. Traditional ensemble-based data stream classification techniques build a classification model from each data chunk and keep an ensemble of such models. Due to the fixed length of the ensemble, when a new model is trained, one existing model is discarded. This creates several problems. First, if a class disappears from the stream and reappears after a long time, it would be misclassified if a majority of the classifiers in the ensemble do not contain any model of that class. Second, discarding a model means discarding the corresponding data chunk completely. However, knowledge obtained from some classes might still be useful, and if it is discarded, the overall error rate would increase. To address these problems, we propose an ensemble model where the information for each class is stored separately. From each data chunk, we train a model for each class of data. We call each such model a micro-classifier. This approach is more robust than existing chunk-based ensembles in handling dynamic changes in the data stream. To the best of our knowledge, this is the first attempt to classify data streams using the class-based ensemble approach. When the number of classes in the stream grows, class-based ensembles may degrade in performance (speed). Hence, we sketch a cloud-based solution of our class-based ensembles to handle a large number of classes effectively. We compare our technique with several state-of-the-art data stream classification techniques on both synthetic and benchmark data streams, and obtain much higher accuracy.
KW - classification
KW - MapReduce
KW - Ensemble
KW - cloud
UR - http://www.scopus.com/inward/record.url?scp=84866747532&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866747532&partnerID=8YFLogxK
U2 - 10.1109/CLOUD.2012.127
DO - 10.1109/CLOUD.2012.127
M3 - Conference contribution
AN - SCOPUS:84866747532
SN - 9780769547558
T3 - Proceedings - 2012 IEEE 5th International Conference on Cloud Computing, CLOUD 2012
SP - 694
EP - 701
BT - Proceedings - 2012 IEEE 5th International Conference on Cloud Computing, CLOUD 2012
T2 - 2012 IEEE 5th International Conference on Cloud Computing, CLOUD 2012
Y2 - 24 June 2012 through 29 June 2012
ER -