TY - GEN
T1 - Lacking labels in the stream
T2 - 18th International Symposium on Methodologies for Intelligent Systems, ISMIS 2009
AU - Woolam, Clay
AU - Masud, Mohammad M.
AU - Khan, Latifur
PY - 2009
Y1 - 2009
N2 - This paper outlines a data stream classification technique that addresses the problem of insufficient and biased labeled data. It is practical to assume that only a small fraction of instances in the stream are labeled. A more practical assumption would be that the labeled data may not be independently distributed among all training documents. How can we ensure that a good classification model would be built in these scenarios, considering that the data stream is also evolving in nature? In our previous work we applied semi-supervised clustering to build classification models using a limited amount of labeled training data. However, it assumed that the data to be labeled should be chosen randomly. In our current work, we relax this assumption and propose a label propagation framework for data streams that can build good classification models even if the data are not labeled randomly. Comparison with state-of-the-art stream classification techniques on synthetic and benchmark real data proves the effectiveness of our approach.
AB - This paper outlines a data stream classification technique that addresses the problem of insufficient and biased labeled data. It is practical to assume that only a small fraction of instances in the stream are labeled. A more practical assumption would be that the labeled data may not be independently distributed among all training documents. How can we ensure that a good classification model would be built in these scenarios, considering that the data stream is also evolving in nature? In our previous work we applied semi-supervised clustering to build classification models using a limited amount of labeled training data. However, it assumed that the data to be labeled should be chosen randomly. In our current work, we relax this assumption and propose a label propagation framework for data streams that can build good classification models even if the data are not labeled randomly. Comparison with state-of-the-art stream classification techniques on synthetic and benchmark real data proves the effectiveness of our approach.
UR - http://www.scopus.com/inward/record.url?scp=70349871421&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349871421&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-04125-9_58
DO - 10.1007/978-3-642-04125-9_58
M3 - Conference contribution
AN - SCOPUS:70349871421
SN - 3642041248
SN - 9783642041242
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 552
EP - 562
BT - Foundations of Intelligent Systems - 18th International Symposium, ISMIS 2009, Proceedings
Y2 - 14 September 2009 through 17 September 2009
ER -