TY - GEN
T1 - Using Self-labeling and Co-Training to Enhance Bots Labeling in Twitter
AU - Alothali, Eiman
AU - Hayawi, Kadhim
AU - Alashwal, Hany
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - The rapid evolution of social bots has required efficient solutions to detect them in real time. Obtaining labeled stream datasets that contain a variety of bots is essential for this classification task, yet it remains one of the challenging issues in this domain. Accordingly, finding appropriate techniques to label unlabeled data is vital to enhance bot detection. In this paper, we investigate two labeling techniques for semi-supervised learning and evaluate their performance for bot detection: self-training and co-training. Our results show that self-training with maximum confidence performed best, achieving an F1 score of 0.856 and an AUC of 0.84. The Random Forest classifier outperformed the other classifiers in both techniques. In co-training, a single-view approach with the Random Forest classifier using fewer features performed slightly better than a single view with more features. Using multiple feature views in co-training generally achieved similar results across different splits.
AB - The rapid evolution of social bots has required efficient solutions to detect them in real time. Obtaining labeled stream datasets that contain a variety of bots is essential for this classification task, yet it remains one of the challenging issues in this domain. Accordingly, finding appropriate techniques to label unlabeled data is vital to enhance bot detection. In this paper, we investigate two labeling techniques for semi-supervised learning and evaluate their performance for bot detection: self-training and co-training. Our results show that self-training with maximum confidence performed best, achieving an F1 score of 0.856 and an AUC of 0.84. The Random Forest classifier outperformed the other classifiers in both techniques. In co-training, a single-view approach with the Random Forest classifier using fewer features performed slightly better than a single view with more features. Using multiple feature views in co-training generally achieved similar results across different splits.
KW - co-training
KW - self-labeling
KW - semi-supervised
UR - http://www.scopus.com/inward/record.url?scp=85147003285&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147003285&partnerID=8YFLogxK
U2 - 10.1109/AICCSA56895.2022.10017585
DO - 10.1109/AICCSA56895.2022.10017585
M3 - Conference contribution
AN - SCOPUS:85147003285
T3 - Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, AICCSA
BT - 2022 IEEE/ACS 19th International Conference on Computer Systems and Applications, AICCSA 2022 - Proceedings
PB - IEEE Computer Society
T2 - 19th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2022
Y2 - 5 December 2022 through 7 December 2022
ER -