TY - JOUR
T1 - Hybrid feature selection approach to identify optimal features of profile metadata to detect social bots in Twitter
AU - Alothali, Eiman
AU - Hayawi, Kadhim
AU - Alashwal, Hany
N1 - Funding Information:
This work was partially supported by Zayed University, UAE, under the RIF research grant number R20132.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12
Y1 - 2021/12
N2 - In recent years, social bots on social networks have become more sophisticated in design, adapting their features to evade detection systems. Advances in artificial intelligence and chatbots allow these bots to mimic human users and to learn and adjust very quickly. Finding the optimal features needed to detect them therefore remains an open area of investigation. In this paper, we propose a hybrid feature selection (FS) method that evaluates profile metadata features to find these optimal features, which are then assessed using random forest, naïve Bayes, support vector machine, and neural network classifiers. We found that cross-validation attribute evaluation performed best among the FS methods compared. Our results show that the random forest classifier with six optimal features achieved the best area under the curve, 94.3%, while maintaining 89% overall accuracy, 83.8% precision, and 83.3% recall for the bot class. We also found that using only four features (favorites_count, verified, statuses_count, and average_tweets_per_day) achieves good performance for bot detection (84.1% precision, 81.2% recall).
KW - Bot detection
KW - Feature selection
KW - Supervised learning
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=85115161312&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85115161312&partnerID=8YFLogxK
U2 - 10.1007/s13278-021-00786-4
DO - 10.1007/s13278-021-00786-4
M3 - Article
AN - SCOPUS:85115161312
SN - 1869-5450
VL - 11
JO - Social Network Analysis and Mining
JF - Social Network Analysis and Mining
IS - 1
M1 - 84
ER -