TY - GEN
T1 - Leveraging K-Means Clustering for Analysis of Arabic Hate Speech Tweets
AU - Salloum, Said
AU - Tahat, Khalaf
AU - Mansoori, Ahmed
AU - Alfaisal, Raghad
AU - Tahat, Dina
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - As hate speech becomes more common on social media platforms, it is important to detect and curb it in order to provide a better and safer online environment. Because hate speech detection has relied heavily on manual methods, researchers have increasingly turned to automated, machine-learning-based methods. Many available datasets and models for hate speech detection are largely inadequate for Arabic hate speech because of the complexity of the language and its cultural nuances. In this paper, the researchers address these difficulties by using K-Means to cluster Arabic hate speech tweets in the L-HSAB dataset. The methodology consisted of data exploration and pre-processing, term frequency-inverse document frequency (TF-IDF) vectorization, dimensionality reduction through principal component analysis (PCA), and clustering via K-Means. These steps helped to identify distinct groups of hate speech tweets and to understand their common themes and topics. The analysis has led to new understandings of how Arabic hate speech spreads online and offers the potential for tailored interventions. Automated hate speech analysis via machine learning would allow policymakers to formulate tailored strategies aimed at making the Internet safer and the community more harmonious.
AB - As hate speech becomes more common on social media platforms, it is important to detect and curb it in order to provide a better and safer online environment. Because hate speech detection has relied heavily on manual methods, researchers have increasingly turned to automated, machine-learning-based methods. Many available datasets and models for hate speech detection are largely inadequate for Arabic hate speech because of the complexity of the language and its cultural nuances. In this paper, the researchers address these difficulties by using K-Means to cluster Arabic hate speech tweets in the L-HSAB dataset. The methodology consisted of data exploration and pre-processing, term frequency-inverse document frequency (TF-IDF) vectorization, dimensionality reduction through principal component analysis (PCA), and clustering via K-Means. These steps helped to identify distinct groups of hate speech tweets and to understand their common themes and topics. The analysis has led to new understandings of how Arabic hate speech spreads online and offers the potential for tailored interventions. Automated hate speech analysis via machine learning would allow policymakers to formulate tailored strategies aimed at making the Internet safer and the community more harmonious.
KW - Arabic hate speech
KW - K-Means clustering
KW - machine learning
KW - principal component analysis (PCA)
KW - social media analysis
KW - term frequency-inverse document frequency (TF-IDF)
UR - http://www.scopus.com/inward/record.url?scp=105002573983&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105002573983&partnerID=8YFLogxK
U2 - 10.1109/GCET64327.2024.10934641
DO - 10.1109/GCET64327.2024.10934641
M3 - Conference contribution
AN - SCOPUS:105002573983
T3 - Global Congress on Emerging Technologies, GCET 2024
SP - 282
EP - 285
BT - Global Congress on Emerging Technologies, GCET 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 Global Congress on Emerging Technologies, GCET 2024
Y2 - 9 December 2024 through 11 December 2024
ER -