Leveraging K-Means Clustering for Analysis of Arabic Hate Speech Tweets

Said Salloum, Khalaf Tahat, Ahmed Mansoori, Raghad Alfaisal, Dina Tahat

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As hate speech becomes more common on social media platforms, detecting and curbing it is important for providing a better and safer online environment. Because hate speech detection has relied heavily on manual methods, researchers have increasingly turned to automated, machine-learning-based methods. Many available hate speech detection datasets and models are largely inadequate for Arabic hate speech because of the complexity of the language and its cultural nuances. In this paper, the researchers address these difficulties by applying K-Means clustering to the Arabic hate speech tweets in the L-HSAB dataset. The methodology consisted of data exploration and pre-processing, Term Frequency-Inverse Document Frequency (TF-IDF) vectorization, dimensionality reduction through Principal Component Analysis (PCA), and clustering via K-Means. These steps helped to identify distinct groups of hate speech tweets and to understand their common themes and topics. The results offer new understanding of how Arabic hate speech spreads online and open the potential for tailored interventions. Automated hate speech analysis via machine learning would allow policymakers to formulate targeted strategies focused on making the Internet safer and communities more harmonious.
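The pipeline named in the abstract (TF-IDF, then PCA, then K-Means) can be illustrated with a minimal sketch. The abstract does not state which library, pre-processing steps, number of components, or number of clusters the authors used, so everything below is an assumption: scikit-learn is chosen for convenience, `cluster_tweets` is a hypothetical helper, the parameter values are arbitrary, and the sample texts are English placeholders rather than tweets from L-HSAB.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_tweets(tweets, n_components=2, n_clusters=2, seed=0):
    """Hypothetical helper: return one cluster label per tweet."""
    # Step 1: represent each tweet as a TF-IDF weighted term vector.
    tfidf = TfidfVectorizer().fit_transform(tweets)
    # Step 2: project onto a few principal components.
    # PCA needs a dense array, which is only practical for small corpora.
    pca = PCA(n_components=n_components, random_state=seed)
    reduced = pca.fit_transform(tfidf.toarray())
    # Step 3: group the reduced vectors with K-Means.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(reduced)


# Placeholder texts (assumed for illustration, not from the L-HSAB dataset).
sample_tweets = [
    "football match stadium",
    "football match team",
    "football stadium team",
    "election vote parliament",
    "election vote results",
    "election parliament results",
]
labels = cluster_tweets(sample_tweets)
print(labels)
```

On a real corpus, the cluster count would normally be chosen with a diagnostic such as the elbow method or silhouette score, and a sparse-friendly reducer such as `TruncatedSVD` may be preferable to dense PCA; neither choice is confirmed by the abstract.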

Original language: English
Title of host publication: Global Congress on Emerging Technologies, GCET 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 282-285
Number of pages: 4
ISBN (Electronic): 9798331542603
Publication status: Published - 2024
Event: 2024 Global Congress on Emerging Technologies, GCET 2024 - Gran Canaria, Spain
Duration: Dec 9 2024 to Dec 11 2024

Publication series

Name: Global Congress on Emerging Technologies, GCET 2024

Conference

Conference: 2024 Global Congress on Emerging Technologies, GCET 2024
Country/Territory: Spain
City: Gran Canaria
Period: 12/9/24 to 12/11/24

Keywords

  • Arabic hate speech
  • K-Means clustering
  • machine learning
  • principal component analysis (PCA)
  • social media analysis
  • term frequency-inverse document frequency (TF-IDF)

ASJC Scopus subject areas

  • Strategy and Management
  • Artificial Intelligence
  • Computer Science Applications
  • Human-Computer Interaction
  • Information Systems and Management
  • Control and Optimization
  • Modelling and Simulation
