Machine Learning Approaches for Sentiment Analysis on Balanced and Unbalanced Datasets

Ahmed M. Elmassry, Abdulla Alshamsi, Ahmed F. Abdulhameed, Nazar Zaki, Abdelkader Nasreddine Belkacem

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Sentiment analysis, sometimes referred to as opinion mining, is essential for understanding public opinion and attitudes toward social topics and trends. This study explores the effectiveness of three machine learning (ML) models, namely support vector machine (SVM), long short-term memory (LSTM), and bidirectional encoder representations from transformers (BERT), in analyzing a Kaggle dataset of 37,000 user reviews of the Instagram Threads app. After data cleaning and preprocessing, the dataset was partitioned into 70% for training and 30% for testing. The training set was then used to create three datasets: one balanced dataset and two unbalanced datasets, one with 90% positive instances and the other with 90% negative instances. Each of the three models was trained on each of these datasets, yielding nine models. Model performance was assessed using accuracy, precision, recall, and F1-score. The BERT model fine-tuned on the balanced dataset outperformed all other models, with an accuracy of 86%, precision of 85%, recall of 87%, and F1-score of 86%. These findings underscore the effectiveness of diverse ML techniques, particularly transformers, and the crucial role of data balancing in optimizing sentiment analysis tasks.
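The data preparation and evaluation steps described in the abstract can be sketched as follows. This is a minimal, illustrative sketch and not the authors' code. It assumes binary labels (1 = positive, 0 = negative), assumes the balanced and 90/10 subsets are built by undersampling each class to a fixed subset size, and uses hypothetical helper names (`train_test_split`, `make_subset`, `binary_metrics`) written in plain Python rather than any specific library.

```python
import random

def train_test_split(rows, test_frac=0.3, seed=42):
    """Shuffle (text, label) rows and split them into train and test lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

def make_subset(train, pos_frac, size, seed=42):
    """Draw `size` training rows with a target share of positives (label 1).

    Assumption: each class is undersampled without replacement; the
    abstract does not state how the subsets were actually constructed.
    """
    rng = random.Random(seed)
    pos = [r for r in train if r[1] == 1]
    neg = [r for r in train if r[1] == 0]
    n_pos = round(size * pos_frac)
    n_neg = size - n_pos
    if n_pos > len(pos) or n_neg > len(neg):
        raise ValueError("not enough rows in one class for this subset size")
    subset = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    rng.shuffle(subset)
    return subset

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Usage on hypothetical toy data (not the Threads reviews):
rows = [(f"review {i}", 1 if i < 600 else 0) for i in range(1000)]
train, test = train_test_split(rows)            # 70% train, 30% test
balanced = make_subset(train, 0.5, 100)         # 50% positive
mostly_pos = make_subset(train, 0.9, 100)       # 90% positive
mostly_neg = make_subset(train, 0.1, 100)       # 90% negative
```

In this sketch only the training split is resampled, so every model trained on the three subsets would be scored on the same held-out test rows with `binary_metrics`.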

Original language: English
Title of host publication: 14th IEEE International Conference on Control System, Computing and Engineering, ICCSCE 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 18-23
Number of pages: 6
ISBN (Electronic): 9798350364507
Publication status: Published - 2024
Event: 14th IEEE International Conference on Control System, Computing and Engineering, ICCSCE 2024 - Penang, Malaysia
Duration: Aug 23 2024 to Aug 24 2024

Publication series

Name: 14th IEEE International Conference on Control System, Computing and Engineering, ICCSCE 2024 - Proceedings

Conference

Conference: 14th IEEE International Conference on Control System, Computing and Engineering, ICCSCE 2024
Country/Territory: Malaysia
City: Penang
Period: 8/23/24 to 8/24/24

Keywords

  • Balanced
  • BERT
  • LSTM
  • Machine Learning
  • Sentiment Analysis
  • SVM
  • Unbalanced Dataset

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management
  • Computational Mathematics
  • Health Informatics
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
