Keep it simple: random oversampling for imbalanced data

Firuz Kamalov, Ho Hon Leung, Aswani Kumar Cherukuri

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

Imbalanced data affects a wide range of applications. Despite a plethora of sophisticated sampling techniques for dealing with imbalanced data, the simple random oversampling (ROS) method remains a robust alternative. The goal of this paper is to compare the performance of ROS with that of more advanced sampling algorithms. To this end, we conduct numerical experiments on multi-label data. The results of the experiments reveal that ROS outperforms several advanced sampling algorithms. Given its computational efficiency and robust accuracy, we believe that ROS is a good option for dealing with imbalanced data.
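For readers unfamiliar with the technique named in the abstract, random oversampling can be sketched in a few lines. This is a generic illustration, not the authors' experimental code: the function name `random_oversample`, the `seed` parameter, and the toy data are all hypothetical, and the sketch assumes each sample carries a single class label, which is simpler than the multi-label setting the abstract mentions.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class.

    Hypothetical helper for illustration; assumes one label per sample.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the majority class

    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of the original rows belonging to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw the missing rows with replacement from the existing ones.
        for i in rng.choices(idx, k=target - n):
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Toy data (made up): five majority samples, one minority sample.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = [0, 0, 0, 0, 0, 1]

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

After resampling, both classes have the same number of rows, and every added row is an exact copy of an existing minority sample. No synthetic points are generated, which is what keeps the method computationally cheap.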

Original language: English
Title of host publication: 2023 Advances in Science and Engineering Technology International Conferences, ASET 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665454742
Publication status: Published - 2023
Externally published: Yes
Event: 2023 Advances in Science and Engineering Technology International Conferences, ASET 2023 - Dubai, United Arab Emirates
Duration: Feb 20 2023 to Feb 23 2023

Publication series

Name: 2023 Advances in Science and Engineering Technology International Conferences, ASET 2023

Conference

Conference: 2023 Advances in Science and Engineering Technology International Conferences, ASET 2023
Country/Territory: United Arab Emirates
City: Dubai
Period: 2/20/23 to 2/23/23

Keywords

  • data mining
  • imbalanced data
  • machine learning
  • random oversampling

ASJC Scopus subject areas

  • Renewable Energy, Sustainability and the Environment
  • Biomedical Engineering
  • Control and Optimization
  • Artificial Intelligence
  • Computer Science Applications
  • Decision Sciences (miscellaneous)
  • Fuel Technology
