Incremental SMOTE with Control Coefficient for Classifiers in Data Starved Medical Applications

Wan D. Bae, Shayma Alkobaisi, Siddheshwari Bankar, Sartaj Bhuvaji, Jay Singhvi, Madhuroopa Irukulla, William McDonnell

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Prediction models for data-starved medical applications lag behind general machine learning solutions, despite their potential to improve early interventions. This is largely because optimization approaches assume a balanced distribution of events, whereas medical data often have an imbalanced class distribution. In individual-based risk prediction models, small sample sizes combined with a large number of features further exacerbate the curse of dimensionality. In this paper, we propose a data augmentation system that gradually creates synthetic minority samples under a control coefficient, which improves the quality of the generated data over time and consequently boosts prediction model performance. The system incrementally adjusts to the data distribution, avoiding overfitting. We evaluate our approach using four synthetic oversampling techniques on real asthma patient data. Our results show that the system enhances classifiers’ overall performance across all four techniques. Specifically, applying the incremental data augmentation approach to three of the oversampling methods increased the sensitivity of deep transfer learning-based classifiers by 4.01% to 7.79%.
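The abstract does not specify the algorithm, so the following is only a minimal sketch, assuming one plausible reading of "incremental SMOTE with a control coefficient". The interpolation step, x_new = x_i + λ(x_nn − x_i) with λ drawn from [0, 1), is standard SMOTE. Everything else is hypothetical and not taken from the paper: the function names, the `control` parameter interpreted as the fraction of the remaining minority/majority gap filled per round, and the choice to search neighbours over the growing pool (originals plus earlier synthetic points). Sensitivity, the metric reported above, is computed as TP / (TP + FN).

```python
import math
import random


def smote_sample(x, neighbor, rng):
    """Standard SMOTE interpolation: a random point on the segment x -> neighbor."""
    lam = rng.random()  # lambda drawn uniformly from [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]


def k_nearest(points, idx, k):
    """Indices of the k nearest neighbours of points[idx] (Euclidean distance)."""
    base = points[idx]
    dists = sorted(
        (math.dist(base, other), j) for j, other in enumerate(points) if j != idx
    )
    return [j for _, j in dists[:k]]


def incremental_smote(minority, target_count, control=0.25, k=3, seed=0):
    """HYPOTHETICAL incremental schedule (not the authors' method).

    Each round generates only ceil(control * remaining_gap) synthetic samples,
    so the pool grows gradually; later rounds draw neighbours from the pool,
    which already contains earlier synthetic points.
    Returns the final pool and the number of rounds used.
    """
    if not 0 < control <= 1:
        raise ValueError("control must be in (0, 1]")
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    pool = [list(p) for p in minority]
    rounds = 0
    while len(pool) < target_count:
        gap = target_count - len(pool)
        # With control <= 1 this never exceeds the gap, so the pool stops exactly at target_count.
        n_new = max(1, math.ceil(control * gap))
        new_points = []
        for _ in range(n_new):
            i = rng.randrange(len(pool))
            j = rng.choice(k_nearest(pool, i, k))
            new_points.append(smote_sample(pool[i], pool[j], rng))
        pool.extend(new_points)  # new points become candidates only in the next round
        rounds += 1
    return pool, rounds


def sensitivity(tp, fn):
    """Sensitivity (recall on the positive, minority class): TP / (TP + FN)."""
    return tp / (tp + fn)


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pool, rounds = incremental_smote(minority, target_count=20, control=0.25, seed=42)
    print(len(pool), rounds)
```

Because every synthetic point is a convex combination of points already in the pool, all generated samples stay inside the region spanned by the original minority samples. A smaller `control` spreads generation over more rounds, which is one way to make oversampling "gradual"; how the paper actually sets and uses its control coefficient is not stated in this record.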

Original language: English
Title of host publication: Big Data Analytics and Knowledge Discovery - 26th International Conference, DaWaK 2024, Proceedings
Editors: Robert Wrembel, Silvia Chiusano, Gabriele Kotsis, Ismail Khalil, A Min Tjoa
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 112-119
Number of pages: 8
ISBN (Print): 9783031683220
Publication status: Published - 2024
Event: 26th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2024 - Naples, Italy
Duration: Aug 26 2024 to Aug 28 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14912 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 26th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2024
Country/Territory: Italy
City: Naples
Period: 8/26/24 to 8/28/24

Keywords

  • class imbalance problem
  • control coefficient
  • data starved contexts
  • rare event prediction
  • synthetic minority oversampling technique

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
