SDGnE: A Synthetic Data Generation and Evaluation System for Rare Event Prediction

Wan D. Bae, Shayma Alkobaisi, Sartaj Bhuvaji, Siddheshwari Bankar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Class imbalance in datasets poses a significant challenge for building efficient classifiers and leads to poor prediction of rare events. The problem is more critical in applications where datasets are often small, such as individual-based health risk prediction and engineering problems that rely heavily on simulations. While several techniques have been proposed in this field, their performance on small datasets must improve before machine learning algorithms can be used in practice. This paper presents a system framework called “Synthetic Data Generation and Evaluation (SDGnE)” that addresses the class imbalance problem by generating synthetic data with various techniques, analyzing data quality, and comparing the performance of the implemented techniques. We demonstrate the proposed system through a web-based user interface that includes methods for data generation, statistical analysis, and visual evaluation. The system can help users gain a better understanding of, and insight into, the data produced by different techniques, and it can be straightforwardly extended to include new data generation techniques and evaluation tools.
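The record does not describe how SDGnE implements its generators, so the following is only a minimal sketch of the general SMOTE idea listed in the keywords, not the authors' code: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The function names `smote` and `k_nearest`, the default `k=5`, and the fixed `seed` are illustrative assumptions; the GAN and autoencoder generators also named in the keywords are not shown.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def k_nearest(points: Sequence[Vector], idx: int, k: int) -> List[int]:
    """Hypothetical helper: indices of the k nearest neighbours of points[idx], excluding itself."""
    dists = [(math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx]
    dists.sort()  # sort by Euclidean distance, ties broken by index
    return [j for _, j in dists[:k]]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Sketch of SMOTE-style oversampling: interpolate between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)          # seeded so the sketch is reproducible
    k = min(k, len(minority) - 1)      # small datasets may have fewer than k neighbours
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # pick a random minority sample
        j = rng.choice(neighbours[i])      # pick one of its nearest minority neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
    new_points = smote(minority, n_new=6, k=2)
    print(len(new_points))  # 6
```

Because every synthetic point is a convex combination of two real minority samples, it always lies inside the range of the observed minority data; this keeps generated values plausible but also means plain SMOTE cannot extrapolate beyond the rare-event cases already seen, which is one reason a system like SDGnE compares several generation techniques.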

Original language: English
Title of host publication: Database Systems for Advanced Applications - 29th International Conference, DASFAA 2024, Proceedings
Editors: Makoto Onizuka, Jae-Gil Lee, Yongxin Tong, Chuan Xiao, Yoshiharu Ishikawa, Kejing Lu, Sihem Amer-Yahia, H.V. Jagadish
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 508-512
Number of pages: 5
ISBN (Print): 9789819755745
Publication status: Published - 2024
Event: 29th International Conference on Database Systems for Advanced Applications, DASFAA 2024 - Gifu, Japan
Duration: Jul 2 2024 to Jul 5 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14856 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 29th International Conference on Database Systems for Advanced Applications, DASFAA 2024
Country/Territory: Japan
City: Gifu
Period: 7/2/24 to 7/5/24

Keywords

  • autoencoder
  • class imbalance
  • classification
  • generative adversarial network
  • SMOTE
  • synthetic data generation

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
