An Enhanced Tree Ensemble for Classification in the Presence of Extreme Class Imbalance

Samir K. Safi, Sheema Gul

Research output: Contribution to journal › Article › peer-review

Abstract

Researchers using machine learning methods for classification can face challenges due to class imbalance, where one class is underrepresented. Over- or under-sampling of minority or majority class observations, or relying solely on model selection in ensemble methods, may prove ineffective when the class imbalance ratio is extremely high. To address this issue, this paper proposes a method called enhanced tree ensemble (ETE), which combines the generation of synthetic minority class observations with the selection of trees based on their performance on the training data. The proposed method first generates minority class instances to balance the training data and then selects trees using either out-of-bag or sub-sample observations, giving two variants of the method. The efficacy of the proposed method is assessed on twenty benchmark binary classification problems with moderate to extreme class imbalance, comparing it against other well-known methods such as optimal tree ensemble (OTE), SMOTE random forest, oversampling random forest, under-sampling random forest, k-nearest neighbor (k-NN), support vector machine (SVM), tree, and artificial neural network (ANN). Performance metrics such as classification error rate and precision are used for evaluation. The analyses revealed that the proposed method, based on data balancing and model selection, yielded better results than the other methods.
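The two stages described in the abstract (synthetic minority generation, then tree selection on out-of-bag observations) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes SMOTE-style interpolation between a minority point and one of its nearest minority neighbors as the synthetic-data generator, uses one-split decision stumps in place of full trees, and simply keeps the trees with the highest individual out-of-bag accuracy; the paper's actual generator and selection criterion may differ. The function names (smote_like, fit_stump, build_ete_like), the parameters n_trees, keep and k, and the toy data are all assumptions made for this example. The sub-sample variant is not shown.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def smote_like(minority, n_new, k=3, rng=None):
    """Assumed SMOTE-style generator: interpolate between a random minority
    point and one of its k nearest minority neighbors (needs >= 2 points)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: sq_dist(x, p))[:k]
        nn = rng.choice(neighbors)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nn)))
    return synthetic


def fit_stump(X, y):
    """One-split decision 'tree' that minimizes training misclassifications."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or err < best[0]:
                best = (err, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def build_ete_like(X, y, n_trees=25, keep=10, seed=0):
    """Hypothetical ETE-style pipeline: balance with synthetic minority points,
    grow stumps on bootstrap samples, keep the `keep` best by OOB accuracy."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority_label = min(counts, key=counts.get)
    majority_label = max(counts, key=counts.get)
    minority = [row for row, lab in zip(X, y) if lab == minority_label]
    n_new = counts[majority_label] - counts[minority_label]
    synth = smote_like(minority, n_new, rng=rng)
    Xb = list(X) + synth
    yb = list(y) + [minority_label] * len(synth)
    n = len(Xb)

    scored = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample indices
        oob = set(range(n)) - set(boot)              # out-of-bag indices
        if not oob:
            continue
        stump = fit_stump([Xb[i] for i in boot], [yb[i] for i in boot])
        acc = sum(stump(Xb[i]) == yb[i] for i in oob) / len(oob)
        scored.append((acc, stump))

    # Rank trees by individual OOB accuracy and keep only the best ones.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    selected = [stump for _, stump in scored[:keep]]

    def predict(row):
        # Majority vote over the selected stumps only.
        return Counter(s(row) for s in selected).most_common(1)[0][0]

    return predict


# Toy, made-up data: 40 majority points (label 0) and 5 minority points (label 1).
X = [(i % 5 + 0.1 * (i // 5), float((i * 7) % 5)) for i in range(40)] + [
    (7.0, 1.0), (7.5, 2.0), (8.0, 1.5), (6.5, 3.0), (7.2, 0.5)]
y = [0] * 40 + [1] * 5
predict = build_ete_like(X, y, n_trees=25, keep=10, seed=1)
print(predict((1.0, 2.0)), predict((7.4, 1.2)))
```

Here out-of-bag accuracy is measured on the balanced set, which includes synthetic points. That is one design choice among several; scoring trees on the original observations only would be an equally plausible reading of the abstract.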

Original language: English
Article number: 3243
Journal: Mathematics
Volume: 12
Issue number: 20
Publication status: Published - Oct 2024
Externally published: Yes

Keywords

  • class-imbalance problem
  • classification
  • random forest
  • synthetic data generation
  • tree selection

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Mathematics
  • Engineering (miscellaneous)
