Random ordinality ensembles: A novel ensemble method for multi-valued categorical data

Amir Ahmad, Gavin Brown

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Data with multi-valued categorical attributes can cause major problems for decision trees. The high branching factor can lead to data fragmentation, where decisions have little or no statistical support. In this paper, we propose a new ensemble method, Random Ordinality Ensembles (ROE), that circumvents this problem and provides significantly improved accuracies over other popular ensemble methods. We perform a random projection of the categorical data into a continuous space by imposing random ordinality on categorical attribute values. A decision tree that learns on this new continuous space is able to use binary splits, hence avoiding the data fragmentation problem. A majority-vote ensemble is then constructed with several trees, each learnt from a different continuous space. An empirical evaluation on 13 datasets shows that this simple method significantly outperforms standard techniques such as Boosting and Random Forests. A theoretical study using an information gain framework is carried out to explain the performance of Random Ordinality (RO) trees. The study shows that ROE is quite robust to the data fragmentation problem and that RO trees are significantly smaller than trees generated using multi-way splits.
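
The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (random_ordinality, build_tree, fit_roe, predict_roe) are hypothetical, random integer ranks stand in for the random ordering, and the tree learner is a small Gini-based binary-split tree chosen only to keep the example self-contained. The abstract does not specify the tree learner or its settings.

```python
import random
from collections import Counter

def random_ordinality(values, rng):
    """Map each distinct category to a random rank (one random ordering)."""
    cats = sorted(set(values))
    ranks = list(range(len(cats)))
    rng.shuffle(ranks)
    return dict(zip(cats, ranks))

def encode(rows, mappings):
    """Replace every categorical value by its rank under the given mappings."""
    return [[m[v] for m, v in zip(mappings, row)] for row in rows]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y):
    """Grow a binary tree with threshold tests x[j] <= t (assumed learner).

    Leaves are class labels; internal nodes are tuples, so labels
    themselves must not be tuples.
    """
    if len(set(y)) == 1:
        return y[0]
    best = None
    for j in range(len(X[0])):
        # Every observed value except the largest is a candidate threshold.
        for t in sorted(set(r[j] for r in X))[:-1]:
            left = [i for i, r in enumerate(X) if r[j] <= t]
            right = [i for i, r in enumerate(X) if r[j] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # identical rows with mixed labels: fall back to majority
        return Counter(y).most_common(1)[0][0]
    _, j, t, left, right = best
    return (j, t,
            build_tree([X[i] for i in left], [y[i] for i in left]),
            build_tree([X[i] for i in right], [y[i] for i in right]))

def predict_tree(node, x):
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def fit_roe(rows, y, n_trees=10, seed=0):
    """Train n_trees trees, each on its own random ordinal encoding."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_trees):
        mappings = [random_ordinality([r[j] for r in rows], rng)
                    for j in range(len(rows[0]))]
        ensemble.append((mappings, build_tree(encode(rows, mappings), y)))
    return ensemble

def predict_roe(ensemble, row):
    """Majority vote; categories unseen in training raise KeyError here."""
    votes = Counter(predict_tree(tree, encode([row], mappings)[0])
                    for mappings, tree in ensemble)
    return votes.most_common(1)[0][0]
```

Calling fit_roe(rows, y, n_trees=5) on a list of categorical rows returns a list of (mappings, tree) pairs, and predict_roe(ensemble, row) returns the majority-vote label. Because each tree sees a different random ordering, the threshold tests group the category values differently from tree to tree, which is the source of diversity the abstract relies on.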

Original language: English
Title of host publication: Multiple Classifier Systems - 8th International Workshop, MCS 2009, Proceedings
Pages: 222-231
Number of pages: 10
Publication status: Published - 2009
Externally published: Yes
Event: 8th International Workshop on Multiple Classifier Systems, MCS 2009 - Reykjavik, Iceland
Duration: Jun 10 2009 → Jun 12 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5519 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th International Workshop on Multiple Classifier Systems, MCS 2009
Country/Territory: Iceland
City: Reykjavik
Period: 6/10/09 → 6/12/09

Keywords

  • Binary splits
  • Data fragmentation
  • Decision trees
  • Multi-way splits
  • Random Ordinality

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
