TY - GEN
T1 - Random ordinality ensembles
T2 - 8th International Workshop on Multiple Classifier Systems, MCS 2009
AU - Ahmad, Amir
AU - Brown, Gavin
PY - 2009
Y1 - 2009
AB - Data with multi-valued categorical attributes can cause major problems for decision trees. The high branching factor can lead to data fragmentation, where decisions have little or no statistical support. In this paper, we propose a new ensemble method, Random Ordinality Ensembles (ROE), that circumvents this problem and provides significantly improved accuracies over other popular ensemble methods. We perform a random projection of the categorical data into a continuous space by imposing a random ordinality on the categorical attribute values. A decision tree that learns on this new continuous space is able to use binary splits, hence avoiding the data fragmentation problem. A majority-vote ensemble is then constructed from several trees, each learnt from a different continuous space. An empirical evaluation on 13 datasets shows that this simple method significantly outperforms standard techniques such as Boosting and Random Forests. A theoretical study using an information gain framework is carried out to explain the performance of Random Ordinality (RO). The study shows that ROE is quite robust to the data fragmentation problem and that RO trees are significantly smaller than trees generated using multi-way splits.
KW - Binary splits
KW - Data fragmentation
KW - Decision trees
KW - Multi-way splits
KW - Random Ordinality
UR - http://www.scopus.com/inward/record.url?scp=70349306580&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349306580&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-02326-2_23
DO - 10.1007/978-3-642-02326-2_23
M3 - Conference contribution
AN - SCOPUS:70349306580
SN - 3642023258
SN - 9783642023255
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 222
EP - 231
BT - Multiple Classifier Systems - 8th International Workshop, MCS 2009, Proceedings
Y2 - 10 June 2009 through 12 June 2009
ER -