TY - GEN
T1 - Ensemble of subset of k-nearest neighbours models for class membership probability estimation
AU - Gul, Asma
AU - Khan, Zardad
AU - Perperoglou, Aris
AU - Mahmoud, Osama
AU - Miftahuddin, Miftahuddin
AU - Adler, Werner
AU - Lausen, Berthold
N1 - Publisher Copyright:
© Springer International Publishing Switzerland 2016.
PY - 2016
Y1 - 2016
N2 - Combining multiple classifiers can substantially improve the prediction performance of learning algorithms, especially in the presence of non-informative features in the data sets. This technique can also be used for estimating class membership probabilities. We propose an ensemble of k-Nearest Neighbours (kNN) classifiers for class membership probability estimation in the presence of non-informative features in the data. This is done in two steps. First, we select classifiers based on their individual performance from a set of base kNN models, each generated on a bootstrap sample using a random feature subset from the feature space of the training data. Second, stepwise selection is applied to the selected learners, and those models that maximize its predictive performance are added to the ensemble. We evaluate our method on benchmark data sets with some added non-informative features. Experimental comparison of the proposed method with the usual kNN, bagged kNN, random kNN and random forest shows that it achieves high predictive performance, in terms of minimum Brier score, on most of the data sets. The results are also verified by simulation studies.
AB - Combining multiple classifiers can substantially improve the prediction performance of learning algorithms, especially in the presence of non-informative features in the data sets. This technique can also be used for estimating class membership probabilities. We propose an ensemble of k-Nearest Neighbours (kNN) classifiers for class membership probability estimation in the presence of non-informative features in the data. This is done in two steps. First, we select classifiers based on their individual performance from a set of base kNN models, each generated on a bootstrap sample using a random feature subset from the feature space of the training data. Second, stepwise selection is applied to the selected learners, and those models that maximize its predictive performance are added to the ensemble. We evaluate our method on benchmark data sets with some added non-informative features. Experimental comparison of the proposed method with the usual kNN, bagged kNN, random kNN and random forest shows that it achieves high predictive performance, in terms of minimum Brier score, on most of the data sets. The results are also verified by simulation studies.
UR - http://www.scopus.com/inward/record.url?scp=84981549519&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84981549519&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-25226-1_35
DO - 10.1007/978-3-319-25226-1_35
M3 - Conference contribution
AN - SCOPUS:84981549519
SN - 9783319252247
T3 - Studies in Classification, Data Analysis, and Knowledge Organization
SP - 411
EP - 421
BT - Analysis of Large and Complex Data
A2 - Wilhelm, Adalbert F.X.
A2 - Kestler, Hans A.
PB - Springer International Publishing
T2 - 2nd European Conference on Data Analysis, ECDA 2014
Y2 - 2 July 2014 through 4 July 2014
ER -