TY - JOUR
T1 - Robust Proportional Overlapping Analysis for Feature Selection in Binary Classification Within Functional Genomic Experiments
AU - Hamraz, Muhammad
AU - Gul, Naz
AU - Raza, Mushtaq
AU - Khan, Dost Muhammad
AU - Khalil, Umair
AU - Zubair, Seema
AU - Khan, Zardad
N1 - Publisher Copyright:
Copyright 2021 Hamraz et al.
PY - 2021
Y1 - 2021
N2 - This paper proposes a novel feature selection method for microarray gene expression datasets, called Robust Proportional Overlapping Score (RPOS), which uses a robust measure of dispersion, the Median Absolute Deviation (MAD). The method identifies the most discriminative genes by considering the overlapping scores of the gene expression values for binary class problems. Genes with a high degree of overlap between classes are discarded, and those that discriminate between the classes are selected. The proposed method is compared with five state-of-the-art gene selection methods on eleven gene expression datasets in terms of classification error, Brier score, and sensitivity. Observations are classified using different sets of genes selected by the proposed method with three classifiers: random forest, k-nearest neighbors (k-NN), and support vector machine (SVM). Box plots and stability scores of the results are also reported. The results show that the proposed method outperforms the other methods in most cases.
AB - This paper proposes a novel feature selection method for microarray gene expression datasets, called Robust Proportional Overlapping Score (RPOS), which uses a robust measure of dispersion, the Median Absolute Deviation (MAD). The method identifies the most discriminative genes by considering the overlapping scores of the gene expression values for binary class problems. Genes with a high degree of overlap between classes are discarded, and those that discriminate between the classes are selected. The proposed method is compared with five state-of-the-art gene selection methods on eleven gene expression datasets in terms of classification error, Brier score, and sensitivity. Observations are classified using different sets of genes selected by the proposed method with three classifiers: random forest, k-nearest neighbors (k-NN), and support vector machine (SVM). Box plots and stability scores of the results are also reported. The results show that the proposed method outperforms the other methods in most cases.
KW - Binary classification
KW - Feature selection
KW - Functional genomic
KW - Overlapping analysis
UR - http://www.scopus.com/inward/record.url?scp=85108551907&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108551907&partnerID=8YFLogxK
U2 - 10.7717/PEERJ-CS.562
DO - 10.7717/PEERJ-CS.562
M3 - Article
AN - SCOPUS:85108551907
SN - 2376-5992
VL - 7
SP - 1
EP - 22
JO - PeerJ Computer Science
JF - PeerJ Computer Science
ER -