TY - GEN
T1 - Minimizing redundancy among genes selected based on the overlapping analysis
AU - Mahmoud, Osama
AU - Harrison, Andrew
AU - Gul, Asma
AU - Khan, Zardad
AU - Metodiev, Metodi V.
AU - Lausen, Berthold
N1 - Publisher Copyright:
© Springer International Publishing Switzerland 2016.
PY - 2016
Y1 - 2016
N2 - For many functional genomic experiments, identifying the most characterizing genes is a main challenge. Both the prediction accuracy and interpretability of a classifier could be enhanced by performing the classification based only on a set of discriminative genes. Analyzing overlapping between gene expression of different classes is an effective criterion for identifying relevant genes. However, genes selected according to maximizing a relevance score could have rich redundancy. We propose a scheme for minimizing selection redundancy, in which the Proportional Overlapping Score (POS) technique is extended by using a recursive approach to assign a set of complementary discriminative genes. The proposed scheme exploits the gene masks defined by POS to identify more integrated genes in terms of their classification patterns. The approach is validated by comparing its classification performance with other feature selection methods, Wilcoxon Rank Sum, mRMR, MaskedPainter and POS, for several benchmark gene expression datasets using three different classifiers: Random Forest; k Nearest Neighbour; Support Vector Machine. The experimental results of classification error rates show that our proposal achieves a better performance.
AB - For many functional genomic experiments, identifying the most characterizing genes is a main challenge. Both the prediction accuracy and interpretability of a classifier could be enhanced by performing the classification based only on a set of discriminative genes. Analyzing overlapping between gene expression of different classes is an effective criterion for identifying relevant genes. However, genes selected according to maximizing a relevance score could have rich redundancy. We propose a scheme for minimizing selection redundancy, in which the Proportional Overlapping Score (POS) technique is extended by using a recursive approach to assign a set of complementary discriminative genes. The proposed scheme exploits the gene masks defined by POS to identify more integrated genes in terms of their classification patterns. The approach is validated by comparing its classification performance with other feature selection methods, Wilcoxon Rank Sum, mRMR, MaskedPainter and POS, for several benchmark gene expression datasets using three different classifiers: Random Forest; k Nearest Neighbour; Support Vector Machine. The experimental results of classification error rates show that our proposal achieves a better performance.
UR - http://www.scopus.com/inward/record.url?scp=84981537797&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84981537797&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-25226-1_24
DO - 10.1007/978-3-319-25226-1_24
M3 - Conference contribution
AN - SCOPUS:84981537797
SN - 9783319252247
T3 - Studies in Classification, Data Analysis, and Knowledge Organization
SP - 275
EP - 285
BT - Analysis of Large and Complex Data
A2 - Wilhelm, Adalbert F.X.
A2 - Kestler, Hans A.
PB - Kluwer Academic Publishers
T2 - 2nd European Conference on Data Analysis, ECDA 2014
Y2 - 2 July 2014 through 4 July 2014
ER -