TY - JOUR
T1 - Feature selection and classification for gene expression data using novel correlation based overlapping score method via Chou's 5-steps rule
AU - Wahid, Abdul
AU - Khan, Dost Muhammad
AU - Iqbal, Nadeem
AU - Khan, Sajjad Ahmad
AU - Ali, Amjad
AU - Khan, Mukhtaj
AU - Khan, Zardad
N1 - Publisher Copyright:
© 2020 Elsevier B.V.
PY - 2020/4/15
Y1 - 2020/4/15
AB - The analysis of omics data together with knowledge-based interpretation can help obtain important information about different biological processes and reflect the current physiological status of tissues and cells. The main challenge, however, is to extract disease-related information from high-dimensional gene expression data that contain a massive number of redundant genes. To address this problem, gene selection, which eliminates redundant and irrelevant genes, has become a key step. In this article, a feature selection technique is proposed that exploits a correlation-based overlapping analysis of expression data across classes. The proposed correlation-based overlapping score (COS) technique is compared with state-of-the-art gene selection approaches on real-world benchmark microarray datasets. In an experimental evaluation, the COS algorithm outperforms the other methods, yielding the minimum misclassification errors with boosting, random forest and k-nearest neighbour (kNN) classifiers. Moreover, the proposed technique is more stable in gene selection than the other techniques.
KW - Classifiers
KW - Correlation based overlapping score
KW - Feature selection
KW - Gene expression data
KW - Stability index
UR - http://www.scopus.com/inward/record.url?scp=85078848393&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078848393&partnerID=8YFLogxK
U2 - 10.1016/j.chemolab.2020.103958
DO - 10.1016/j.chemolab.2020.103958
M3 - Article
AN - SCOPUS:85078848393
SN - 0169-7439
VL - 199
JO - Chemometrics and Intelligent Laboratory Systems
JF - Chemometrics and Intelligent Laboratory Systems
M1 - 103958
ER -