TY - JOUR
T1 - Regulatory genes identification within functional genomics experiments for tissue classification into binary classes via machine learning techniques
AU - Wazir, Bushra
AU - Khan, Dost Muhammad
AU - Khalil, Umair
AU - Hamraz, Muhammad
AU - Gul, Naz
AU - Khan, Zardad
N1 - Publisher Copyright: © 2020 Pakistan Medical Association. All rights reserved.
PY - 2020/12
Y1 - 2020/12
AB - Objective: To identify the most informative genes that regulate the target tissue class, thereby increasing classification accuracy, mitigating the curse of dimensionality, and discarding redundant and irrelevant genes. Methods: This paper presents a gene selection method based on bagging sub-forest (BSF). The proposed method derives gene importance from the idea underlying the standard random forest algorithm. It was compared with three state-of-the-art methods: Wilcoxon, masked painter, and proportional overlapped score (POS). The methods were applied to five data sets: Colon, Lymph node breast cancer, Leukaemia, Serrated colorectal carcinomas, and Breast Cancer. For each gene selection method, the top 20 genes were selected, and random forest (RF) and support vector machine (SVM) classifiers were then applied to the data sets restricted to the selected genes to assess predictive performance. Classification accuracy, Brier score, and sensitivity were used as performance measures. Results: The proposed method gave better results than the other feature selection methods on all the data sets with both the RF and SVM classifiers. Conclusion: The proposed method showed improved performance in terms of classification accuracy, Brier score, and sensitivity, and hence could be used as a novel method for gene selection to classify tissue samples into their correct classes.
KW - Cancer
KW - Classification
KW - Gene selection
KW - Microarray gene expression
KW - Random forest
UR - http://www.scopus.com/inward/record.url?scp=85100280197&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100280197&partnerID=8YFLogxK
U2 - 10.47391/JPMA.201
DO - 10.47391/JPMA.201
M3 - Article
C2 - 33475543
AN - SCOPUS:85100280197
SN - 0030-9982
VL - 70
SP - 2356
EP - 2362
JO - Journal of the Pakistan Medical Association
JF - Journal of the Pakistan Medical Association
IS - 12 B
ER -