TY - GEN
T1 - Random forest for gene selection and microarray data classification
AU - Moorthy, Kohbalan
AU - Mohamad, Mohd Saberi
PY - 2012
Y1 - 2012
N2 - A random forest method has been selected to perform both gene selection and classification of microarray data. The goal of this research is to develop and improve the random forest gene selection method. Hence, an improved gene selection method using random forest is proposed to obtain the smallest subset of genes as well as the biggest subset of genes prior to classification. In this research, ten datasets consisting of different classes are used: Adenocarcinoma, Brain, Breast (Class 2 and 3), Colon, Leukemia, Lymphoma, NCI60, Prostate and Small Round Blue-Cell Tumor (SRBCT). The enhanced random forest gene selection performed better in selecting the smallest subset as well as the biggest subset of informative genes. Furthermore, classification performed on the selected subset of genes using random forest has led to lower prediction error rates compared to the existing method and other similar available methods.
AB - A random forest method has been selected to perform both gene selection and classification of microarray data. The goal of this research is to develop and improve the random forest gene selection method. Hence, an improved gene selection method using random forest is proposed to obtain the smallest subset of genes as well as the biggest subset of genes prior to classification. In this research, ten datasets consisting of different classes are used: Adenocarcinoma, Brain, Breast (Class 2 and 3), Colon, Leukemia, Lymphoma, NCI60, Prostate and Small Round Blue-Cell Tumor (SRBCT). The enhanced random forest gene selection performed better in selecting the smallest subset as well as the biggest subset of informative genes. Furthermore, classification performed on the selected subset of genes using random forest has led to lower prediction error rates compared to the existing method and other similar available methods.
KW - cancer classification
KW - classification
KW - gene expression data
KW - gene selection
KW - microarray data
KW - random forest
UR - http://www.scopus.com/inward/record.url?scp=84865598995&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84865598995&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-32826-8_18
DO - 10.1007/978-3-642-32826-8_18
M3 - Conference contribution
AN - SCOPUS:84865598995
SN - 9783642328251
T3 - Communications in Computer and Information Science
SP - 174
EP - 183
BT - Knowledge Technology - Third Knowledge Technology Week, KTW 2011, Revised Selected Papers
T2 - 3rd Knowledge Technology Week, KTW 2011
Y2 - 18 July 2011 through 22 July 2011
ER -