TY - JOUR
T1 - Using machine learning to predict ovarian cancer
AU - Lu, Mingyang
AU - Fan, Zhenjiang
AU - Xu, Bin
AU - Chen, Lujun
AU - Zheng, Xiao
AU - Li, Jundong
AU - Znati, Taieb
AU - Mi, Qi
AU - Jiang, Jingting
N1 - Funding Information:
This work was supported in part by grants from the National Key R&D Plan (2018YFC1313400) to J. J., the National Natural Science Foundation of China (31570877, 31729001, 81972869) to J. J., the National Key Technology R&D Program (2015BAI12B12) to J. J., Jiangsu Engineering Research Center for Tumor Immunotherapy (BM2014404) to J. J., the Key R&D Project of Science and Technology Department of Jiangsu Province (BE2018645) to J. J., the National Natural Science Foundation of China (81902386) to X. Z., Young Medical Talents Program of Jiangsu Province (QNRC2016286) to X. Z., Changzhou Science and Technology Project (Applied Based Research, CJ20190094) to X. Z., and Changzhou High-Level Medical Talents Training Project (2016CZBJ001) to L. C.
Publisher Copyright:
© 2020 Elsevier B.V.
PY - 2020/9
Y1 - 2020/9
N2 - Objective: Ovarian cancer (OC) is one of the most common types of cancer in women. Accurate discrimination between benign ovarian tumors (BOT) and OC has important practical value. Methods: Our dataset comprises 349 Chinese patients with 49 variables, including demographics, routine blood tests, general chemistry, and tumor markers. The Minimum Redundancy Maximum Relevance (MRMR) feature selection method was applied to data from 235 patients (89 BOT and 146 OC) to select the most relevant features, which were then used to construct a simple decision tree model. The model was tested on the remaining 114 patients (89 BOT and 25 OC). The results were compared with predictions from the risk of ovarian malignancy algorithm (ROMA) and a logistic regression model. Results: MRMR selected ten notable features, of which the decision tree model identified two as the top features: human epididymis protein 4 (HE4) and carcinoembryonic antigen (CEA). In particular, CEA is a valuable marker for OC prediction in patients with low HE4 levels. The model also yields better predictions than ROMA. Conclusion: Machine learning approaches were able to accurately distinguish BOT from OC. Our goal was to derive a simple predictive model that also performs well. Using our approach, we obtained a model that consists of just two biomarkers, HE4 and CEA. The model is simple to interpret and outperforms existing OC prediction methods. This demonstrates that machine learning has good potential for predictive modeling of complex diseases.
KW - Machine Learning
KW - Ovarian Cancer
KW - Tumor Marker
UR - http://www.scopus.com/inward/record.url?scp=85085565402&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85085565402&partnerID=8YFLogxK
U2 - 10.1016/j.ijmedinf.2020.104195
DO - 10.1016/j.ijmedinf.2020.104195
M3 - Article
C2 - 32485554
AN - SCOPUS:85085565402
SN - 1386-5056
VL - 141
JO - International Journal of Medical Informatics
JF - International Journal of Medical Informatics
M1 - 104195
ER -