TY - JOUR
T1 - Developing a discrimination rule between breast cancer patients and controls using proteomics mass spectrometric data
T2 - A three-step approach
AU - Heidema, A. Geert
AU - Nagelkerke, Nico
PY - 2008
Y1 - 2008
N2 - To discriminate between breast cancer patients and controls, we used a three-step approach to obtain our decision rule. First, we ranked the mass/charge values using random forests, because this method generates importance indices that take possible interactions into account. We observed that the top-ranked variables consisted of highly correlated contiguous mass/charge values, which were grouped in the second step into new variables. Finally, these newly created variables were used as predictors to find a suitable discrimination rule. In this last step, we compared three different methods, namely Classification and Regression Tree (CART), logistic regression and penalized logistic regression. Logistic regression and penalized logistic regression performed equally well and both had a higher classification accuracy than CART. The model obtained with penalized logistic regression was chosen as we hypothesized that this model would provide a better classification accuracy in the validation set. The solution performed well on the training set, with a classification accuracy of 86.3% and a sensitivity and specificity of 86.8% and 85.7%, respectively.
AB - To discriminate between breast cancer patients and controls, we used a three-step approach to obtain our decision rule. First, we ranked the mass/charge values using random forests, because this method generates importance indices that take possible interactions into account. We observed that the top-ranked variables consisted of highly correlated contiguous mass/charge values, which were grouped in the second step into new variables. Finally, these newly created variables were used as predictors to find a suitable discrimination rule. In this last step, we compared three different methods, namely Classification and Regression Tree (CART), logistic regression and penalized logistic regression. Logistic regression and penalized logistic regression performed equally well and both had a higher classification accuracy than CART. The model obtained with penalized logistic regression was chosen as we hypothesized that this model would provide a better classification accuracy in the validation set. The solution performed well on the training set, with a classification accuracy of 86.3% and a sensitivity and specificity of 86.8% and 85.7%, respectively.
UR - http://www.scopus.com/inward/record.url?scp=41649098924&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=41649098924&partnerID=8YFLogxK
U2 - 10.2202/1544-6115.1341
DO - 10.2202/1544-6115.1341
M3 - Article
C2 - 18312219
AN - SCOPUS:41649098924
SN - 1544-6115
VL - 7
JO - Statistical Applications in Genetics and Molecular Biology
JF - Statistical Applications in Genetics and Molecular Biology
IS - 2
M1 - 5
ER -