TY - GEN
T1 - Enhancing Diagnostic Accuracy by Bypassing Traditional Imputation and Leveraging Missing Data in Alzheimer’s Disease Detection Models
AU - Dabool, Hamzah
AU - Alashwal, Hany
AU - Moustafa, Ahmad A.
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024/11/25
Y1 - 2024/11/25
N2 - Researchers often encounter significant hurdles when dealing with datasets that contain a vast number of missing values. This predicament forces them to make a tough choice: either discard a substantial amount of data, which could drastically undermine the accuracy of the machine learning (ML) model, or attempt to fill these missing values in sensitive medical datasets—a method that is far from ideal. This paper proposes an approach to this issue, suggesting that bypassing the traditional path of data imputation in favor of a model that learns from the missing values themselves could paradoxically improve the accuracy and predictive capabilities of Alzheimer’s Disease (AD) identification models. We introduce a comparison between state-of-the-art ML models and the XGBoost algorithm, which is designed to integrate the learning of missing values into its training cycle, using the official ADNI datasets with extensive missing values. The experiment further evaluates these models on the same datasets post-imputation. The results strikingly indicate that this unconventional strategy not only bridges the gaps created by missing data but also surpasses the accuracy of traditional methods that rely on filling in incomplete samples. This discovery opens up new avenues for research in medical diagnostics for conditions like AD, where data scarcity and imperfections are common. By rethinking how we handle incomplete data, we unlock new potential for refining ML applications in healthcare, particularly in enhancing the precision of diagnoses in complex diseases such as AD.
KW - ADNI
KW - Alzheimer’s Disease (AD)
KW - Data Imputation
KW - Extreme Gradient Boosting (XGBoost)
KW - Machine Learning
KW - Medical Diagnostics
KW - Missing Values
UR - http://www.scopus.com/inward/record.url?scp=105005934281&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105005934281&partnerID=8YFLogxK
U2 - 10.1145/3686397.3686403
DO - 10.1145/3686397.3686403
M3 - Conference contribution
AN - SCOPUS:105005934281
T3 - ACM International Conference Proceeding Series
SP - 33
EP - 38
BT - ICISDM 2024 - 8th International Conference on Information System and Data Mining
PB - Association for Computing Machinery
T2 - 8th International Conference on Information System and Data Mining, ICISDM 2024
Y2 - 24 June 2024 through 26 June 2024
ER -