TY - GEN
T1 - Clustering Medical Transcriptions Using K-Means
AU - Salloum, Said
AU - Tahat, Dina
AU - Tahat, Khalaf
AU - Alfaisal, Raghad
AU - Salloum, Ayham
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - The clustering of medical transcriptions is an essential task for the categorization and summarization of large volumes of medical records. This paper explores the efficacy of k-means clustering, a well-known unsupervised machine learning algorithm, to discern patterns and segregate medical transcriptions into distinct clusters. We processed a dataset comprising various medical reports, systematically cleaning and preparing the text for analysis. By employing a Term Frequency-Inverse Document Frequency (TF-IDF) approach, we converted the textual data into a vectorized format amenable to machine learning methods. Subsequent dimensionality reduction through Principal Component Analysis (PCA) facilitated the visualization and interpretation of the high-dimensional data in two-dimensional space. The k-means algorithm was then applied, revealing five distinct clusters. Each cluster was characterized by examining the prevalence of key terms, uncovering thematic consistencies that may correspond to particular medical procedures or specialties. The resulting clusters demonstrate the algorithm's potential to automatically categorize medical documentation in a way that mirrors clinical relevance, thereby providing a foundation for improved information management systems in healthcare settings.
KW - Dimensionality Reduction
KW - K-Means Clustering
KW - Medical Transcriptions
KW - Unsupervised Learning
UR - http://www.scopus.com/inward/record.url?scp=85215320564&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85215320564&partnerID=8YFLogxK
U2 - 10.1109/ICCNS62192.2024.10776237
DO - 10.1109/ICCNS62192.2024.10776237
M3 - Conference contribution
AN - SCOPUS:85215320564
T3 - 2024 International Conference on Intelligent Computing, Communication, Networking and Services, ICCNS 2024
SP - 291
EP - 294
BT - 2024 International Conference on Intelligent Computing, Communication, Networking and Services, ICCNS 2024
A2 - Jararweh, Yaser
A2 - Alsmirat, Mohammad
A2 - Aloqaily, Moayad
A2 - Salameh, Haythem Bany
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th International Conference on Intelligent Computing, Communication, Networking and Services, ICCNS 2024
Y2 - 24 September 2024 through 27 September 2024
ER -