Clustering Medical Transcriptions Using K-Means

Said Salloum, Dina Tahat, Khalaf Tahat, Raghad Alfaisal, Ayham Salloum

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The clustering of medical transcriptions is an essential task for the categorization and summarization of large volumes of medical records. This paper explores the efficacy of k-means clustering, a well-known unsupervised machine learning algorithm, to discern patterns and segregate medical transcriptions into distinct clusters. We processed a dataset comprising various medical reports, systematically cleaning and preparing the text for analysis. By employing a Term Frequency-Inverse Document Frequency (TF-IDF) approach, we converted the textual data into a vectorized format amenable to machine learning methods. Subsequent dimensionality reduction through Principal Component Analysis (PCA) facilitated the visualization and interpretation of the high-dimensional data in two-dimensional space. The k-means algorithm was then applied, revealing five distinct clusters. Each cluster was characterized by examining the prevalence of key terms, uncovering thematic consistencies that may correspond to particular medical procedures or specialties. The resulting clusters demonstrate the algorithm's potential to automatically categorize medical documentation in a way that mirrors clinical relevance, thereby providing a foundation for improved information management systems in healthcare settings.

Original language: English
Title of host publication: 2024 International Conference on Intelligent Computing, Communication, Networking and Services, ICCNS 2024
Editors: Yaser Jararweh, Mohammad Alsmirat, Moayad Aloqaily, Haythem Bany Salameh
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 291-294
Number of pages: 4
ISBN (Electronic): 9798350354690
Publication status: Published - 2024
Event: 5th International Conference on Intelligent Computing, Communication, Networking and Services, ICCNS 2024 - Dubrovnik, Croatia
Duration: Sept 24 2024 to Sept 27 2024

Publication series

Name: 2024 International Conference on Intelligent Computing, Communication, Networking and Services, ICCNS 2024

Conference

Conference: 5th International Conference on Intelligent Computing, Communication, Networking and Services, ICCNS 2024
Country/Territory: Croatia
City: Dubrovnik
Period: 9/24/24 to 9/27/24

Keywords

  • Dimensionality Reduction
  • K-Means Clustering
  • Medical Transcriptions
  • Unsupervised Learning

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Communication
  • Artificial Intelligence
