TY - GEN
T1 - MedVLM
T2 - Medical Vision-Language Model for Consumer Devices
AU - Ayaz, Muhammad
AU - Khan, Mustaqeem
AU - Saqib, Muhammad
AU - Khelifi, Adel
AU - Sajjad, Muhammad
AU - Elsaddik, Abdulmotaleb
N1 - Publisher Copyright:
© 2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Generative Artificial Intelligence (GenAI) has enabled significant advancements in healthcare by supporting complex medical tasks through multimodal data processing. However, existing models often lack the adaptability required for diverse medical applications and are limited by their large size, hindering real-time deployment on consumer and edge devices. This paper presents MedVLM, a novel vision-language model optimized for medical applications such as Visual Question Answering (VQA) and medical report generation. MedVLM integrates the Florence-2 visual model with the LLaMA-2 language model using Low-Rank Adaptation (LoRA), reducing the number of trainable parameters to support efficient, real-time analysis across various imaging modalities, including X-rays, CT scans, and MRIs. Our evaluation includes extensive benchmarking against both specialized (Open-Flamingo, MedVInT, and Med-Flamingo) and generalist (Qwen-VL, PaLM-E) models, with results showing MedVLM's superior performance in diagnostic accuracy and VQA tasks, achieving an accuracy of 0.51 on the RadVQA dataset. We also validate MedVLM's outputs through collaboration with radiologists, who rated 74% of its generated medical reports as high quality. This work bridges the gap between GenAI advancements and practical radiological needs, providing a versatile tool that can streamline workflows and enhance diagnostic accuracy across various clinical settings.
AB - Generative Artificial Intelligence (GenAI) has enabled significant advancements in healthcare by supporting complex medical tasks through multimodal data processing. However, existing models often lack the adaptability required for diverse medical applications and are limited by their large size, hindering real-time deployment on consumer and edge devices. This paper presents MedVLM, a novel vision-language model optimized for medical applications such as Visual Question Answering (VQA) and medical report generation. MedVLM integrates the Florence-2 visual model with the LLaMA-2 language model using Low-Rank Adaptation (LoRA), reducing the number of trainable parameters to support efficient, real-time analysis across various imaging modalities, including X-rays, CT scans, and MRIs. Our evaluation includes extensive benchmarking against both specialized (Open-Flamingo, MedVInT, and Med-Flamingo) and generalist (Qwen-VL, PaLM-E) models, with results showing MedVLM's superior performance in diagnostic accuracy and VQA tasks, achieving an accuracy of 0.51 on the RadVQA dataset. We also validate MedVLM's outputs through collaboration with radiologists, who rated 74% of its generated medical reports as high quality. This work bridges the gap between GenAI advancements and practical radiological needs, providing a versatile tool that can streamline workflows and enhance diagnostic accuracy across various clinical settings.
UR - https://www.scopus.com/pages/publications/85213680470
UR - https://www.scopus.com/pages/publications/85213680470#tab=citedBy
U2 - 10.1109/MCE.2024.3522521
DO - 10.1109/MCE.2024.3522521
M3 - Article
AN - SCOPUS:85213680470
SN - 2162-2248
JO - IEEE Consumer Electronics Magazine
JF - IEEE Consumer Electronics Magazine
ER -