Abstract
Generative Artificial Intelligence (GenAI) has enabled significant advancements in healthcare by supporting complex medical tasks through multimodal data processing. However, existing models often lack the adaptability required for diverse medical applications and are limited by their large size, which hinders real-time deployment on consumer and edge devices. This paper presents MedVLM, a novel vision-language model optimized for medical applications such as Visual Question Answering (VQA) and medical report generation. MedVLM integrates the Florence-2 visual model with the LLaMA-2 language model using Low-Rank Adaptation (LoRA), reducing the number of trainable parameters to support efficient, real-time analysis across various imaging modalities, including X-rays, CT scans, and MRIs. Our evaluation includes extensive benchmarking against both specialized (Open-Flamingo, MedVInT, and Med-Flamingo) and generalist (Qwen-VL, PaLM-E) models, with results showing MedVLM's superior performance on diagnostic accuracy and VQA tasks, achieving an accuracy of 0.51 on the RadVQA dataset. We also validate MedVLM's outputs through collaboration with radiologists, who rated 74% of its generated medical reports as high quality. This work bridges the gap between GenAI advancements and practical radiological needs, providing a versatile tool that can streamline workflows and enhance diagnostic accuracy across various clinical settings.
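The abstract's key efficiency claim rests on LoRA: the base vision and language weights stay frozen while small low-rank adapter matrices are trained, shrinking the trainable parameter count. The sketch below illustrates this setup using the Hugging Face `peft` and `transformers` libraries; the checkpoint name, rank, scaling factor, and target modules are illustrative assumptions, not the authors' published configuration.

```python
# Minimal LoRA fine-tuning sketch (assumed setup, not the paper's exact config).
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the language backbone. LLaMA-2 checkpoints are gated on the Hub;
# any causal LM would work for the purposes of this sketch.
lm = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
)

# LoRA config: freeze base weights, train low-rank updates on the
# attention projections instead. All hyperparameters below are assumptions.
lora_cfg = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(lm, lora_cfg)

# Only the adapter weights require gradients, which is what keeps the
# trainable parameter count small enough for edge-oriented deployment.
model.print_trainable_parameters()
```

With a 7B-parameter backbone, a configuration like this typically leaves well under 1% of the weights trainable, which is the mechanism behind the abstract's claim of efficient adaptation across imaging modalities.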
| Original language | English |
| --- | --- |
| Specialist publication | IEEE Consumer Electronics Magazine |
| DOIs | |
| Publication status | Accepted/In press - 2024 |
| Externally published | Yes |
ASJC Scopus subject areas
- Human-Computer Interaction
- Hardware and Architecture
- Computer Science Applications
- Electrical and Electronic Engineering