MedVLM: Medical Vision-Language Model for Consumer Devices

Muhammad Ayaz, Mustaqeem Khan, Muhammad Saqib, Adel Khelifi, Muhammad Sajjad, Abdulmotaleb Elsaddik

Research output: Contribution to specialist publication › Article

1 Citation (Scopus)

Abstract

Generative Artificial Intelligence (GenAI) has enabled significant advancements in healthcare by supporting complex medical tasks through multimodal data processing. However, existing models often lack the adaptability required for diverse medical applications and are limited by their large size, hindering real-time deployment on consumer and edge devices. This paper presents MedVLM, a novel vision-language model optimized for medical applications such as Visual Question Answering (VQA) and medical report generation. MedVLM integrates the Florence-2 visual model with the LLaMA-2 language model using Low-Rank Adaptation (LoRA), reducing the number of trainable parameters to support efficient, real-time analysis across various imaging modalities, including X-rays, CT scans, and MRIs. Our evaluation includes extensive benchmarking against both specialized (Open-Flamingo, MedVInT, and Med-Flamingo) and generalist (Qwen-VL, PaLM-E) models, with results showing MedVLM's superior performance in diagnostic accuracy and VQA tasks, achieving 0.51% accuracy on the RadVQA dataset. We also validate MedVLM's outputs through collaboration with radiologists, who rated 74% of its generated medical reports as high quality. This work bridges the gap between GenAI advancements and practical radiological needs, providing a versatile tool that can streamline workflows and enhance diagnostic accuracy across various clinical settings.
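
The abstract credits LoRA with reducing the number of trainable parameters. The following is a minimal sketch of that general idea, not the authors' implementation: a frozen weight matrix W is augmented with a trainable low-rank product B·A, so only the two small factors are updated. The layer size (4096), rank (8), and all function names are illustrative assumptions; the abstract does not state MedVLM's actual LoRA settings.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full fine-tuning params, LoRA params) for one d_out x d_in weight."""
    full = d_out * d_in              # every entry of W is trainable
    lora = rank * (d_in + d_out)     # A is rank x d_in, B is d_out x rank
    return full, lora


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x).

    W stays frozen; only A and B would receive gradient updates.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical layer size and rank, chosen only to show the scale of the saving.
full, lora = lora_trainable_params(d_in=4096, d_out=4096, rank=8)
print(full, lora, lora / full)  # 16777216 65536 0.00390625
```

Under these assumed sizes, the adapter trains 1/256 of the parameters that full fine-tuning of the same layer would. When B is initialized to zeros, as is conventional for LoRA, the adapted layer initially reproduces the frozen layer's output exactly.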

Original language: English
Specialist publication: IEEE Consumer Electronics Magazine
DOIs
Publication status: Accepted/In press - 2024
Externally published: Yes

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Hardware and Architecture
  • Computer Science Applications
  • Electrical and Electronic Engineering
