A hybrid query expansion framework for the optimal retrieval of the biomedical literature

Sumbal Malik, Umar Shoaib, Syed Ahmad Chan Bukhari, Hesham El Sayed, Manzoor Ahmed Khan

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)


With the proliferation of biomedical literature, it is challenging for biomedical scientists to keep up to date with new advancements. In biomedical literature retrieval systems, the keywords in user-defined queries often appear in various lexical variants, which leads to vocabulary mismatch (VM). One way to cope with this issue is to introduce a query expansion (QE) framework that enriches the original query with auxiliary, semantically similar terms for each keyword in the query. In this research, we propose a biomedical QE framework to alleviate VM. The proposed approach combines clinical diagnosis information (CDI) and word embeddings (WEs) to retrieve relevant biomedical literature. Word embedding, the process of embedding vocabulary terms as real-valued, low-dimensional vectors, has garnered significant attention for its potential to capture implicit semantics. We exploited three kinds of word embeddings (domain-specific, domain-agnostic, and hybrid) and integrated the embedding outcomes with the CDI to obtain the best query combination for efficient retrieval of biomedical literature. Experimental results on the Text REtrieval Conference (TREC) dataset showed that CDI, when used with the hybrid word embeddings, surpassed the WEs trained on domain-specific and domain-agnostic data. The results demonstrate that merging the two techniques is a valuable addition to the QE process, leading to a significantly improved precision rate and reduced VM in biomedical literature retrieval. We hope that our approach will help investigators use this query combination to retrieve relevant articles.
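As a rough illustration of the embedding side of query expansion described in the abstract, the sketch below adds each keyword's nearest neighbours in embedding space to the query. It is not the authors' framework: it omits the CDI component and the comparison of domain-specific, domain-agnostic, and hybrid embeddings, and the vectors, the function names (`cosine`, `expand_query`), and the `top_k` and `min_sim` parameters are all assumptions made up for this example.

```python
import math

# Toy 3-dimensional embeddings. The values are invented for illustration
# only and do not come from any trained model or from the paper.
TOY_EMBEDDINGS = {
    "tumor": [0.9, 0.1, 0.0],
    "neoplasm": [0.85, 0.15, 0.05],
    "cancer": [0.8, 0.2, 0.1],
    "aspirin": [0.0, 0.9, 0.3],
    "ibuprofen": [0.05, 0.85, 0.35],
    "hospital": [0.3, 0.3, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(query, embeddings, top_k=2, min_sim=0.9):
    """Append up to top_k similar vocabulary terms for each query keyword.

    Keywords missing from the embedding vocabulary are kept unchanged.
    min_sim is a hypothetical cut-off that drops weak neighbours.
    """
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue
        # Rank every other vocabulary word by similarity to this keyword.
        scored = sorted(
            (
                (cosine(embeddings[term], vec), word)
                for word, vec in embeddings.items()
                if word != term
            ),
            reverse=True,
        )
        for sim, word in scored[:top_k]:
            if sim >= min_sim and word not in expanded:
                expanded.append(word)
    return expanded


if __name__ == "__main__":
    print(expand_query("tumor treatment", TOY_EMBEDDINGS))
```

With these toy vectors, the keyword "tumor" picks up "neoplasm" and "cancer", while "treatment" is left as-is because it is outside the toy vocabulary. In practice the lookup table would be replaced by embeddings trained on a real corpus.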

Original language: English
Article number: 100247
Journal: Smart Health
Publication status: Published - Mar 2022


Keywords

  • Markov random field
  • Query expansion
  • Semantics
  • Vocabulary mismatch
  • Word embeddings

ASJC Scopus subject areas

  • Medicine (miscellaneous)
  • Information Systems
  • Health Informatics
  • Computer Science Applications
  • Health Information Management

