Generating Image Captions Using Bahdanau Attention Mechanism and Transfer Learning

Shahnawaz Ayoub, Yonis Gulzar, Faheem Ahmad Reegu, Sherzod Turaev

Research output: Contribution to journal (Article, peer-reviewed)

25 Citations (Scopus)


Automatic image caption prediction is a challenging task in natural language processing. Most researchers have used a convolutional neural network (CNN) as the encoder in an encoder-decoder architecture. However, accurate image caption prediction requires a model to understand the semantic relationships between the various objects present in an image. The attention mechanism scores each encoder state against the current decoder state and forms a weighted linear combination of the encoder states. In this way, it aligns the semantic information in the caption with the visual information in the image. In this paper, we combine the Bahdanau attention mechanism with two pre-trained convolutional neural networks, Visual Geometry Group (VGG) and InceptionV3, to predict the captions of a given image. The two pre-trained models serve as encoders, and a recurrent neural network (RNN) serves as the decoder. With the help of the attention mechanism, the two encoders provide semantic context information to the decoder and achieve a bilingual evaluation understudy (BLEU) score of 62.5. Our main goal is to compare the performance of the two pre-trained models, each combined with the Bahdanau attention mechanism, on the same dataset.
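The additive scoring described in the abstract can be illustrated with a minimal pure-Python sketch of Bahdanau attention. It assumes the CNN encoder output is a list of per-region feature vectors and that the decoder supplies its previous hidden state. The function and parameter names (`bahdanau_attention`, `w1`, `w2`, `v`) and the toy weights are hypothetical, chosen for illustration only; this is not the authors' implementation, and a real model would learn these weights during training and operate on much larger tensors.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bahdanau_attention(features: List[Vector], hidden: Vector,
                       w1: Matrix, w2: Matrix, v: Vector) -> Tuple[Vector, Vector]:
    """Additive (Bahdanau) attention over encoder feature vectors.

    features: one vector per image region (hypothetical CNN output), length d_enc each
    hidden:   previous decoder hidden state, length d_dec
    w1: d_att x d_enc, w2: d_att x d_dec, v: length d_att (all assumed, normally learned)
    Returns (context vector of length d_enc, attention weights summing to 1).
    """
    projected_hidden = matvec(w2, hidden)
    scores = []
    for h in features:
        # Score e_i = v^T tanh(W1 h_i + W2 s): a linear combination of the
        # encoder state h_i and the decoder state s, passed through tanh.
        combined = [a + b for a, b in zip(matvec(w1, h), projected_hidden)]
        scores.append(sum(vj * math.tanh(c) for vj, c in zip(v, combined)))
    weights = softmax(scores)
    # Context vector: attention-weighted linear combination of encoder features.
    context = [sum(a * f[k] for a, f in zip(weights, features))
               for k in range(len(features[0]))]
    return context, weights


# Toy usage with three image regions and two-dimensional states (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [0.5, -0.5]
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
context, weights = bahdanau_attention(features, hidden, w1, w2, v)
print("attention weights:", weights)
print("context vector:", context)
```

With these toy values the third region receives the largest weight, and the context vector that the decoder consumes at this step leans toward that region's features. Projecting the decoder state once, outside the loop, avoids recomputing `W2 s` for every region.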

Original language: English
Article number: 2681
Issue number: 12
Publication status: Published - Dec 2022


Keywords

  • Bahdanau attention mechanism
  • convolutional neural network
  • image captioning
  • natural language processing

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Chemistry (miscellaneous)
  • General Mathematics
  • Physics and Astronomy (miscellaneous)

