Image caption generation using transfer learning with LSTM and DenseNet

Authors

  • Abdul Jabbar, Department of Computer Science, Riphah International University, I-14 Campus, Islamabad, 44000, Pakistan

DOI:

https://doi.org/10.66108/mna.v4i3.102

Keywords:

Image Captioning, DenseNet, Transfer Learning, Deep Learning

Abstract

Image captioning is the task of describing an image by identifying its main objects, their attributes, and the associations among them. An effective system must also produce syntactically and semantically correct sentences. Deep learning methods can be effective in addressing the complications involved in this task. This article presents a deep learning architecture for image captioning that combines three technologies: machine vision, machine translation, and transfer learning. A state-of-the-art CNN architecture, DenseNet201, is employed for this task. DenseNet201 is a convolutional neural network (CNN) that converts the image into a feature vector. This feature vector is then passed to a recurrent neural network (RNN), specifically a Long Short-Term Memory (LSTM) network, which decodes it into a sequence of words that forms the image description. The Flickr8k dataset is used to test the effectiveness of the proposed model, and its performance is measured with the BLEU metric, which provides a quantitative evaluation of the model's potential.

Published

2025-12-21

How to Cite

Abdul Jabbar. (2025). Image caption generation using transfer learning with LSTM and DenseNet. Machines and Algorithms, 4(3), 178–186. https://doi.org/10.66108/mna.v4i3.102

Section

Articles
