Saudi Arabia | Computer Science | Volume 14 Issue 4, April 2025 | Pages: 2470 - 2473

An Improved Image Captioning Approach

Raghad Al-Misned, Mohammed Al-Hagery

Abstract: Image captioning, the task of automatically generating descriptive captions for images, has gained significant attention in recent years due to its wide-ranging applications, including content accessibility, content indexing, and automated content generation. This proposal explores the intersection of language-image pre-training approaches and image captioning, aiming to develop a robust model capable of generating accurate and contextually relevant captions for diverse images. It does so by building on a state-of-the-art vision-language pre-training model. The proposed approach aims to set new benchmarks in image captioning by advancing the integration of vision and language models. We propose a novel architecture and methodology, including advanced attention mechanisms and multimodal fusion techniques, to enhance captioning performance and improve machine understanding of visual content. Comprehensive experimentation and evaluation on benchmark datasets are intended to demonstrate the effectiveness and practical utility of the proposed approach. Our findings will not only contribute to the advancement of image captioning technology by combining a pre-trained vision-language model with strategies such as Parameter-Efficient Fine-Tuning (PEFT), but will also hold significant implications for real-world applications, including assistive technology, content indexing, and automated content generation across different domains.

Keywords: Natural Language Processing, Computer Vision, Image Captioning, Deep Learning
