International Journal of Science and Research (IJSR)

Call for Papers | Fully Refereed | Open Access | Double Blind Peer Reviewed

ISSN: 2319-7064



India | Data Knowledge Engineering | Volume 11 Issue 6, June 2022 | Pages: 1998 - 2002


Multimodal Document Representation for Image-Text Fusion

Akshata Upadhye

Abstract: This survey paper discusses advancements in multimodal document representation, with a specific focus on the fusion of textual and visual information. The overview begins with the historical context of multimodal representation techniques, ranging from early hand-crafted feature-based approaches to recent advancements in deep learning. The paper then explores strategies used to fuse multimodal information, such as concatenation, attention mechanisms, and shared layers. It also highlights applications including image captioning, document retrieval, visual question answering, and multimedia analysis, to demonstrate the broad impact and significance of multimodal representation across diverse domains. Despite this progress, challenges such as data heterogeneity, scalability, and interpretability persist and open up avenues for future research. Finally, the paper offers insights into current state-of-the-art techniques and identifies opportunities for advancing the field of multimodal document representation.

Keywords: Multimodal Representation, Document Fusion, Image-Text Integration, Deep Learning, Information Retrieval, Semantic Understanding


