Research papers and master's and doctoral theses on image captioning

Visual News: Benchmark and Challenges in News Image Captioning

Association for Computational Linguistics, 2021 (article)

We propose Visual News Captioner, an entity-aware model for the task of news image captioning. We also introduce Visual News, a large-scale benchmark consisting of more than one million news images along with associated news articles, image captions, author information, and other metadata. Unlike in the standard image captioning task, news images depict situations where people, locations, and events are of paramount importance. Our proposed method can effectively combine visual and textual features to generate captions with richer information such as events and entities. More specifically, built upon the Transformer architecture, our model is further equipped with novel multi-modal feature fusion techniques and attention mechanisms, which are designed to generate named entities more accurately. Our method uses far fewer parameters while achieving slightly better prediction results than competing methods. Our larger and more diverse Visual News dataset further highlights the remaining challenges in captioning news images.

Keywords: image captioning task
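The abstract mentions attention-based fusion of visual and textual features but gives no details. As a generic illustration only, the sketch below uses standard scaled dot-product attention: a decoder query attends separately over image-region features and article-text features, and the two context vectors are concatenated. The names `attend` and `fuse`, the use of keys as values, and the concatenation step are assumptions for illustration, not the Visual News Captioner's actual fusion method.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    # Scaled dot-product attention: weights = softmax(q . k / sqrt(d)).
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


def fuse(query, image_feats, text_feats):
    # Hypothetical fusion (not the paper's): attend over image regions and
    # article tokens separately (keys reused as values), then concatenate.
    image_ctx, _ = attend(query, image_feats, image_feats)
    text_ctx, _ = attend(query, text_feats, text_feats)
    return image_ctx + text_ctx
```

Attending per modality keeps the image and text weights separately normalized; a single attention over the concatenated set of features would be another reasonable design.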

Retrieval, Analogy, and Composition: A framework for Compositional Generalization in Image Captioning

Association for Computational Linguistics, 2021 (article)

Image captioning systems are expected to have the ability to combine individual concepts when describing scenes with concept combinations that are not observed during training. In spite of significant progress in image captioning with the help of the autoregressive generation framework, current approaches fail to generalize well to novel concept combinations. We propose a new framework that revolves around probing several similar image caption training instances (retrieval), performing analogical reasoning over relevant entities in retrieved prototypes (analogy), and enhancing the generation process with reasoning outcomes (composition). Our method augments the generation model by referring to the neighboring instances in the training set to produce novel concept combinations in generated captions. We perform experiments on widely used image captioning benchmarks. The proposed models achieve substantial improvements over the compared baselines on both composition-related evaluation metrics and conventional image captioning metrics.

Keywords: text corpus, image captioning systems
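The retrieval step described in the abstract (probing similar training instances) can be sketched as a nearest-neighbour lookup with cosine similarity over precomputed feature vectors. This is a minimal sketch under assumed inputs: the `(feature_vector, caption)` pair format, the function names, and the choice of cosine similarity are illustrative, not necessarily what the authors use.

```python
import math


def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def retrieve(query_feat, train_set, k=2):
    # train_set (assumed format): list of (feature_vector, caption) pairs.
    # Returns the captions of the k most similar training instances.
    scored = sorted(train_set, key=lambda item: cosine(query_feat, item[0]), reverse=True)
    return [caption for _, caption in scored[:k]]


train = [
    ([1.0, 0.0, 0.0], "a dog on grass"),
    ([0.0, 1.0, 0.0], "a red car"),
    ([0.9, 0.1, 0.0], "a dog running"),
]
print(retrieve([1.0, 0.0, 0.0], train, k=2))
```

The retrieved captions would then serve as the prototypes for the analogy and composition steps, which this sketch does not attempt to model.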

Journalistic Guidelines Aware News Image Captioning

Association for Computational Linguistics, 2021 (article)

The task of news article image captioning aims to generate descriptive and informative captions for news article images. Unlike conventional image captions that simply describe the content of the image in general terms, news image captions follow journalistic guidelines and rely heavily on named entities to describe the image content, often drawing context from the whole article they are associated with. In this work, we propose a new approach to this task, motivated by caption guidelines that journalists follow. Our approach, Journalistic Guidelines Aware News Image Captioning (JoGANIC), leverages the structure of captions to improve the generation quality and guide our representation design. Experimental results, including detailed ablation studies, on two large-scale publicly available datasets show that JoGANIC substantially outperforms state-of-the-art methods both on caption generation and named entity related metrics.

Keywords: journalistic guidelines, article image captioning

Validity-Based Sampling and Smoothing Methods for Multiple Reference Image Captioning

Association for Computational Linguistics, 2021 (article)

In image captioning, multiple captions are often provided as ground truths, since a valid caption is not always uniquely determined. Conventional methods randomly select a single caption and treat it as correct, and few effective training methods make use of all the given captions. In this paper, we propose two training techniques for making effective use of multiple reference captions: 1) validity-based caption sampling (VBCS), which prioritizes the use of captions that are estimated to be highly valid during training, and 2) weighted caption smoothing (WCS), which applies smoothing only to the relevant words of the reference caption so as to reflect multiple reference captions simultaneously. Experiments show that our proposed methods improve CIDEr by 2.6 points and BLEU-4 by 0.9 points over the baseline on the MSCOCO dataset.

Keywords: reference image captioning, multiple reference images
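Validity-based caption sampling, as summarized above, prefers references that are estimated to be more valid. Assuming each reference already carries a non-negative validity score (how the paper estimates validity is not stated in this summary), one simple way to realize the idea is to sample a reference in proportion to its score at each training step. The function name `sample_reference` and the proportional weighting rule are assumptions; the paper may use a different sampling scheme, and weighted caption smoothing is not shown.

```python
import random


def sample_reference(captions, validity, rng):
    # Hypothetical rule: pick one reference caption with probability
    # proportional to its (assumed, precomputed) validity score.
    # A score of 0.0 means the caption is never chosen.
    if len(captions) != len(validity):
        raise ValueError("captions and validity must have the same length")
    return rng.choices(captions, weights=validity, k=1)[0]


rng = random.Random(0)
refs = ["a man riding a horse", "a person on a horse", "an animal"]
scores = [3.0, 1.0, 0.0]
picked = [sample_reference(refs, scores, rng) for _ in range(1000)]
print({r: picked.count(r) for r in refs})
```

Compared with uniform random selection, this keeps all plausible references in play while spending more training steps on the ones judged most valid.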

Image Captioning

Tishreen University, 2018 (graduation project)

This project builds an intelligent system that recognizes the object classes present in an image and generates a textual description of those objects. We used convolutional neural networks (CNNs) to extract the classes present in the image, and fed these classes into a recurrent neural network (RNN), which generates the textual description.

Keywords: deep learning, convolutional neural networks, recurrent neural networks, image captioning
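The pipeline described above (a CNN detects object classes, and an RNN turns them into a sentence) can be sketched as a greedy decoding loop. In this minimal, untrained sketch the CNN is replaced by an assumed multi-hot vector of detected classes, the vocabulary, class list, and dimensions are hypothetical, and the weights are random, so the generated words are arbitrary. Only the data flow (class vector, then initial hidden state, then word-by-word decoding until `<end>`) is meant to be illustrative.

```python
import math
import random

# Hypothetical toy vocabulary and class list (not taken from the project).
VOCAB = ["<start>", "<end>", "a", "dog", "on", "the", "grass"]
CLASSES = ["dog", "car", "grass"]  # stands in for the CNN's output classes


def matvec(m, v):
    # Matrix-vector product for lists of lists.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyRNNDecoder:
    """Untrained vanilla (tanh) RNN decoder with random weights."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        n = len(VOCAB)
        self.W_init = rand_matrix(hidden_dim, feat_dim, rng)  # class vector -> h0
        self.W_hh = rand_matrix(hidden_dim, hidden_dim, rng)  # h(t-1) -> h(t)
        self.W_xh = rand_matrix(hidden_dim, n, rng)           # previous word -> h(t)
        self.W_hy = rand_matrix(n, hidden_dim, rng)           # h(t) -> word scores

    def step(self, h, token_id):
        # One RNN step: combine the previous hidden state and previous word.
        one_hot = [1.0 if i == token_id else 0.0 for i in range(len(VOCAB))]
        pre = [a + b for a, b in zip(matvec(self.W_hh, h), matvec(self.W_xh, one_hot))]
        h = [math.tanh(x) for x in pre]
        return h, matvec(self.W_hy, h)

    def greedy_decode(self, class_vector, max_len=10):
        # The detected-class vector initializes the hidden state.
        h = [math.tanh(x) for x in matvec(self.W_init, class_vector)]
        token = VOCAB.index("<start>")
        words = []
        for _ in range(max_len):
            h, scores = self.step(h, token)
            # Greedy choice over all words except <start> (index 0).
            token = max(range(1, len(VOCAB)), key=lambda i: scores[i])
            if VOCAB[token] == "<end>":
                break
            words.append(VOCAB[token])
        return words


# Multi-hot vector: the hypothetical CNN detected "dog" and "grass".
detected = [1.0 if c in ("dog", "grass") else 0.0 for c in CLASSES]
decoder = TinyRNNDecoder(feat_dim=len(CLASSES), hidden_dim=8)
print(decoder.greedy_decode(detected))
```

In a trained system the weights would be learned from image-caption pairs, and beam search is a common alternative to greedy decoding.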
