Attention mechanisms have attracted considerable interest in image captioning because of their powerful performance. Existing attention-based models use feedback information from the caption generator as guidance to determine which image features should be attended to. A common defect of these attention-generation methods is that they lack higher-level guiding information from the image itself, which limits their ability to select the most informative image features. Therefore, in this paper, we propose a novel attention mechanism, called topic-guided attention, which integrates image topics into the attention model as guiding information to help select the most important image features. Moreover, we extract image features and image topics with separate networks, which can be fine-tuned jointly in an end-to-end manner during training. Experimental results on the benchmark Microsoft COCO dataset show that our method yields state-of-the-art performance on various quantitative metrics.
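As an illustration of how such a topic signal might enter the attention computation, the sketch below scores each region feature against both the previous decoder state and an image-topic vector before taking the weighted average. The variable names (`regions`, `h_prev`, `topic`) and the additive tanh scoring form are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over attention scores.
    e = np.exp(x - x.max())
    return e / e.sum()

def topic_guided_attention(regions, h_prev, topic, Wv, Wh, Wt, w):
    # regions: (k, dv) region features from the feature network.
    # h_prev:  (dh,) decoder hidden state from the previous step.
    # topic:   (dt,) image-topic vector from the separate topic network.
    # The topic enters the score alongside the decoder state, acting as
    # image-level guidance for which regions matter.
    scores = np.tanh(regions @ Wv + h_prev @ Wh + topic @ Wt) @ w  # (k,)
    alpha = softmax(scores)          # attention weights over regions
    context = alpha @ regions        # (dv,) attended image feature
    return context, alpha

# Toy usage with random parameters.
k, dv, dh, dt, da = 5, 8, 8, 4, 6
rng = np.random.default_rng(0)
ctx, alpha = topic_guided_attention(
    rng.normal(size=(k, dv)), rng.normal(size=dh), rng.normal(size=dt),
    rng.normal(size=(dv, da)), rng.normal(size=(dh, da)),
    rng.normal(size=(dt, da)), rng.normal(size=da))
```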
Attention mechanisms are widely used in current encoder/decoder frameworks for image captioning, where a weighted average of the encoded vectors is computed at each time step to guide the caption decoding process. However, the decoder has little idea of
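The per-step weighted average described above is the standard soft-attention loop; the following minimal sketch shows it with a simple tanh recurrence. The function names and the bilinear scoring rule are illustrative choices, not any specific paper's decoder.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax.
    e = np.exp(x - x.max())
    return e / e.sum()

def decode(enc, h0, W_s, W_c, W_h, steps):
    # enc: (k, d) encoded vectors from the image encoder.
    # At every time step, the current hidden state h scores each encoded
    # vector; the softmax-weighted average of enc is the context vector
    # that guides that decoding step.
    h, contexts = h0, []
    for _ in range(steps):
        scores = enc @ (W_s @ h)        # (k,) relevance of each vector to h
        alpha = softmax(scores)         # weights sum to 1
        c = alpha @ enc                 # (d,) weighted average of enc
        h = np.tanh(W_c @ c + W_h @ h)  # fold the context into the state
        contexts.append(c)
    return contexts

# Toy usage.
k, d = 6, 8
rng = np.random.default_rng(1)
ctxs = decode(rng.normal(size=(k, d)), rng.normal(size=d),
              rng.normal(size=(d, d)), rng.normal(size=(d, d)),
              rng.normal(size=(d, d)), steps=3)
```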
Attention modules connecting encoders and decoders have been widely applied in object recognition, image captioning, visual question answering, and neural machine translation, and they significantly improve performance. In this paper, we p
We study the problem of weakly supervised grounded image captioning. That is, given an image, the goal is to automatically generate a sentence describing the context of the image with each noun word grounded to the corresponding region in the image.
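One common way to obtain such groundings without region-level supervision is to reuse the captioner's attention weights: when a noun is emitted, the region receiving the largest weight at that step is taken as its grounding. The sketch below illustrates this argmax reading; the inputs and the selection rule are assumptions, not necessarily this paper's method.

```python
import numpy as np

def ground_nouns(words, alphas, region_boxes, is_noun):
    # words:        generated caption tokens, length T.
    # alphas:       (T, k) attention weights over k regions per step.
    # region_boxes: (k, 4) bounding boxes of the region proposals.
    # is_noun:      length-T booleans marking the noun tokens.
    groundings = {}
    for t, word in enumerate(words):
        if is_noun[t]:
            r = int(np.argmax(alphas[t]))       # most-attended region
            groundings[word] = region_boxes[r]  # its box grounds the noun
    return groundings

# Toy usage: ground "dog" and "ball" in a 3-region image.
alphas = np.array([[0.1, 0.8, 0.1], [0.3, 0.3, 0.4],
                   [0.2, 0.1, 0.7], [0.5, 0.3, 0.2]])
boxes = np.array([[0, 0, 50, 50], [10, 10, 80, 80], [40, 40, 90, 90]])
print(ground_nouns(["a", "dog", "chases", "ball"], alphas, boxes,
                   [False, True, False, True]))
```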
Image captioning has attracted ever-increasing research attention in the multimedia community. Most cutting-edge works rely on an encoder-decoder framework with attention mechanisms, which has achieved remarkable progress. However, such
Achieving real-time image captioning with adequate precision is the main challenge in this research field. The present work, Multiple Transformers for Self-Attention Mechanism (MTSM), utilizes multiple transformers to address this problem. The propos