ﻻ يوجد ملخص باللغة العربية
What is an effective expression that draws laughter from human beings? In the present paper, in order to consider this question from an academic standpoint, we generate an image caption that draws a laugh by a computer. A system that outputs funny captions based on the image caption proposed in the computer vision field is constructed. Moreover, we also propose the Funny Score, which flexibly gives weights according to an evaluation database. The Funny Score more effectively brings out laughter to optimize a model. In addition, we build a self-collected BoketeDB, which contains a theme (image) and funny caption (text) posted on Bokete, which is an image Ogiri website. In an experiment, we use BoketeDB to verify the effectiveness of the proposed method by comparing the results obtained using the proposed method and those obtained using MS COCO Pre-trained CNN+LSTM, which is the baseline and idiot created by humans. We refer to the proposed method, which uses the BoketeDB pre-trained model, as the Neural Joking Machine (NJM).
The last decade has witnessed remarkable progress in the image captioning task; however, most existing methods cannot control their captions, emph{e.g.}, choosing to describe the image either roughly or in detail. In this paper, we propose to use a s
Benefiting from advances in machine vision and natural language processing techniques, current image captioning systems are able to generate detailed visual descriptions. For the most part, these descriptions represent an objective characterisation o
Recent neural models for image captioning usually employ an encoder-decoder framework with an attention mechanism. However, the attention mechanism in such a framework aligns one single (attended) image feature vector to one caption word, assuming on
The mainstream image captioning models rely on Convolutional Neural Network (CNN) image features to generate captions via recurrent models. Recently, image scene graphs have been used to augment captioning models so as to leverage their structural se
Describing images using natural language is widely known as image captioning, which has made consistent progress due to the development of computer vision and natural language generation techniques. Though conventional captioning models achieve high