Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Pre-gen metrics: Predicting caption quality metrics without generating captions

80 0 0.0 ( 0 )

Download Cite

Added by Marc Tanti

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Marc Tanti - Albert Gatt - Adrian Muscat

Neural and Evolutionary Computing Computation and Language

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Image caption generation systems are typically evaluated against reference outputs. We show that it is possible to predict output quality without generating the captions, based on the probability assigned by the neural model to the reference captions. Such pre-gen metrics are strongly correlated to standard evaluation metrics.

rate research

Generating captions without looking beyond objects

122 - Hendrik Heuer , Christof Monz , Arnold W.M. Smeulders 2016

This paper explores new evaluation perspectives for image captioning and introduces a noun translation task that achieves comparative image caption generation performance by translating from a set of nouns to captions. This implies that in image captioning, all word categories other than nouns can be evoked by a powerful language model without sacrificing performance on n-gram precision. The paper also investigates lower and upper bounds of how much individual word categories in the captions contribute to the final BLEU score. A large possible improvement exists for nouns, verbs, and prepositions.

Computer Vision and Pattern Recognition Computation and Language

Towards Generating Stylized Image Captions via Adversarial Training

122 - Omid Mohamad Nezami , Mark Dras , Stephen Wan 2019

While most image captioning aims to generate objective descriptions of images, the last few years have seen work on generating visually grounded image captions which have a specific style (e.g., incorporating positive or negative sentiment). However, because the stylistic component is typically the last part of training, current models usually pay more attention to the style at the expense of accurate content description. In addition, there is a lack of variability in terms of the stylistic aspects. To address these issues, we propose an image captioning model called ATTEND-GAN which has two core components: first, an attention-based caption generator to strongly correlate different parts of an image with different parts of a caption; and second, an adversarial training mechanism to assist the caption generator to add diverse stylistic components to the generated captions. Because of these components, ATTEND-GAN can generate correlated captions as well as more human-like variability of stylistic patterns. Our system outperforms the state-of-the-art as well as a collection of our baseline models. A linguistic analysis of the generated captions demonstrates that captions generated using ATTEND-GAN have a wider range of stylistic adjectives and adjective-noun pairs.

Computer Vision and Pattern Recognition Computation and Language

Where to put the Image in an Image Caption Generator

104 - Marc Tanti 2017

When a recurrent neural network language model is used for caption generation, the image information can be fed to the neural network either by directly incorporating it in the RNN -- conditioning the language model by `injecting image features -- or in a layer following the RNN -- conditioning the language model by `merging image features. While both options are attested in the literature, there is as yet no systematic comparison between the two. In this paper we empirically show that it is not especially detrimental to performance whether one architecture is used or another. The merge architecture does have practical advantages, as conditioning by merging allows the RNNs hidden state vector to shrink in size by up to four times. Our results suggest that the visual and linguistic modalities for caption generation need not be jointly encoded by the RNN as that yields large, memory-intensive models with few tangible advantages in performance; rather, the multimodal integration should be delayed to a subsequent stage.

Neural and Evolutionary Computing Computation and Language Computer Vision and Pattern Recognition

Do We Need Improved Code Quality Metrics?

197 - Tushar Sharma , Diomidis Spinellis 2020

The software development community has been using code quality metrics for the last five decades. Despite their wide adoption, code quality metrics have attracted a fair share of criticism. In this paper, first, we carry out a qualitative exploration by surveying software developers to gauge their opinions about current practices and potential gaps with the present set of metrics. We identify deficiencies including lack of soundness, i.e., the ability of a metric to capture a notion accurately as promised by the metric, lack of support for assessing software architecture quality, and insufficient support for assessing software testing and infrastructure. In the second part of the paper, we focus on one specific code quality metric-LCOM as a case study to explore opportunities towards improved metrics. We evaluate existing LCOM algorithms qualitatively and quantitatively to observe how closely they represent the concept of cohesion. In this pursuit, we first create eight diverse cases that any LCOM algorithm must cover and obtain their cohesion levels by a set of experienced developers and consider them as a ground truth. We show that the present set of LCOM algorithms do poorly w.r.t. these cases. To bridge the identified gap, we propose a new approach to compute LCOM and evaluate the new approach with the ground truth. We also show, using a quantitative analysis using more than 90 thousand types belonging to 261 high-quality Java repositories, the present set of methods paint a very inaccurate and misleading picture of class cohesion. We conclude that the current code quality metrics in use suffer from various deficiencies, presenting ample opportunities for the research community to address the gaps.

Software Engineering

Quantifying the amount of visual information used by neural caption generators

336 - Marc Tanti , Albert Gatt , Kenneth P. Camilleri 2018

This paper addresses the sensitivity of neural image caption generators to their visual input. A sensitivity analysis and omission analysis based on image foils is reported, showing that the extent to which image captioning architectures retain and are sensitive to visual information varies depending on the type of word being generated and the position in the caption as a whole. We motivate this work in the context of broader goals in the field to achieve more explainability in AI.

Neural and Evolutionary Computing Computation and Language

comments

Fetching comments

King AbdulAziz University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Pre-gen metrics: Predicting caption quality metrics without generating captions

Ask ChatGPT about the research

No Arabic abstract

Read More