
CIDEr-R: Robust Consensus-based Image Description Evaluation



Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Language: English

Created by Shamra Editor

Keywords: consensus-based image description, robust consensus-based image, image description evaluation


Abstract

This paper shows that CIDEr-D, a traditional evaluation metric for image description, does not work properly on datasets where sentences contain significantly more words than those in the MS COCO Captions dataset. We also show that CIDEr-D's performance is hampered by a lack of multiple reference sentences and by high variance in sentence length. To address these problems, we introduce CIDEr-R, which improves on CIDEr-D by making it more flexible on datasets with high sentence-length variance. We demonstrate that CIDEr-R is more accurate and closer to human judgment than CIDEr-D, and that it is more robust to the number of available references. Our results reveal that using Self-Critical Sequence Training to optimize CIDEr-R generates descriptive captions. In contrast, when CIDEr-D is optimized, the length of the generated captions tends to match the reference length, but the models also repeat the same word several times to increase sentence length.
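The abstract does not give formulas. As background, CIDEr-D is usually described as weighting n-gram similarity by a Gaussian penalty on the length difference between candidate and reference, exp(-(l_c - l_r)^2 / (2σ^2)), with σ = 6. The sketch below illustrates only that penalty term, under that assumption; the function name and the example lengths are hypothetical and do not come from the paper, and the sketch does not reproduce CIDEr-R.

```python
import math


def gaussian_length_penalty(candidate_len, reference_len, sigma=6.0):
    """Gaussian penalty on the length gap between candidate and reference.

    Assumption: sigma = 6.0 is the value commonly quoted for CIDEr-D;
    it is not stated in the abstract above.
    """
    diff = candidate_len - reference_len
    return math.exp(-(diff ** 2) / (2 * sigma ** 2))


# Hypothetical short captions (roughly MS COCO-like lengths): small gap.
short_pair = gaussian_length_penalty(10, 12)

# Hypothetical long descriptions with high length variance: large gap.
long_pair = gaussian_length_penalty(40, 60)

print(f"short captions: {short_pair:.3f}")
print(f"long captions:  {long_pair:.5f}")
```

With a fixed σ, a gap of 20 words drives the weight close to zero regardless of n-gram overlap, which is consistent with the abstract's claim that CIDEr-D struggles when sentences are long and their lengths vary widely.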

References used

https://aclanthology.org/


Interrater Disagreement Resolution: A Systematic Procedure to Reach Consensus in Annotation Tasks

Association for Computational Linguistics, 2021 (article)

We present a systematic procedure for interrater disagreement resolution. The procedure is general, but of particular use in multiple-annotator tasks geared towards ground truth construction. We motivate our proposal by arguing that, barring cases in which the researchers' goal is to elicit different viewpoints, interrater disagreement is a sign of poor quality in the design or the description of a task. Consensus among annotators, we maintain, should be striven for, through a systematic procedure for disagreement resolution such as the one we describe.

Keywords: interrater disagreement resolution, systematic procedure, disagreement resolution

Category of Ideals in a Ring R

Damascus University, 2011 (research paper)

In this paper we study three types of homomorphisms between two given ideals in a ring with unity: ring homomorphisms, R-module homomorphisms, and ideal homomorphisms, each supported by several examples. Furthermore, we prove that the family of ideals in a ring R, together with ring, R-module, and ideal homomorphisms, forms the category of ideals of the first, second, and third type, respectively. Finally, we illustrate all of the above with examples and with functors between these categories.

Keywords: ideal homomorphisms, category of ideals in a ring R

RICA: Evaluating Robust Inference Capabilities Based on Commonsense Axioms

Association for Computational Linguistics, 2021 (article)

Pre-trained language models (PTLMs) have achieved impressive performance on commonsense inference benchmarks, but their ability to employ commonsense to make robust inferences, which is crucial for effective communication with humans, is debated. In the pursuit of advancing fluid human-AI communication, we propose a new challenge, RICA: Robust Inference using Commonsense Axioms, that evaluates robust commonsense inference despite textual perturbations. To generate data for this challenge, we develop a systematic and scalable procedure using commonsense knowledge bases and probe PTLMs across two different evaluation settings. Extensive experiments on our generated probe sets with more than 10k statements show that PTLMs perform no better than random guessing in the zero-shot setting, are heavily impacted by statistical biases, and are not robust to perturbation attacks. We also find that fine-tuning on similar statements offers limited gains, as PTLMs still fail to generalize to unseen inferences. Our new large-scale benchmark exposes a significant gap between PTLMs and human-level language understanding and offers a new challenge for PTLMs to demonstrate commonsense.

Keywords: inference capabilities based, capabilities based, evaluating robust inference

Is This Translation Error Critical?: Classification-Based Human and Automatic Machine Translation Evaluation Focusing on Critical Errors

Association for Computational Linguistics, 2021 (article)

This paper discusses a classification-based approach to machine translation evaluation, as opposed to a common regression-based approach in the WMT Metrics task. Recent machine translation usually works well but sometimes makes critical errors due to just a few wrong word choices. Our classification-based approach focuses on such errors using several error type labels, for practical machine translation evaluation in an age of neural machine translation. We made additional annotations on the WMT 2015-2017 Metrics datasets with fluency and adequacy labels to distinguish different types of translation errors from syntactic and semantic viewpoints. We present our human evaluation criteria for the corpus development and automatic evaluation experiments using the corpus. The human evaluation corpus will be publicly available upon publication.

Keywords: translation evaluation focusing

Time-Efficient Code Completion Model for the R Programming Language

Association for Computational Linguistics, 2021 (article)

In this paper we present a deep learning code completion model for the R language. We introduce several techniques for applying a language-modeling-based architecture to the code completion task. With these techniques, the model requires few resources but still achieves high quality. We also present an evaluation dataset for the R language completion task. Our dataset contains multiple autocompletion usage contexts that provide robust validation results. The dataset is publicly available.

Keywords: code completion model, time-efficient code completion, code completion
