Interrater Disagreement Resolution: A Systematic Procedure to Reach Consensus in Annotation Tasks

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: interrater disagreement resolution, systematic procedure, disagreement resolution

Abstract

We present a systematic procedure for interrater disagreement resolution. The procedure is general, but of particular use in multiple-annotator tasks geared towards ground truth construction. We motivate our proposal by arguing that, barring cases in which the researchers' goal is to elicit different viewpoints, interrater disagreement is a sign of poor quality in the design or the description of a task. Consensus among annotators, we maintain, should be striven for, through a systematic procedure for disagreement resolution such as the one we describe.

References used

https://aclanthology.org/

CIDEr-R: Robust Consensus-based Image Description Evaluation

Association for Computational Linguistics, 2021 (article)

This paper shows that CIDEr-D, a traditional evaluation metric for image description, does not work properly on datasets where the number of words per sentence is significantly greater than in the MS COCO Captions dataset. We also show that the performance of CIDEr-D is hampered by the lack of multiple reference sentences and by high variance in sentence length. To bypass this problem, we introduce CIDEr-R, which improves on CIDEr-D, making it more flexible in dealing with datasets with high sentence length variance. We demonstrate that CIDEr-R is more accurate and closer to human judgment than CIDEr-D; CIDEr-R is also more robust regarding the number of available references. Our results reveal that using Self-Critical Sequence Training to optimize CIDEr-R generates descriptive captions. In contrast, when CIDEr-D is optimized, the length of the generated captions tends to be similar to the reference length. However, the models also repeat the same word several times to increase the sentence length.

Keywords: consensus-based image description, robust consensus-based image, image description evaluation

Identifying inherent disagreement in natural language inference

Association for Computational Linguistics, 2021 (article)

Natural language inference (NLI) is the task of determining whether a piece of text is entailed by, contradicted by, or unrelated to another piece of text. In this paper, we investigate how to tease systematic inferences (i.e., items for which people agree on the NLI label) apart from disagreement items (i.e., items which lead to different annotations), which most prior work has overlooked. To distinguish systematic inferences from disagreement items, we propose Artificial Annotators (AAs) to simulate the uncertainty in the annotation process by capturing the modes in annotations. Results on the CommitmentBank, a corpus of naturally occurring discourses in English, confirm that our approach performs statistically significantly better than all baselines. We further show that AAs learn linguistic patterns and context-dependent reasoning.

Keywords: identifying inherent disagreement, identifying inherent

Apples to Apples: A Systematic Evaluation of Topic Models

Association for Computational Linguistics, 2021 (article)

From statistical to neural models, a wide variety of topic modelling algorithms have been proposed in the literature. However, because of the diversity of datasets and metrics, there have not been many efforts to systematically compare their performance on the same benchmarks and under the same conditions. In this paper, we present a selection of 9 topic modelling techniques from the state of the art reflecting a diversity of approaches to the task, an overview of the different metrics used to compare their performance, and the challenges of conducting such a comparison. We empirically evaluate the performance of these models in different settings reflecting a variety of real-life conditions in terms of dataset size, number of topics, and distribution of topics, following identical preprocessing and evaluation processes. Using metrics that rely on the intrinsic characteristics of the dataset (different coherence metrics) as well as on external knowledge (word embeddings and ground-truth topic labels), our experiments reveal several shortcomings in the common practices of topic model evaluation.

Keywords: systematic evaluation, topic models evaluation, apples to apples

A Kinetic Study of Bleach Materials with a Study to Reach a Low Foam Powder Formula

Damascus University, 2012 (research paper)

The purpose of this research is to study the performance of peroxide bleach materials (sodium carbonate peroxy hydrate, sodium perborate mono hydrate, sodium perborate tetra hydrate) within the low foam powder formula, in terms of their rate of disintegration with and without an activator, under different temperatures, in order to obtain a high detergency low foam powder formula.

Keywords: sodium carbonate peroxy hydrate, sodium perborate, tetraacetylethylenediamine, detergency, low foam powder

What is SemEval evaluating? A Systematic Analysis of Evaluation Campaigns in NLP

Association for Computational Linguistics, 2021 (article)

SemEval is the primary venue in the NLP community for the proposal of new challenges and for the systematic empirical evaluation of NLP systems. This paper provides a systematic quantitative analysis of SemEval, aiming to reveal the patterns of the contributions behind SemEval. By understanding the distribution of task types, metrics, architectures, participation and citations over time, we aim to answer the question of what is being evaluated by SemEval.

Keywords: evaluation campaigns, systematic empirical evaluation
