We present a systematic procedure for interrater disagreement resolution. The procedure is general, but of particular use in multiple-annotator tasks geared towards ground truth construction. We motivate our proposal by arguing that, barring cases in which the researchers' goal is to elicit different viewpoints, interrater disagreement is a sign of poor quality in the design or the description of a task. Consensus among annotators, we maintain, should be striven for, through a systematic procedure for disagreement resolution such as the one we describe.
References used
https://aclanthology.org/