
The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results



Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Language: English


Keywords: explainable quality estimation


In this paper, we introduce the Eval4NLP-2021 shared task on explainable quality estimation. Given a source-translation pair, this shared task requires systems not only to provide a sentence-level score indicating the overall quality of the translation, but also to explain this score by identifying the words that negatively impact translation quality. We present the data, annotation guidelines and evaluation setup of the shared task, describe the six participating systems, and analyze the results. To the best of our knowledge, this is the first shared task on explainable NLP evaluation metrics. Datasets and results are available at https://github.com/eval4nlp/SharedTask2021.
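To make the two outputs described in the abstract concrete, the sketch below shows one possible shape for a system's output: a sentence-level score plus per-word scores used to flag words that hurt translation quality. Every name here (`ExplainedQE`, `flagged_words`, `recall_at_top_k`), the score scale, the 0.5 threshold, and the ranking-based check are illustrative assumptions, not the shared task's actual data format or evaluation code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExplainedQE:
    """Hypothetical system output for one source-translation pair."""
    source: str
    translation: List[str]    # translation tokens
    sentence_score: float     # overall quality score (scale is an assumption)
    word_scores: List[float]  # higher = more likely to hurt quality (assumption)


def flagged_words(output: ExplainedQE, threshold: float = 0.5) -> List[str]:
    """Return translation tokens whose word-level score meets the threshold."""
    return [
        token
        for token, score in zip(output.translation, output.word_scores)
        if score >= threshold
    ]


def recall_at_top_k(word_scores: List[float], gold_errors: List[int]) -> float:
    """Share of gold error words found among the k highest-scored words,
    where k is the number of gold error words (illustrative check only)."""
    k = sum(gold_errors)
    if k == 0:
        return 0.0
    ranked = sorted(
        range(len(word_scores)), key=lambda i: word_scores[i], reverse=True
    )
    return sum(gold_errors[i] for i in ranked[:k]) / k


# Made-up example: only the last word is marked as an error.
example = ExplainedQE(
    source="Die Katze sass auf der Matte",
    translation=["The", "cat", "sat", "on", "the", "moon"],
    sentence_score=0.4,
    word_scores=[0.1, 0.1, 0.2, 0.1, 0.1, 0.9],
)
gold = [0, 0, 0, 0, 0, 1]

print(flagged_words(example))
print(recall_at_top_k(example.word_scores, gold))
```

The ranking-based check avoids choosing a threshold, which is convenient when word scores from different systems are not on a comparable scale.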



The ReproGen Shared Task on Reproducibility of Human Evaluations in NLG: Overview and Results

Association for Computational Linguistics, 2021 (article)

The NLP field has recently seen a substantial increase in work related to reproducibility of results, and more generally in recognition of the importance of having shared definitions and practices relating to evaluation. Much of the work on reproducibility has so far focused on metric scores, with reproducibility of human evaluation results receiving far less attention. As part of a research programme designed to develop theory and practice of reproducibility assessment in NLP, we organised the first shared task on reproducibility of human evaluations, ReproGen 2021. This paper describes the shared task in detail, summarises results from each of the reproduction studies submitted, and provides further comparative analysis of the results. Out of nine initial team registrations, we received submissions from four teams. Meta-analysis of the four reproduction studies revealed varying degrees of reproducibility, and allowed very tentative first conclusions about what types of evaluation tend to have better reproducibility.

Keywords: human evaluation results, reproducibility

Overview and Insights from the SCIVER shared task on Scientific Claim Verification

Association for Computational Linguistics, 2021 (article)

We present an overview of the SCIVER shared task, presented at the 2nd Scholarly Document Processing (SDP) workshop at NAACL 2021. In this shared task, systems were provided a scientific claim and a corpus of research abstracts, and asked to identify which articles Support or Refute the claim as well as provide evidentiary sentences justifying those labels. 11 teams made a total of 14 submissions to the shared task leaderboard, leading to an improvement of more than +23 F1 on the primary task evaluation metric. In addition to surveying the participating systems, we provide several insights into modeling approaches to support continued progress and future research on the important and challenging task of scientific claim verification.

Keywords: SCIVER shared task, scientific claim verification

IST-Unbabel 2021 Submission for the Quality Estimation Shared Task

Association for Computational Linguistics, 2021 (article)

We present the joint contribution of IST and Unbabel to the WMT 2021 Shared Task on Quality Estimation. Our team participated in two tasks: Direct Assessment and Post-Editing Effort, encompassing a total of 35 submissions. For all submissions, our efforts focused on training multilingual models on top of the OpenKiwi predictor-estimator architecture, using pre-trained multilingual encoders combined with adapters. We further experiment with uncertainty-related objectives and features as well as training on out-of-domain direct assessment data.

Keywords: direct exploitation

Overview of the WANLP 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic

Association for Computational Linguistics, 2021 (article)

This paper provides an overview of the WANLP 2021 shared task on sarcasm and sentiment detection in Arabic. The shared task has two subtasks: sarcasm detection (subtask 1) and sentiment analysis (subtask 2). This shared task aims to promote and bring attention to Arabic sarcasm detection, which is crucial to improve the performance in other tasks such as sentiment analysis. The dataset used in this shared task, namely ArSarcasm-v2, consists of 15,548 tweets labelled for sarcasm, sentiment and dialect. We received 27 and 22 submissions for subtasks 1 and 2 respectively. Most of the approaches relied on using and fine-tuning pre-trained language models such as AraBERT and MARBERT. The top achieved results for the sarcasm detection and sentiment analysis tasks were 0.6225 F1-score and 0.748 F1-PN respectively.

Keywords: Arabic sarcasm detection, sarcasm detection

Findings of the WMT 2021 Shared Task on Quality Estimation

Association for Computational Linguistics, 2021 (article)

We report the results of the WMT 2021 shared task on Quality Estimation, where the challenge is to predict the quality of the output of neural machine translation systems at the word and sentence levels. This edition focused on two main novel additions: (i) prediction for unseen languages, i.e. zero-shot settings, and (ii) prediction of sentences with catastrophic errors. In addition, new data was released for a number of languages, especially post-edited data. Participating teams from 19 institutions submitted altogether 1263 systems to different task variants and language pairs.

Keywords: WMT, task on quality
