In this paper, we introduce the Eval4NLP-2021 shared task on explainable quality estimation. Given a source-translation pair, the task requires not only providing a sentence-level score that indicates the overall quality of the translation, but also explaining this score by identifying the words that negatively impact translation quality. We present the data, annotation guidelines, and evaluation setup of the shared task, describe the six participating systems, and analyze the results. To the best of our knowledge, this is the first shared task on explainable NLP evaluation metrics. Datasets and results are available at https://github.com/eval4nlp/SharedTask2021.
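To make the two required outputs concrete, the following is a minimal sketch of what a system might produce for one source-translation pair: a sentence-level score plus one score per translation token. The class `ExplainedEstimate`, the function `flag_error_words`, the score directions, the threshold, and the toy values are all illustrative assumptions and do not reflect the shared task's actual submission format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExplainedEstimate:
    """Hypothetical container for one source-translation pair."""
    target_tokens: List[str]   # tokens of the translation
    sentence_score: float      # overall quality; higher = better (assumed)
    word_scores: List[float]   # per-token score; higher = more likely an error (assumed)


def flag_error_words(est: ExplainedEstimate, threshold: float = 0.5) -> List[str]:
    """Return the translation tokens whose word score exceeds the threshold."""
    if len(est.target_tokens) != len(est.word_scores):
        raise ValueError("need exactly one score per target token")
    return [tok for tok, s in zip(est.target_tokens, est.word_scores) if s > threshold]


# Toy values invented for illustration only.
example = ExplainedEstimate(
    target_tokens=["the", "cat", "sat", "on", "the", "moon"],
    sentence_score=0.62,
    word_scores=[0.05, 0.10, 0.08, 0.04, 0.06, 0.91],
)
print(flag_error_words(example))
```

In this toy case only the last token is flagged, which is the kind of word-level explanation that would accompany the sentence-level score.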