This work takes a critical look at the evaluation of automatic translation of user-generated content (UGC), whose well-known specificities raise many challenges for machine translation (MT). Our analyses show that measuring average-case performance with a standard metric on a UGC test set falls far short of giving a reliable picture of UGC translation quality. We therefore introduce a new data set for evaluating UGC translation, in which UGC specificities have been manually annotated using a fine-grained typology. Using this data set, we conduct several experiments that measure the impact of different kinds of UGC specificities on translation quality more precisely than was previously possible.
References used
https://aclanthology.org/