Recent studies emphasize the need for document context in human evaluation of machine translations, but little research has been done on the impact of user interfaces on annotator productivity and the reliability of assessments. In this work, we compare human assessment data from the last two WMT evaluation campaigns, collected via two different methods for document-level evaluation. Our analysis shows that a document-centric approach to evaluation, where the annotator is presented with the entire document context on a screen, leads to higher-quality segment- and document-level assessments. It improves the correlation between segment and document scores and increases inter-annotator agreement for document scores, but it is considerably more time-consuming for annotators.
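To make the notion of correlation between segment and document scores concrete, the following is a minimal sketch in Python. It is not the paper's procedure: the abstract does not say which correlation coefficient or aggregation is used, so the sketch assumes Pearson correlation between each document's mean segment score and its document score. The function names and the toy scores are hypothetical and serve only as illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def segment_document_correlation(documents):
    """Correlate each document's mean segment score with its document score.

    `documents` is a list of (segment_scores, document_score) pairs.
    Averaging the segment scores is an assumption of this sketch.
    """
    mean_segment_scores = [sum(segs) / len(segs) for segs, _ in documents]
    document_scores = [doc for _, doc in documents]
    return pearson(mean_segment_scores, document_scores)


# Made-up scores on a 0-100 scale, for illustration only (not WMT data).
toy_documents = [
    ([80, 90, 70], 82),
    ([40, 55, 60], 50),
    ([95, 85], 88),
    ([20, 35, 30, 25], 30),
]

r = segment_document_correlation(toy_documents)
print(f"segment/document correlation: {r:.3f}")
```

A value close to 1 would mean that annotators' document scores closely track the average of their segment scores. Other choices, such as a rank correlation, are equally plausible readings of the abstract.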
References used
https://aclanthology.org/