This paper describes Lingua Custodia's submission to the WMT21 shared task on machine translation using terminologies. We consider three directions, namely English to French, Russian, and Chinese. We rely on a Transformer-based architecture as a building block, and we explore a method which introduces two main changes to the standard procedure to handle terminologies. The first one consists in augmenting the training data in such a way as to encourage the model to learn a copy behavior when it encounters terminology constraint terms. The second change is constraint token masking, whose purpose is to ease copy behavior learning and to improve model generalization. Empirical results show that our method satisfies most terminology constraints while maintaining high translation quality.
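The two changes described above can be illustrated with a small sketch. The tag names (`<trans>`, `</trans>`, `<mask>`) and the inline-annotation format are assumptions for illustration only, not the authors' exact specification: the first function injects the required target term next to its source term so the model can learn to copy it, and the second replaces the injected term with a placeholder to ease copy-behavior learning and improve generalization.

```python
# Hypothetical sketch of the two training-data changes: terminology-aware
# data augmentation and constraint token masking. Tag names and the inline
# format are illustrative assumptions.

MASK = "<mask>"

def augment_with_terminology(src_tokens, constraints):
    """Inline each terminology constraint into the source sentence so the
    model is encouraged to copy the target term into its output."""
    out = []
    for tok in src_tokens:
        out.append(tok)
        if tok in constraints:
            # Append the required target-side term right after its source term.
            out.extend(["<trans>", constraints[tok], "</trans>"])
    return out

def mask_constraint_tokens(tokens):
    """Replace the injected target-side constraint tokens with a generic
    placeholder, so the model learns a copy behavior for any marked span
    rather than memorizing specific terminology pairs."""
    masked, inside = [], False
    for tok in tokens:
        if tok == "<trans>":
            inside = True
            masked.append(tok)
        elif tok == "</trans>":
            inside = False
            masked.append(tok)
        else:
            masked.append(MASK if inside else tok)
    return masked
```

For example, with the constraint `bank → banque`, the source `the central bank` would be augmented to `the central bank <trans> banque </trans>`, and after masking the span becomes `<trans> <mask> </trans>`.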