This paper describes the participation of the BSC team in the WMT 2021 Multilingual Low-Resource Translation for Indo-European Languages Shared Task. The system addresses Subtask 2 (Wikipedia cultural heritage articles), which involves translation among four Romance languages: Catalan, Italian, Occitan and Romanian. The submitted system is a multilingual semi-supervised machine translation model. It is based on a pre-trained language model, XLM-RoBERTa, which is later fine-tuned on parallel data obtained mostly from OPUS. Unlike other works, we use XLM-RoBERTa only to initialize the encoder, and we pair it with a shallow, randomly initialized decoder. The reported results are robust, and the system performs well for all tested languages.
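The initialization scheme described above (encoder weights taken from a pre-trained model, decoder weights drawn at random) can be illustrated with a framework-free toy sketch. It is not the authors' implementation: the function `init_translation_model`, the parameter names, the layer counts (12 encoder layers, 2 decoder layers), and the `std=0.02` value are all hypothetical choices made for illustration, and the `checkpoint` dictionary merely stands in for real XLM-RoBERTa weights.

```python
import random


def init_translation_model(pretrained_encoder, decoder_shapes, seed=0, std=0.02):
    """Build a parameter dict: encoder copied from a checkpoint, decoder random.

    pretrained_encoder: dict mapping parameter name -> list of floats
    decoder_shapes:     dict mapping parameter name -> number of values
    """
    rng = random.Random(seed)
    params = {}
    # Encoder: reuse the pre-trained values. Copy each list so that later
    # fine-tuning updates would not mutate the original checkpoint.
    for name, values in pretrained_encoder.items():
        params["encoder." + name] = list(values)
    # Decoder: there is no pre-trained counterpart, so sample from N(0, std^2).
    for name, size in decoder_shapes.items():
        params["decoder." + name] = [rng.gauss(0.0, std) for _ in range(size)]
    return params


# Hypothetical stand-in for a pre-trained encoder checkpoint (assumed 12 layers).
checkpoint = {f"layer{i}.weight": [0.1 * i] * 4 for i in range(12)}
# Hypothetical shallow decoder (assumed 2 layers; the abstract gives no depth).
decoder_shapes = {f"layer{i}.weight": 4 for i in range(2)}

model = init_translation_model(checkpoint, decoder_shapes)
```

In a real system, both parts would then be fine-tuned jointly on the parallel data. The sketch only shows which parameters start from pre-trained values and which start from noise.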
References used
https://aclanthology.org/
This paper describes the Charles University submission for the Terminology translation shared task at WMT21. The objective of this task is to design a system which translates certain terms based on a provided terminology database, while preserving high over
This paper describes TenTrans' submission to WMT21 Multilingual Low-Resource Translation shared task for the Romance language pairs. This task focuses on improving translation quality from Catalan to Occitan, Romanian and Italian, with the assistance
In this work, we investigate methods for the challenging task of translating between low-resource language pairs that exhibit some level of similarity. In particular, we consider the utility of transfer learning for translating between several Indo-
This paper presents the work in progress toward the creation of a family of WordNets for Sanskrit, Ancient Greek, and Latin. Building on previous attempts in the field, we elaborate these efforts bridging together WordNet relational semantics with th
We present the findings of the LoResMT 2021 shared task which focuses on machine translation (MT) of COVID-19 data for both low-resource spoken and sign languages. The organization of this task was conducted as part of the fourth workshop on technolo