Translation models for the specific domain of translating Covid data from English to Irish were developed for the LoResMT 2021 shared task. Domain adaptation techniques were applied, using a Covid-adapted generic 55k corpus from the Directorate General of Translation. Fine-tuning, mixed fine-tuning and combined dataset approaches were compared with models trained on an extended in-domain dataset. As part of this study, an English-Irish dataset of Covid-related data from the Health and Education domains was developed. The highest-performing model used a Transformer architecture trained with an extended in-domain Covid dataset. In the context of this study, we have demonstrated that extending an 8k in-domain baseline dataset by just 5k lines improved the BLEU score by 27 points.
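The data regimes compared above differ mainly in which corpora a model sees and in what order. The following is a minimal sketch of that distinction, not the authors' pipeline: the function names, the strategy labels and the oversampling rule for mixed fine-tuning (repeating in-domain pairs until they match the size of the generic corpus, which is one common formulation) are assumptions made for illustration.

```python
import random


def oversample(in_domain, target_size, seed=0):
    """Repeat in-domain sentence pairs until there are target_size of them.

    Hypothetical helper: whole copies first, then a random partial copy.
    """
    if not in_domain:
        raise ValueError("in_domain corpus is empty")
    rng = random.Random(seed)
    full_copies, remainder = divmod(target_size, len(in_domain))
    return in_domain * full_copies + rng.sample(in_domain, remainder)


def training_stages(generic, in_domain, strategy):
    """Return the corpora a model would be trained on, in order.

    Each corpus is a list of (source, target) sentence pairs. The strategy
    labels are illustrative names, not identifiers from the paper.
    """
    if strategy == "fine_tuning":
        # Train on generic data, then continue training on in-domain data only.
        return [generic, in_domain]
    if strategy == "mixed_fine_tuning":
        # Train on generic data, then continue on generic plus oversampled
        # in-domain data (assumed formulation).
        return [generic, generic + oversample(in_domain, len(generic))]
    if strategy == "combined":
        # Train once, from scratch, on the concatenation of both corpora.
        return [generic + in_domain]
    if strategy == "in_domain_only":
        # Train once on the (extended) in-domain corpus alone.
        return [in_domain]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    generic = [(f"generic en {i}", f"generic ga {i}") for i in range(6)]
    covid = [(f"covid en {i}", f"covid ga {i}") for i in range(4)]
    for name in ("fine_tuning", "mixed_fine_tuning", "combined", "in_domain_only"):
        sizes = [len(stage) for stage in training_stages(generic, covid, name)]
        print(name, sizes)
```

Each returned stage would be passed to a separate training run of the NMT toolkit in use, with later stages initialised from the previous stage's checkpoint.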
References used
https://aclanthology.org/