In this paper, we present the NICT system (NICT-2) submitted to the NICT-SAP shared task at the 8th Workshop on Asian Translation (WAT-2021). Our system is built on a pretrained multilingual BART (mBART; Bidirectional and Auto-Regressive Transformer) model. Because the publicly available models do not support some of the languages in the NICT-SAP task, we added these languages to the mBART model and then trained the extended model on monolingual corpora extracted from Wikipedia. We then fine-tuned the extended mBART model on the parallel corpora specified by the NICT-SAP task. BLEU scores improved substantially over those of systems without the pretrained model, including for the added languages.
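As an illustration only, the language-extension step can be pictured with the toy sketch below. It assumes that adding a language amounts to appending a new language-tag token to the vocabulary and a matching row to the embedding table, with new rows initialised near the mean of the existing ones. The abstract does not describe the actual procedure, so the function name `extend_embeddings`, the placeholder tags `xx_XX` and `yy_YY`, and all numeric values are hypothetical.

```python
import random


def extend_embeddings(vocab, embeddings, new_tokens, seed=0):
    """Return copies of vocab/embeddings with rows appended for unseen tokens.

    Existing rows are copied unchanged, so pretrained knowledge is preserved.
    New rows start near the mean of the existing rows (an assumed heuristic,
    not something stated in the source abstract).
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    mean = [sum(row[j] for row in embeddings) / len(embeddings) for j in range(dim)]

    vocab = dict(vocab)                          # do not mutate the caller's vocab
    embeddings = [list(row) for row in embeddings]
    for tok in new_tokens:
        if tok in vocab:                         # language already supported: skip
            continue
        vocab[tok] = len(embeddings)             # next free row index
        embeddings.append([m + rng.gauss(0.0, 0.02) for m in mean])
    return vocab, embeddings


# Toy stand-in for a pretrained model's vocabulary and embedding table.
pretrained_vocab = {"<s>": 0, "</s>": 1, "en_XX": 2, "hi_IN": 3}
pretrained_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

# "xx_XX" and "yy_YY" are hypothetical placeholder tags for unsupported languages.
new_vocab, new_emb = extend_embeddings(
    pretrained_vocab, pretrained_emb, ["xx_XX", "yy_YY", "en_XX"]
)
print(len(new_emb), new_vocab["xx_XX"], new_vocab["yy_YY"])
```

In a real setting the extended model would then be trained further on monolingual text for the new languages and afterwards fine-tuned on parallel data, as the abstract describes; the sketch covers only the bookkeeping of making room for the new language tags.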
References used
https://aclanthology.org/