Maastricht University's Large-Scale Multilingual Machine Translation System for WMT 2021

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract

We present our development of the multilingual machine translation system for the large-scale multilingual machine translation task at WMT 2021. Starting from the provided baseline system, we investigated several techniques to improve the translation quality on the target subset of languages. We were able to significantly improve the translation quality by adapting the system towards the target subset of languages and by generating synthetic data using the initial model. Techniques successfully applied in zero-shot multilingual machine translation (e.g., a similarity regularizer) had only a minor effect on the final translation performance.

References used

https://aclanthology.org/

Maastricht University's Multilingual Speech Translation System for IWSLT 2021

Association for Computational Linguistics, 2021 (article)

This paper describes Maastricht University's participation in the IWSLT 2021 multilingual speech translation track. The task in this track is to build multilingual speech translation systems in supervised and zero-shot directions. Our primary system is an end-to-end model that performs both speech transcription and translation. We observe that the joint training for the two tasks is complementary especially when the speech translation data is scarce. On the source and target side, we use data augmentation and pseudo-labels respectively to improve the performance of our systems. We also introduce an ensembling technique that consistently improves the quality of transcriptions and translations. The experiments show that the end-to-end system is competitive with its cascaded counterpart especially in zero-shot conditions.

Findings of the WMT 2021 Shared Task on Large-Scale Multilingual Machine Translation

Association for Computational Linguistics, 2021 (article)

We present the results of the first task on Large-Scale Multilingual Machine Translation. The task consists of the many-to-many evaluation of a single model across a variety of source and target languages. This year, the task consisted of three different settings: (i) SMALL-TASK1 (Central/South-Eastern European Languages), (ii) SMALL-TASK2 (South-East Asian Languages), and (iii) FULL-TASK (all 101 x 100 language pairs). All the tasks used the FLORES-101 dataset as the evaluation benchmark. To ensure the longevity of the dataset, the test sets were not publicly released and the models were evaluated in a controlled environment on Dynabench. There were a total of 10 participating teams for the tasks, with a total of 151 intermediate model submissions and 13 final models. This year's results show a significant improvement over the known baselines, with +17.8 BLEU for SMALL-TASK2, +10.6 for FULL-TASK and +3.6 for SMALL-TASK1.

TenTrans Large-Scale Multilingual Machine Translation System for WMT21

Association for Computational Linguistics, 2021 (article)

This paper describes the TenTrans large-scale multilingual machine translation system for WMT 2021. We participate in Small Track 2, which covers five South-East Asian languages plus English (Javanese, Indonesian, Malay, Tagalog, Tamil, and English) in thirty directions. We mainly utilized forward/back-translation, in-domain data selection, knowledge distillation, and gradual fine-tuning from the pre-trained model FLORES-101. We find that forward/back-translation significantly improves the translation results, that data selection and gradual fine-tuning are particularly effective for domain adaptation, and that knowledge distillation brings a slight performance improvement. Model averaging is also used to further improve the translation performance of these systems. Our final system achieves an average BLEU score of 28.89 across thirty directions on the test set.
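
The figure of thirty directions follows from treating the six listed languages (five South-East Asian languages plus English) as ordered source-target pairs: 6 x 5 = 30. The following is a minimal sketch of that enumeration, assuming every ordered pair of distinct languages counts as one direction. The helper names are hypothetical, and the unweighted mean in `average_score` is an assumption about how an average over directions could be computed, not the TenTrans evaluation code.

```python
from itertools import permutations

# Languages listed in the abstract (five South-East Asian languages plus English).
LANGUAGES = ["Javanese", "Indonesian", "Malay", "Tagalog", "Tamil", "English"]


def translation_directions(languages):
    """Return every ordered (source, target) pair with source != target."""
    return list(permutations(languages, 2))


def average_score(scores_by_direction):
    """Unweighted mean of per-direction scores (hypothetical helper, assumed averaging)."""
    return sum(scores_by_direction.values()) / len(scores_by_direction)


directions = translation_directions(LANGUAGES)
print(len(directions))  # 6 * 5 = 30
```

The same counting gives the 101 x 100 language pairs of the full task, since each of 101 languages can be paired with 100 others as a target.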

Back-translation for Large-Scale Multilingual Machine Translation

Association for Computational Linguistics, 2021 (article)

This paper illustrates our approach to the shared task on large-scale multilingual machine translation at the Sixth Conference on Machine Translation (WMT-21). In this work, we aim to build a single multilingual translation system under the hypothesis that a universal cross-language representation leads to better multilingual translation performance. We extend the exploration of different back-translation methods from bilingual translation to multilingual translation. Better performance is obtained by the constrained sampling method, which differs from the finding for bilingual translation. We also explore the effect of vocabulary size and the amount of synthetic data. Surprisingly, smaller vocabularies perform better, and extensive monolingual English data offers only a modest improvement. We submitted to both small tasks and achieved second place.

Data Processing Matters: SRPH-Konvergen AI's Machine Translation System for WMT'21

Association for Computational Linguistics, 2021 (article)

In this paper, we describe the submission of the joint Samsung Research Philippines-Konvergen AI team for the WMT'21 Large Scale Multilingual Translation Task - Small Track 2. We submit a standard Seq2Seq Transformer model to the shared task without any training or architecture tricks, relying mainly on the strength of our data preprocessing techniques to boost performance. Our final submission model scored 22.92 average BLEU on the FLORES-101 devtest set, and scored 22.97 average BLEU on the contest's hidden test set, ranking us sixth overall. Despite using only a standard Transformer, our model ranked first in Indonesian to Javanese, showing that data preprocessing matters as much as, if not more than, cutting-edge model architectures and training techniques.
