
Data Processing Matters: SRPH-Konvergen AI's Machine Translation System for WMT'21



Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor


Abstract

In this paper, we describe the submission of the joint Samsung Research Philippines-Konvergen AI team for the WMT'21 Large Scale Multilingual Translation Task - Small Track 2. We submit a standard Seq2Seq Transformer model to the shared task without any training or architecture tricks, relying mainly on the strength of our data preprocessing techniques to boost performance. Our final submission model scored 22.92 average BLEU on the FLORES-101 devtest set and 22.97 average BLEU on the contest's hidden test set, ranking us sixth overall. Despite using only a standard Transformer, our model ranked first in Indonesian to Javanese, showing that data preprocessing matters as much as, if not more than, cutting-edge model architectures and training techniques.

References used

https://aclanthology.org/


Maastricht University's Large-Scale Multilingual Machine Translation System for WMT 2021

Association for Computational Linguistics, 2021 (article)

We present our development of the multilingual machine translation system for the large-scale multilingual machine translation task at WMT 2021. Starting from the provided baseline system, we investigated several techniques to improve the translation quality on the target subset of languages. We were able to significantly improve the translation quality by adapting the system towards the target subset of languages and by generating synthetic data using the initial model. Techniques successfully applied in zero-shot multilingual machine translation (e.g. similarity regularizer) had only a minor effect on the final translation performance.


The Mininglamp Machine Translation System for WMT21

Association for Computational Linguistics, 2021 (article)

This paper describes the Mininglamp neural machine translation systems for the WMT2021 news translation tasks. We participated in eight translation directions for news text: Chinese to/from English, Hausa to/from English, German to/from English, and French to/from German. Our fundamental system was based on the Transformer architecture, with wider or smaller configurations for different news translation tasks. We mainly used back-translation, knowledge distillation, and fine-tuning to boost single-model performance, while ensembling was used to combine single models. Our final submission ranked first for the English to/from Hausa task.


ISTIC's Triangular Machine Translation System for WMT2021

Association for Computational Linguistics, 2021 (article)

This paper describes ISTIC's submission to the Triangular Machine Translation Task of Russian-to-Chinese machine translation for WMT 2021. To fully utilize the provided corpora and improve translation performance from Russian to Chinese, our system uses the pivot method, which pipelines a Russian-to-English translator and an English-to-Chinese translator to form a Russian-to-Chinese translator. Our system is based on the Transformer architecture, and several effective strategies are adopted to improve translation quality, including corpus filtering, data pre-processing, system combination, and model ensembling.
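The pivot method mentioned in this abstract can be illustrated with a minimal sketch. This is not the ISTIC system: the function `make_pivot_translator` and the toy dictionary "translators" are hypothetical stand-ins, used only to show how two translators are chained through English.

```python
from typing import Callable

# A translator is modeled as any function from source text to target text.
Translator = Callable[[str], str]


def make_pivot_translator(src_to_pivot: Translator,
                          pivot_to_tgt: Translator) -> Translator:
    """Chain two translators through a pivot language."""
    def translate(text: str) -> str:
        pivot_text = src_to_pivot(text)   # e.g. Russian -> English
        return pivot_to_tgt(pivot_text)   # e.g. English -> Chinese
    return translate


# Toy stand-ins for real models (assumptions, for illustration only).
RU_EN = {"привет": "hello"}
EN_ZH = {"hello": "你好"}


def ru_to_en(text: str) -> str:
    return RU_EN.get(text, text)


def en_to_zh(text: str) -> str:
    return EN_ZH.get(text, text)


ru_to_zh = make_pivot_translator(ru_to_en, en_to_zh)
```

In a real pipeline, each stand-in would be a trained model, and errors made in the first step would propagate into the second.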


TenTrans Large-Scale Multilingual Machine Translation System for WMT21

Association for Computational Linguistics, 2021 (article)

This paper describes the TenTrans large-scale multilingual machine translation system for WMT 2021. We participate in Small Track 2, which covers five South East Asian languages and English in thirty directions: Javanese, Indonesian, Malay, Tagalog, Tamil, and English. We mainly used forward/back-translation, in-domain data selection, knowledge distillation, and gradual fine-tuning from the pre-trained model FLORES-101. We find that forward/back-translation significantly improves the translation results, that data selection and gradual fine-tuning are particularly effective for domain adaptation, and that knowledge distillation brings a slight performance improvement. Model averaging is also used to further improve translation performance based on these systems. Our final system achieves an average BLEU score of 28.89 across thirty directions on the test set.
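The abstract does not say how model averaging was performed. One common reading is element-wise averaging of the parameters of several checkpoints, sketched below under that assumption. The function `average_checkpoints` and the plain-dictionary parameter format are hypothetical and are not taken from the TenTrans system.

```python
from typing import Dict, List, Sequence

# A checkpoint is modeled as a mapping from parameter name to a flat list of floats.
Params = Dict[str, List[float]]


def average_checkpoints(checkpoints: Sequence[Params]) -> Params:
    """Return the element-wise mean of each parameter across checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    averaged: Params = {}
    for name in checkpoints[0]:
        # Collect the same parameter from every checkpoint.
        vectors = [ckpt[name] for ckpt in checkpoints]
        # zip(*vectors) groups the i-th element of every vector together.
        averaged[name] = [sum(vals) / len(vals) for vals in zip(*vectors)]
    return averaged


example = average_checkpoints([
    {"w": [1.0, 2.0]},
    {"w": [3.0, 4.0]},
])
```

With real models the same idea would typically be applied to framework tensors rather than Python lists.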


ANVITA Machine Translation System for WAT 2021 MultiIndicMT Shared Task

Association for Computational Linguistics, 2021 (article)

This paper describes the ANVITA-1.0 MT system, built for submission to the WAT 2021 MultiIndicMT shared task by the mcairt team. The team participated in 20 translation directions, English→Indic and Indic→English, where the Indic set comprised 10 Indian languages. The ANVITA-1.0 MT system comprised two multilingual NMT models with a shared encoder-decoder, one for the English→Indic directions and the other for the Indic→English directions, together covering 10 language pairs and twenty translation directions. The base models were built on the Transformer architecture and trained on the MultiIndicMT WAT 2021 corpora, with back-translation and transliteration used for selective data augmentation and model ensembling used for better generalization. Additionally, the MultiIndicMT WAT 2021 corpora were filtered through a series of operations before training. ANVITA-1.0 achieved the highest AM-FM score for English→Bengali, the 2nd highest for English→Tamil, and the 3rd highest for the English→Hindi and Bengali→English directions on the official test set. In general, ANVITA's performance in the Indic→English directions is better than in the English→Indic directions for all 10 language pairs when evaluated using BLEU and RIBES, although the same trend is not observed consistently under AM-FM based evaluation. Compared to BLEU, RIBES and AM-FM based scoring placed ANVITA relatively higher among the task participants.

