We investigate transfer learning based on pre-trained neural machine translation models to translate between (low-resource) similar languages. This work is part of our contribution to the WMT 2021 Similar Languages Translation Shared Task, where we submitted models for different language pairs, including French-Bambara, Spanish-Catalan, and Spanish-Portuguese, in both directions. Our models for Catalan-Spanish (82.79 BLEU) and Portuguese-Spanish (87.11 BLEU) rank first in the official shared task evaluation, and we are the only team to submit models for the French-Bambara pairs.
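The abstract does not name the pre-trained checkpoints used; as a minimal sketch of the general transfer-learning recipe, the snippet below fine-tunes a publicly available OPUS-MT Spanish-Catalan model with Hugging Face transformers. The checkpoint name, toy data, and learning rate are illustrative assumptions, not the authors' actual setup.

```python
# A hedged sketch of transfer learning from a pre-trained NMT model:
# load an existing checkpoint and continue training on new parallel data.
import torch
from transformers import MarianMTModel, MarianTokenizer

checkpoint = "Helsinki-NLP/opus-mt-es-ca"  # assumed ES->CA starting point
tokenizer = MarianTokenizer.from_pretrained(checkpoint)
model = MarianMTModel.from_pretrained(checkpoint)

src = ["Buenos días."]   # toy Spanish source sentence
tgt = ["Bon dia."]       # toy Catalan reference
batch = tokenizer(src, text_target=tgt, return_tensors="pt", padding=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
loss = model(**batch).loss  # one fine-tuning step on in-domain data
loss.backward()
optimizer.step()
```

In practice this single step would be wrapped in an epoch loop over the shared-task parallel corpus, with validation-based early stopping.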
This paper describes the SEBAMAT contribution to the 2021 WMT Similar Language Translation shared task. Using the Marian neural machine translation toolkit, translation systems based on Google's transformer architecture were built in both directions of Catalan-Spanish and Portuguese-Spanish. The systems were trained in two contrastive parameter settings (different vocabulary sizes for byte pair encoding) using only the parallel corpora provided by the shared task organizers, not the comparable ones. According to the official evaluation results, the SEBAMAT systems were competitive, ranking among the top teams with BLEU scores between 38 and 47 for the language pairs involving Portuguese and between 76 and 80 for the language pairs involving Catalan.
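The paper reports two contrastive byte pair encoding settings but the exact vocabulary sizes are not given here; the sketch below illustrates such a setup by training two SentencePiece BPE models of different sizes in Python. The file name and the sizes 16k/32k are assumptions for illustration, and the actual SEBAMAT pipeline used the Marian toolkit rather than this exact tooling.

```python
# A hedged sketch of two contrastive BPE vocabulary settings:
# train a SentencePiece BPE model at two different vocabulary sizes.
import sentencepiece as spm

for vocab_size in (16000, 32000):  # illustrative contrastive sizes
    spm.SentencePieceTrainer.train(
        input="train.ca-es.txt",        # concatenated parallel text (assumed name)
        model_prefix=f"bpe.{vocab_size}",
        vocab_size=vocab_size,
        model_type="bpe",
        character_coverage=1.0,
    )
```

Each resulting model would then segment the training data for a separate Marian transformer run, allowing the two vocabulary sizes to be compared on held-out BLEU.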
This paper describes the participation of team oneNLP (LTRC, IIIT-Hyderabad) in the WMT 2021 Similar Language Translation task. We experimented with transformer-based neural machine translation and explored the use of language similarity for Tamil-Telugu and Telugu-Tamil. As exploratory experiments, we investigated different subword configurations, script conversion, and training a single model for both directions.
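The abstract does not detail the script-conversion method. One common approach for closely related Indic scripts is a codepoint-offset mapping, since the Tamil and Telugu Unicode blocks share the same ISCII-derived layout; the sketch below illustrates this. The function name is ours, and the mapping is approximate because Tamil lacks several consonants present in Telugu.

```python
# A minimal sketch of offset-based script conversion between Tamil and
# Telugu. Both Unicode blocks are 128 codepoints with a shared layout,
# so shifting by the block offset maps characters to their counterparts.
TAMIL_START, TELUGU_START = 0x0B80, 0x0C00
BLOCK_SIZE = 0x80
OFFSET = TELUGU_START - TAMIL_START  # 0x80

def tamil_to_telugu(text: str) -> str:
    """Map Tamil codepoints onto the corresponding Telugu positions."""
    out = []
    for ch in text:
        cp = ord(ch)
        if TAMIL_START <= cp < TAMIL_START + BLOCK_SIZE:
            out.append(chr(cp + OFFSET))
        else:
            out.append(ch)  # leave digits, punctuation, Latin untouched
    return "".join(out)

if __name__ == "__main__":
    print(tamil_to_telugu("தமிழ்"))  # prints the word in Telugu script
```

Rendering both sides in a single script increases subword overlap between the two similar languages, which is the usual motivation for this step before joint or bidirectional training.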
The main idea of this solution is to focus on corpus cleaning and preparation and then use an out-of-the-box solution (OpenNMT) with its default published transformer model. To prepare the corpus, we used a set of standard tools (such as the Moses scripts and Python packages) but also, among other Python scripts, a custom Python tokenizer that replaces numbers with variables, resolves the upper/lower-case duplication in the vocabulary, and segments most punctuation correctly (a sketch follows this abstract). We also began a line of work on corpus cleaning based on statistical probability estimation over the source-target corpus, with inconclusive results. We likewise ran some tests with syllabic word segmentation, again with inconclusive results, so in the end, after word-level tokenization, we used SentencePiece BPE subword units to feed OpenNMT.
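As a minimal sketch of the custom tokenizer described above, the snippet below replaces numbers with a placeholder variable, normalizes case, and splits punctuation into separate tokens. The exact rules, placeholder token, and function name are illustrative assumptions, not the team's actual script.

```python
# A hedged sketch of the preprocessing tokenizer: number placeholders,
# case normalization, and punctuation segmentation via simple regexes.
import re

NUM = "<num>"  # placeholder "variable" standing in for any number

def custom_tokenize(line: str) -> str:
    line = line.strip().lower()                         # normalize case
    line = re.sub(r"\d+(?:[.,]\d+)*", NUM, line)        # numbers -> variable
    line = re.sub(r"([.,;:!?¿¡()\"'])", r" \1 ", line)  # detach punctuation
    return re.sub(r"\s+", " ", line).strip()

print(custom_tokenize("El 25 de mayo de 1998, ¿verdad?"))
# el <num> de mayo de <num> , ¿ verdad ?
```

The normalized text would then be segmented with SentencePiece BPE before being passed to OpenNMT for training.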