Transfer Learning with Shallow Decoders: BSC at WMT2021's Multilingual Low-Resource Translation for Indo-European Languages Shared Task

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

This paper describes the participation of the BSC team in WMT2021's Multilingual Low-Resource Translation for Indo-European Languages Shared Task. The system addresses Subtask 2 (Wikipedia cultural heritage articles), which involves translation among four Romance languages: Catalan, Italian, Occitan and Romanian. The submitted system is a multilingual semi-supervised machine translation model. It is based on a pre-trained language model, namely XLM-RoBERTa, that is later fine-tuned with parallel data obtained mostly from OPUS. Unlike other works, we use XLM only to initialize the encoder, and we randomly initialize a shallow decoder. The reported results are robust, and the system performs well for all tested languages.
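
To make the "pre-trained encoder, randomly initialized shallow decoder" design concrete, the following is a minimal sketch that counts how many Transformer layer parameters would start from pre-trained weights versus random values. It is not the authors' code. The encoder sizes (12 layers, hidden size 768, feed-forward size 3072) are assumed to match the XLM-RoBERTa base configuration; the abstract does not say which variant or which decoder depth was used, so `decoder_layers=1` is purely hypothetical, as are all names in the sketch. Embeddings and the output projection are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    d_model: int = 768        # assumed: XLM-RoBERTa base hidden size
    d_ff: int = 3072          # assumed: XLM-RoBERTa base feed-forward size
    encoder_layers: int = 12  # assumed: XLM-RoBERTa base depth
    decoder_layers: int = 1   # hypothetical "shallow" depth, not from the paper


def attention_params(d: int) -> int:
    # Q, K, V and output projections, each a d x d matrix plus a bias.
    return 4 * (d * d + d)


def feed_forward_params(d: int, d_ff: int) -> int:
    # Two linear maps (d -> d_ff -> d), each with a bias.
    return (d * d_ff + d_ff) + (d_ff * d + d)


def layer_norm_params(d: int) -> int:
    # One scale vector and one shift vector.
    return 2 * d


def encoder_layer_params(d: int, d_ff: int) -> int:
    # Self-attention + feed-forward + two layer norms.
    return attention_params(d) + feed_forward_params(d, d_ff) + 2 * layer_norm_params(d)


def decoder_layer_params(d: int, d_ff: int) -> int:
    # Self-attention + cross-attention + feed-forward + three layer norms.
    return 2 * attention_params(d) + feed_forward_params(d, d_ff) + 3 * layer_norm_params(d)


def init_plan(cfg: Config) -> dict:
    """Map each layer stack to (initialization source, parameter count)."""
    enc = cfg.encoder_layers * encoder_layer_params(cfg.d_model, cfg.d_ff)
    dec = cfg.decoder_layers * decoder_layer_params(cfg.d_model, cfg.d_ff)
    return {"encoder": ("pretrained", enc), "decoder": ("random", dec)}


if __name__ == "__main__":
    # Compare a shallow decoder with a deeper one (both depths are illustrative).
    for depth in (1, 6):
        plan = init_plan(Config(decoder_layers=depth))
        total = sum(count for _, count in plan.values())
        share = plan["decoder"][1] / total
        print(f"decoder_layers={depth}: {share:.1%} of layer parameters start from random values")
```

Under these assumptions, a shallower decoder leaves a smaller share of the layer parameters to be learned from scratch on the limited parallel data, which is one plausible reading of why the design suits a low-resource setting.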

References used

https://aclanthology.org/

CUNI systems for WMT21: Multilingual Low-Resource Translation for Indo-European Languages Shared Task

Association for Computational Linguistics, 2021 (article)

This paper describes the Charles University submission for the Terminology translation shared task at WMT21. The objective of this task is to design a system that translates certain terms based on a provided terminology database while preserving high overall translation quality. We competed in the English-French language pair. Our approach is based on providing the desired translations alongside the input sentence and training the model to use these provided terms. We lemmatize the terms during both training and inference, so that the model learns to produce the correct surface forms of the words when they differ from the forms provided in the terminology database.

TenTrans Multilingual Low-Resource Translation System for WMT21 Indo-European Languages Task

Association for Computational Linguistics, 2021 (article)

This paper describes TenTrans' submission to the WMT21 Multilingual Low-Resource Translation shared task for the Romance language pairs. This task focuses on improving translation quality from Catalan to Occitan, Romanian and Italian, with the assistance of related high-resource languages. We mainly utilize back-translation, pivot-based methods, multilingual models, pre-trained model fine-tuning, and in-domain knowledge transfer to improve the translation quality. On the test set, our best submitted system achieves an average case-sensitive BLEU score of 43.45 across all low-resource pairs. Our data, code, and pre-trained models used in this work are available in TenTrans evaluation examples.

Machine Translation of Low-Resource Indo-European Languages

Association for Computational Linguistics, 2021 (article)

In this work, we investigate methods for the challenging task of translating between low-resource language pairs that exhibit some level of similarity. In particular, we consider the utility of transfer learning for translating between several Indo-European low-resource languages from the Germanic and Romance language families. We build two main classes of transfer-based systems to study how relatedness can benefit translation performance. The primary system fine-tunes a model pre-trained on a related language pair, and the contrastive system fine-tunes one pre-trained on an unrelated language pair. Our experiments show that although relatedness is not necessary for transfer learning to work, it does benefit model performance.

Toward the creation of WordNets for ancient Indo-European languages

Association for Computational Linguistics, 2021 (article)

This paper presents work in progress toward the creation of a family of WordNets for Sanskrit, Ancient Greek, and Latin. Building on previous attempts in the field, we extend these efforts by bridging WordNet relational semantics with theories of meaning from Cognitive Linguistics. We discuss some of the innovations we have introduced to the WordNet architecture to better capture the polysemy of words, as well as features specific to the Indo-European language family. We conclude by framing our work within the larger picture of resources available for ancient languages and showing that WordNet-backed search tools have the potential to redefine the kinds of questions that can be asked of ancient language corpora.

Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-resource Languages

Association for Computational Linguistics, 2021 (article)

We present the findings of the LoResMT 2021 shared task, which focuses on machine translation (MT) of COVID-19 data for both low-resource spoken and sign languages. The task was organized as part of the fourth workshop on technologies for machine translation of low-resource languages (LoResMT). Parallel corpora are presented and made publicly available, covering the following directions: English↔Irish, English↔Marathi, and Taiwanese Sign Language↔Traditional Chinese. Training data consists of 8112, 20933 and 128608 segments, respectively. There are additional monolingual data sets for Marathi and English that consist of 21901 segments. The results presented here are based on entries from a total of eight teams. Three teams submitted systems for English↔Irish, while five teams submitted systems for English↔Marathi. Unfortunately, there were no system submissions for the Taiwanese Sign Language↔Traditional Chinese task. Maximum system performance, computed using BLEU, was as follows: 36.0 for English--Irish, 34.6 for Irish--English, 24.2 for English--Marathi, and 31.3 for Marathi--English.
