T4T Solution: WMT21 Similar Language Task for the Spanish-Catalan and Spanish-Portuguese Language Pair

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

similar language task, Spanish-Portuguese language pair, similar language

The main idea of this solution is to focus on corpus cleaning and preparation and then use an out-of-the-box toolkit (OpenNMT) with its default published Transformer model. To prepare the corpus, we used a set of standard tools (such as Moses scripts and Python packages) together with our own Python scripts, including a custom tokenizer that replaces numbers with variables, resolves the upper/lower case issue in the vocabulary, and segments most punctuation well. We also started a line of work on cleaning the corpus based on statistical probability estimation over the source-target corpus, with inconclusive results. We ran some tests with syllabic word segmentation as well, again with inconclusive results, so in the end, after word-level sentence tokenization, we used SentencePiece BPE to produce the subword units fed to OpenNMT.
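The abstract does not show the custom tokenizer, so the following is only a minimal sketch of the three behaviours it names (number replacement, case handling, punctuation segmentation). The placeholder format `__numN__`, the `<cap>` and `<upper>` marker tokens, and the function names are assumptions made for illustration, not the authors' implementation.

```python
import re

# Digits with optional thousands/decimal separators, e.g. 12, 3.5, 1.234,56
NUM_RE = re.compile(r"\d+(?:[.,]\d+)*")
PLACEHOLDER_RE = re.compile(r"__num(\d+)__")
# A placeholder, a run of word characters, or a single punctuation mark
TOKEN_RE = re.compile(r"__num\d+__|\w+|[^\w\s]")


def tokenize(sentence):
    """Return (tokens, numbers): numbers are swapped for indexed placeholders,
    punctuation is split off, and casing is moved into marker tokens."""
    numbers = []

    def stash(match):
        numbers.append(match.group(0))
        return f" __num{len(numbers) - 1}__ "

    text = NUM_RE.sub(stash, sentence)
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if PLACEHOLDER_RE.fullmatch(tok) or not tok[0].isalpha():
            tokens.append(tok)                       # placeholder or punctuation
        elif len(tok) > 1 and tok.isupper():
            tokens.extend(["<upper>", tok.lower()])  # hypothetical marker: ALL CAPS
        elif tok[0].isupper() and (len(tok) == 1 or tok[1:].islower()):
            tokens.extend(["<cap>", tok.lower()])    # hypothetical marker: Capitalized
        else:
            tokens.append(tok)                       # lowercase or mixed case, kept as is
    return tokens, numbers


def restore_numbers(tokens, numbers):
    """Put the original number strings back in place of the placeholders."""
    restored = []
    for tok in tokens:
        match = PLACEHOLDER_RE.fullmatch(tok)
        restored.append(numbers[int(match.group(1))] if match else tok)
    return restored


tokens, numbers = tokenize("El precio es 1.234,56 euros, NO 12!")
print(tokens)
print(restore_numbers(tokens, numbers))
```

Moving numbers and casing out of the surface form keeps the vocabulary smaller, since `El` and `el` share one entry and every number maps to a placeholder that can be copied back after translation.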

References used

https://aclanthology.org/

747 - Association for Computational Linguistics 2021 article

This paper describes the SEBAMAT contribution to the 2021 WMT Similar Language Translation shared task. Using the Marian neural machine translation toolkit, translation systems based on Google's transformer architecture were built in both directions of Catalan-Spanish and Portuguese-Spanish. The systems were trained in two contrastive parameter settings (different vocabulary sizes for byte pair encoding) using only the parallel but not the comparable corpora provided by the shared task organizers. According to their official evaluation results, the SEBAMAT system turned out to be competitive with rankings among the top teams and BLEU scores between 38 and 47 for the language pairs involving Portuguese and between 76 and 80 for the language pairs involving Catalan.
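To illustrate what a "vocabulary size" setting for byte pair encoding controls, here is a toy BPE learner in plain Python. It is a sketch of the general technique only: the function name `learn_bpe`, the `</w>` end-of-word symbol, and the tiny corpus are assumptions for illustration, and real systems would normally rely on an existing subword tool rather than code like this.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to num_merges merge operations from an iterable of words.
    More merges means a larger subword vocabulary and longer units."""
    # Each word starts as a sequence of characters plus an end-of-word symbol.
    vocab = Counter(tuple(word) + ("</w>",) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda pair: (pairs[pair], pair))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


corpus = ["casa", "casas", "cosa", "cosas"]
small = learn_bpe(corpus, 3)    # contrastive setting 1: few merges
large = learn_bpe(corpus, 10)   # contrastive setting 2: more merges
print(small)
print(large)
```

Because the procedure is greedy and deterministic, the smaller setting is a prefix of the larger one, so the two settings differ only in how far the merging is carried.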

similar language resource, WMT similar language, Marian NMT

OffendES: A New Corpus in Spanish for Offensive Language Research

757 - Association for Computational Linguistics 2021 article

Offensive language detection and analysis has become a major area of research in Natural Language Processing. The freedom of participation in social media has exposed online users to posts designed to denigrate, insult or hurt them according to gender, race, religion, ideology, or other personal characteristics. Focusing on young influencers from the well-known social platforms of Twitter, Instagram, and YouTube, we have collected a corpus composed of 47,128 Spanish comments manually labeled on offensive pre-defined categories. A subset of the corpus attaches a degree of confidence to each label, so both multi-class classification and multi-output regression studies are possible. In this paper, we introduce the corpus, discuss its building process, novelties, and some preliminary experiments with it to serve as a baseline for the research community.

lexicon integration, offensive language research

NITK-UoH: Tamil-Telugu Machine Translation Systems for the WMT21 Similar Language Translation Task

878 - Association for Computational Linguistics 2021 article

In this work, two Neural Machine Translation (NMT) systems have been developed and evaluated as part of the bidirectional Tamil-Telugu similar languages translation subtask in WMT21. The OpenNMT-py toolkit was used to build quick prototypes of the systems, after which the models were trained on the training datasets containing the parallel corpus and finally evaluated on the dev datasets provided as part of the task. Both systems were trained on a DGX station with 4 V100 GPUs. The first NMT system is a Transformer-based 6-layer encoder-decoder model, trained for 100,000 steps, whose configuration is similar to the one provided by OpenNMT-py; it is used to create a single model for bidirectional translation. The second NMT system contains two unidirectional translation models with the same configuration as the first system, with the addition of Byte Pair Encoding (BPE) for subword tokenization through the pre-trained MultiBPEmb model. Based on the dev dataset evaluation metrics for both systems, the first system, i.e. the vanilla Transformer model, was submitted as the primary system. Since the metrics showed no improvement during training of the second system with BPE, it was submitted as a contrastive system.

Marian NMT, language translation task

619 - Association for Computational Linguistics 2021 article

We investigate transfer learning based on pre-trained neural machine translation models to translate between (low-resource) similar languages. This work is part of our contribution to the WMT 2021 Similar Languages Translation Shared Task, where we submitted models for different language pairs, including French-Bambara, Spanish-Catalan, and Spanish-Portuguese in both directions. Our models for Catalan-Spanish (82.79 BLEU) and Portuguese-Spanish (87.11 BLEU) rank first in the official shared task evaluation, and we are the only team to submit models for the French-Bambara pairs.

improving similar language, similar language translation, transfer learning

Divine Justice in the Portuguese sub-plot in Kyd's The Spanish Tragedy

2964 - Tishreen University 2017 research paper

This paper is concerned with the theme of divine justice in The Spanish Tragedy (1592) by Thomas Kyd (1558-1594), a famous English Elizabethan dramatist. It first defines the term "didactic", and then moves to discuss the Portuguese playlet in The Spanish Tragedy as a miniature play-within-the-play which reveals the issue of divine judgment. This paper concludes by asserting that this play is a tragedy of divine justice and punishment.

Divine Justice, Portuguese sub-plot, The Spanish Tragedy, Thomas Kyd, Portuguese playlet
