Research papers and master's and doctoral theses on machine translation

Stream-level Latency Evaluation for Simultaneous Machine Translation

772 - Association for Computational Linguistics 2021 Article

Simultaneous machine translation has recently gained traction thanks to significant quality improvements and the advent of streaming applications. Simultaneous translation systems need to find a trade-off between translation quality and response time, and multiple latency measures have been proposed for this purpose. However, latency evaluations for simultaneous translation are estimated at the sentence level, not taking into account the sequential nature of a streaming scenario. Indeed, these sentence-level latency measures are not well suited for continuous stream translation, resulting in figures that are not coherent with the simultaneous translation policy of the system being assessed. This work proposes a stream-level adaptation of the current latency measures based on a re-segmentation approach applied to the output translation, which is successfully evaluated under streaming conditions on a reference IWSLT task.
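For context, one widely used sentence-level latency measure is Average Lagging (AL). The abstract does not say which measures the paper adapts, so the following is only a minimal sketch of sentence-level AL; the function name and the `delays` representation are chosen here for illustration.

```python
def average_lagging(delays, src_len):
    """Sentence-level Average Lagging (AL).

    delays[t] is the number of source tokens that had been read
    when target token t+1 was written (illustrative representation).
    """
    if not delays or src_len <= 0:
        raise ValueError("need at least one target token and a non-empty source")
    # Target-to-source length ratio.
    gamma = len(delays) / src_len
    # tau: 1-based index of the first target token written after the
    # whole source was read; fall back to the last target token.
    tau = next(
        (t for t, g in enumerate(delays, start=1) if g >= src_len),
        len(delays),
    )
    # Average, over the first tau target tokens, of how far the system
    # lags behind an ideal translator that keeps pace with the source.
    return sum(delays[t - 1] - (t - 1) / gamma for t in range(1, tau + 1)) / tau
```

With this definition, a system that reads two source tokens before each write on a four-token sentence pair (`delays = [2, 3, 4, 4]`) gets an AL of 2, while reading the whole source first (`delays = [4, 4, 4, 4]`) gets 4. Applied per sentence, such a measure treats each sentence in isolation, which is the limitation the abstract points to.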

simultaneous machine translation, stream-level latency evaluation, simultaneous machine

DUTNLP Machine Translation System for WMT21 Triangular Translation Task

914 - Association for Computational Linguistics 2021 Article

This paper describes DUT-NLP Lab's submission to the WMT-21 triangular machine translation shared task. The participants are not allowed to use other data and the translation direction of this task is Russian-to-Chinese. In this task, we use the Transformer as our baseline model, and integrate several techniques to enhance the performance of the baseline, including data filtering, data selection, fine-tuning, and post-editing. Further, to make use of the English resources, such as Russian/English and Chinese/English parallel data, the relationship triangle is constructed by multilingual neural machine translation systems. As a result, our submission achieves a BLEU score of 21.9 in Russian-to-Chinese.

dutnlp machine translation, triangular translation task

PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation

857 - Association for Computational Linguistics 2021 Article

We introduce a high-quality and large-scale Vietnamese-English parallel dataset of 3.02M sentence pairs, which is 2.9M pairs larger than the benchmark Vietnamese-English machine translation corpus IWSLT15. We conduct experiments comparing strong neural baselines and well-known automatic translation engines on our dataset and find that, in both automatic and human evaluations, the best performance is obtained by fine-tuning the pre-trained sequence-to-sequence denoising auto-encoder mBART. To the best of our knowledge, this is the first large-scale Vietnamese-English machine translation study. We hope our publicly available dataset and study can serve as a starting point for future research and applications on Vietnamese-English machine translation. We release our dataset at: https://github.com/VinAIResearch/PhoMT

vietnamese-english machine translation, benchmark, vietnamese-english machine

Improving Machine Translation of Rare and Unseen Word Senses

643 - Association for Computational Linguistics 2021 Article

The performance of NMT systems has improved drastically in the past few years but the translation of multi-sense words still poses a challenge. Since word senses are not represented uniformly in the parallel corpora used for training, there is an excessive use of the most frequent sense in MT output. In this work, we propose CmBT (Contextually-mined Back-Translation), an approach for improving multi-sense word translation leveraging pre-trained cross-lingual contextual word representations (CCWRs). Because of their contextual sensitivity and their large pre-training data, CCWRs can easily capture word senses that are missing or very rare in parallel corpora used to train MT. Specifically, CmBT applies bilingual lexicon induction on CCWRs to mine sense-specific target sentences from a monolingual dataset, and then back-translates these sentences to generate a pseudo parallel corpus as additional training data for an MT system. We test the translation quality of ambiguous words on the MuCoW test suite, which was built to test the word sense disambiguation effectiveness of MT systems. We show that our system improves on the translation of difficult unseen and low-frequency word senses.

improving machine translation, improving machine

Mixup Decoding for Diverse Machine Translation

662 - Association for Computational Linguistics 2021 Article

Diverse machine translation aims at generating various target language translations for a given source language sentence. To leverage the linear relationship in the sentence latent space introduced by the mixup training, we propose a novel method, MixDiversity, to generate different translations for the input sentence by linearly interpolating it with different sentence pairs sampled from the training corpus during decoding. To further improve the faithfulness and diversity of the translations, we propose two simple but effective approaches to select diverse sentence pairs in the training corpus and adjust the interpolation weight for each pair correspondingly. Moreover, by controlling the interpolation weight, our method can achieve the trade-off between faithfulness and diversity without any additional training, which is required in most of the previous methods. Experiments on WMT'16 en-ro, WMT'14 en-de, and WMT'17 zh-en are conducted to show that our method substantially outperforms all previous diverse machine translation methods.
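The linear interpolation at the core of this idea can be illustrated with a minimal sketch. The abstract does not specify which representations are mixed or how the weight is set, so the plain-list vectors, the function name `interpolate`, and the weight values below are all assumptions for illustration only.

```python
def interpolate(input_vec, sampled_vec, weight):
    """Linearly mix two equal-length vectors.

    Returns (1 - weight) * input_vec + weight * sampled_vec.
    A weight of 0 keeps the input unchanged; larger weights move the
    result toward the sampled vector.
    """
    if len(input_vec) != len(sampled_vec):
        raise ValueError("vectors must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [(1 - weight) * a + weight * b for a, b in zip(input_vec, sampled_vec)]


# Hypothetical 2-dimensional representations of the input sentence
# and of one sentence sampled from the training corpus.
input_repr = [1.0, 0.0]
sampled_repr = [0.0, 1.0]

mixed = interpolate(input_repr, sampled_repr, 0.25)  # [0.75, 0.25]
```

In this toy setting, a small weight keeps the mixed vector close to the input, and a larger weight pulls it toward the sampled sentence, which mirrors the faithfulness versus diversity trade-off the abstract attributes to the interpolation weight.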

diverse machine translation, diverse machine

ISTIC's Triangular Machine Translation System for WMT2021

1031 - Association for Computational Linguistics 2021 Article

This paper describes ISTIC's submission to the Triangular Machine Translation Task of Russian-to-Chinese machine translation for WMT 2021. In order to fully utilize the provided corpora and improve translation performance from Russian to Chinese, our system uses the pivot method, which pipelines the Russian-to-English translator and the English-to-Chinese translator to form a Russian-to-Chinese translator. Our system is based on the Transformer architecture, and several effective strategies are adopted to improve the quality of translation, including corpus filtering, data pre-processing, system combination and model ensemble.
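The pivot method described here amounts to composing two translators. The following is a minimal sketch of that composition, not the submitted system: the word-by-word dictionary translators `ru_to_en` and `en_to_zh` are toy stand-ins invented for illustration, where a real system would call trained Transformer models.

```python
from typing import Callable

Translator = Callable[[str], str]


def make_pivot_translator(src_to_pivot: Translator, pivot_to_tgt: Translator) -> Translator:
    """Chain two translators through a pivot language."""
    def translate(text: str) -> str:
        return pivot_to_tgt(src_to_pivot(text))
    return translate


# Toy word-level lexicons (hypothetical, for illustration only).
RU_EN = {"кошка": "cat", "собака": "dog"}
EN_ZH = {"cat": "猫", "dog": "狗"}


def ru_to_en(text: str) -> str:
    # Translate word by word, leaving unknown words untouched.
    return " ".join(RU_EN.get(word, word) for word in text.split())


def en_to_zh(text: str) -> str:
    # Chinese output is joined without spaces.
    return "".join(EN_ZH.get(word, word) for word in text.split())


ru_to_zh = make_pivot_translator(ru_to_en, en_to_zh)
result = ru_to_zh("кошка собака")  # "猫狗"
```

One consequence of this design is that errors made in the first step are passed on to the second.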

triangular machine translation, istic, triangular machine, machine translation task

Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation

663 - Association for Computational Linguistics 2021 Article

Current Machine Translation (MT) systems achieve very good results on a growing variety of language pairs and datasets. However, they are known to produce fluent translation outputs that can contain important meaning errors, thus undermining their reliability in practice. Quality Estimation (QE) is the task of automatically assessing the performance of MT systems at test time. Thus, in order to be useful, QE systems should be able to detect such errors. However, this ability is yet to be tested in the current evaluation practices, where QE systems are assessed only in terms of their correlation with human judgements. In this work, we bridge this gap by proposing a general methodology for adversarial testing of QE for MT. First, we show that despite a high correlation with human judgements achieved by the recent SOTA, certain types of meaning errors are still problematic for QE to detect. Second, we show that on average, the ability of a given model to discriminate between meaning-preserving and meaning-altering perturbations is predictive of its overall performance, thus potentially allowing for comparing QE systems without relying on manual quality annotation.

quality estimation, current machine translation, buttons

The TALP-UPC Participation in WMT21 News Translation Task: an mBART-based NMT Approach

574 - Association for Computational Linguistics 2021 Article

This paper describes the submission to the WMT 2021 news translation shared task by the UPC Machine Translation group. The goal of the task is to translate German to French (De-Fr) and French to German (Fr-De). Our submission focuses on fine-tuning a pre-trained model to take advantage of monolingual data. We fine-tune mBART50 using the filtered data, and additionally, we train a Transformer model on the same data from scratch. In the experiments, we show that fine-tuning mBART50 results in 31.69 BLEU for De-Fr and 23.63 BLEU for Fr-De, an increase of 2.71 and 1.90 BLEU, respectively, over the model we train from scratch. Our final submission is an ensemble of these two models, which adds a further 0.3 BLEU for Fr-De.
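The scores of the from-scratch Transformer are not stated directly, but they follow from the reported figures by subtraction. The short calculation below makes that explicit; the derived values are implied by the abstract rather than quoted from it, and the variable names are illustrative.

```python
# BLEU scores reported for the fine-tuned mBART50 model.
finetuned_bleu = {"De-Fr": 31.69, "Fr-De": 23.63}
# Reported gains of fine-tuning over the model trained from scratch.
gain_over_scratch = {"De-Fr": 2.71, "Fr-De": 1.90}

# Implied from-scratch scores (derived here, not quoted in the abstract).
scratch_bleu = {
    pair: round(finetuned_bleu[pair] - gain_over_scratch[pair], 2)
    for pair in finetuned_bleu
}
# scratch_bleu == {"De-Fr": 28.98, "Fr-De": 21.73}
```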

mbart-based nmt approach, nmt approach, upc machine translation

Just Ask! Evaluating Machine Translation by Asking and Answering Questions

748 - Association for Computational Linguistics 2021 Article

In this paper, we show that automatically-generated questions and answers can be used to evaluate the quality of Machine Translation (MT) systems. Building on recent work on the evaluation of abstractive text summarization, we propose a new metric for system-level MT evaluation, compare it with other state-of-the-art solutions, and show its robustness by conducting experiments for various MT directions.

evaluating machine translation, answering questions

Controlling Machine Translation for Multiple Attributes with Additive Interventions

921 - Association for Computational Linguistics 2021 Article

Fine-grained control of machine translation (MT) outputs along multiple attributes is critical for many modern MT applications and is a requirement for gaining users' trust. A standard approach for exerting control in MT is to prepend the input with a special tag to signal the desired output attribute. Despite its simplicity, attribute tagging has several drawbacks: continuous values must be binned into discrete categories, which is unnatural for certain applications; interference between multiple tags is poorly understood. We address these problems by introducing vector-valued interventions which allow for fine-grained control over multiple attributes simultaneously via a weighted linear combination of the corresponding vectors. For some attributes, our approach even allows for fine-tuning a model trained without annotations to support such interventions. In experiments with three attributes (length, politeness and monotonicity) and two language pairs (English to German and Japanese) our models achieve better control over a wider range of tasks compared to tagging, and translation quality does not degrade when no control is requested. Finally, we demonstrate how to enable control in an already trained model after a relatively cheap fine-tuning stage.
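The weighted linear combination of attribute vectors can be sketched in a few lines. The abstract does not say where in the model the vectors are added or how they are learned, so the plain-list hidden vector, the attribute names, and the function `apply_interventions` below are hypothetical and meant only to show the arithmetic.

```python
def apply_interventions(hidden, attribute_vectors, weights):
    """Add a weighted linear combination of attribute vectors to a hidden vector.

    hidden            -- list of floats (a hypothetical model representation)
    attribute_vectors -- dict mapping attribute name to a vector of the same size
    weights           -- dict mapping attribute name to a continuous weight
    """
    out = list(hidden)
    for name, weight in weights.items():
        vec = attribute_vectors[name]
        if len(vec) != len(out):
            raise ValueError(f"vector for {name!r} has the wrong size")
        out = [h + weight * v for h, v in zip(out, vec)]
    return out


# Hypothetical 3-dimensional attribute vectors.
vectors = {
    "length": [1.0, 0.0, 0.0],
    "politeness": [0.0, 1.0, 0.0],
}

hidden = [0.0, 0.0, 0.0]
# Two attributes controlled at once, each with a continuous weight.
controlled = apply_interventions(hidden, vectors, {"length": 0.5, "politeness": -1.0})
# No control requested: the representation is returned unchanged.
uncontrolled = apply_interventions(hidden, vectors, {})
```

Unlike a discrete tag, each weight here can take any real value, and several attributes combine by simple addition.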

controlling machine translation, controlling machine
