On Long-Tailed Phenomena in Neural Machine Translation

167 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Vikas Raunak

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Vikas Raunak - Siddharth Dalmia - Vivek Gupta

الحساب واللغة الذكاء الاصطناعي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens, tackling which remains a major challenge. The analysis of long-tailed phenomena in the context of structured prediction tasks is further hindered by the added complexities of search during inference. In this work, we quantitatively characterize such long-tailed phenomena at two levels of abstraction, namely, token classification and sequence generation. We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation by incorporating the inductive biases of beam search in the training process. We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy across different language pairs, especially on the generation of low-frequency words. We have released the code to reproduce our results.

قيم البحث

135 - Yafu Li , Yongjing Yin , Yulong Chen 2021

Modern neural machine translation (NMT) models have achieved competitive performance in standard benchmarks such as WMT. However, there still exist significant issues such as robustness, domain generalization, etc. In this paper, we study NMT models from the perspective of compositional generalization by building a benchmark dataset, CoGnition, consisting of 216k clean and consistent sentence pairs. We quantitatively analyze effects of various factors using compound translation error rate, then demonstrate that the NMT model fails badly on compositional generalization, although it performs remarkably well under traditional metrics.

الحساب واللغة الذكاء الاصطناعي التعلم الآلي

WeChat Neural Machine Translation Systems for WMT20

132 - Fandong Meng , Jianhao Yan , Yijin Liu 2020

We participate in the WMT 2020 shared news translation task on Chinese to English. Our system is based on the Transformer (Vaswani et al., 2017a) with effective variants and the DTMT (Meng and Zhang, 2019) architecture. In our experiments, we employ data selection, several synthetic data generation approaches (i.e., back-translation, knowledge distillation, and iterative in-domain knowledge transfer), advanced finetuning approaches and self-bleu based model ensemble. Our constrained Chinese to English system achieves 36.9 case-sensitive BLEU score, which is the highest among all submissions.

الحساب واللغة الذكاء الاصطناعي التعلم الآلي

The Missing Ingredient in Zero-Shot Neural Machine Translation

88 - Naveen Arivazhagan , Ankur Bapna , Orhan Firat 2019

Multilingual Neural Machine Translation (NMT) models are capable of translating between multiple source and target languages. Despite various approaches to train such models, they have difficulty with zero-shot translation: translating between langua ge pairs that were not together seen during training. In this paper we first diagnose why state-of-the-art multilingual NMT models that rely purely on parameter sharing, fail to generalize to unseen language pairs. We then propose auxiliary losses on the NMT encoder that impose representational invariance across languages. Our simple approach vastly improves zero-shot translation quality without regressing on supervised directions. For the first time, on WMT14 English-FrenchGerman, we achieve zero-shot performance that is on par with pivoting. We also demonstrate the easy scalability of our approach to multiple languages on the IWSLT 2017 shared task.

الحساب واللغة الذكاء الاصطناعي التعلم الآلي

The Curious Case of Hallucinations in Neural Machine Translation

123 - Vikas Raunak , Arul Menezes , Marcin Junczys-Dowmunt 2021

In this work, we study hallucinations in Neural Machine Translation (NMT), which lie at an extreme end on the spectrum of NMT pathologies. Firstly, we connect the phenomenon of hallucinations under source perturbation to the Long-Tail theory of Feldm an (2020), and present an empirically validated hypothesis that explains hallucinations under source perturbation. Secondly, we consider hallucinations under corpus-level noise (without any source perturbation) and demonstrate that two prominent types of natural hallucinations (detached and oscillatory outputs) could be generated and explained through specific corpus-level noise patterns. Finally, we elucidate the phenomenon of hallucination amplification in popular data-generation processes such as Backtranslation and sequence-level Knowledge Distillation.

الحساب واللغة الذكاء الاصطناعي التعلم الآلي

A Comprehensive Survey of Multilingual Neural Machine Translation