Dual Reconstruction: a Unifying Objective for Semi-Supervised Neural Machine Translation

Posted by Weijia Xu

Publication date: 2020

Research field: Information Engineering

Paper language: English

Authors: Weijia Xu, Xing Niu, Marine Carpuat

Categories: Computation and Language; Machine Learning

Abstract

While Iterative Back-Translation and Dual Learning effectively incorporate monolingual training data in neural machine translation, they use different objectives and heuristic gradient approximation strategies, and have not been extensively compared. We introduce a novel dual reconstruction objective that provides a unified view of Iterative Back-Translation and Dual Learning. It motivates a theoretical analysis and controlled empirical study on German-English and Turkish-English tasks, which both suggest that Iterative Back-Translation is more effective than Dual Learning despite its relative simplicity.
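
To make the Iterative Back-Translation side of this comparison concrete, here is a minimal sketch of the IBT training loop in Python. The `WordModel` class is a hypothetical, position-aligned word-lookup stand-in for a real NMT model, and all function and argument names are illustrative assumptions. The sketch does not implement the paper's dual reconstruction objective or Dual Learning's gradient approximation; it only shows how each direction's model produces synthetic training data for the other.

```python
from collections import Counter, defaultdict

class WordModel:
    """Hypothetical toy translation model: position-aligned word lookup.

    Assumes each sentence pair has the same length and a monotone alignment;
    it only stands in for a real NMT model in this sketch.
    """

    def __init__(self):
        self.table = defaultdict(Counter)

    def train(self, pairs):
        # Re-estimate the lookup table from scratch on (input, output) pairs.
        self.table = defaultdict(Counter)
        for src, tgt in pairs:
            for s, t in zip(src.split(), tgt.split()):
                self.table[s][t] += 1

    def translate(self, sentence):
        # Emit the most frequent aligned word; copy unknown words through.
        out = []
        for w in sentence.split():
            counts = self.table.get(w)
            out.append(counts.most_common(1)[0][0] if counts else w)
        return " ".join(out)

def iterative_back_translation(parallel, mono_src, mono_tgt, rounds=2):
    fwd, bwd = WordModel(), WordModel()  # source->target and target->source
    reversed_parallel = [(t, s) for s, t in parallel]
    fwd.train(parallel)
    bwd.train(reversed_parallel)
    for _ in range(rounds):
        # Back-translate target-side monolingual text: synthetic input, real output.
        fwd.train(parallel + [(bwd.translate(t), t) for t in mono_tgt])
        # Symmetric step: the updated forward model creates data for the backward model.
        bwd.train(reversed_parallel + [(fwd.translate(s), s) for s in mono_src])
    return fwd, bwd

parallel = [("das haus", "the house"), ("das auto", "the car")]
fwd, bwd = iterative_back_translation(parallel, ["das haus"], ["the car"])
print(fwd.translate("das auto"))  # the car
```

In every synthetic pair the model-generated sentence is the input and the genuine monolingual sentence is the output, so each model is always trained to produce real text; no gradient flows through the generation step, which is the heuristic approximation IBT makes.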

Sreyashi Nag, Mihir Kale, Varun Lakshminarasimhan (2020)

We explore ways of incorporating bilingual dictionaries to enable semi-supervised neural machine translation. Conventional back-translation methods have shown success in leveraging target-side monolingual data. However, since the quality of back-translation models is tied to the size of the available parallel corpora, this could adversely impact the synthetically generated sentences in a low-resource setting. We propose a simple data augmentation technique to address this shortcoming. We incorporate widely available bilingual dictionaries that yield word-by-word translations to generate synthetic sentences. This automatically expands the vocabulary of the model while maintaining high-quality content. Our method shows an appreciable improvement in performance over strong baselines.
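
A minimal sketch of dictionary-based synthetic data generation, assuming whitespace-tokenized text and a hypothetical `tgt_to_src` dictionary that maps each target-language word to a single source-language word. The abstract does not say how the authors handle word order, multiple senses, or words missing from the dictionary, so in this sketch unknown words are simply copied.

```python
def dictionary_back_translate(target_sentences, tgt_to_src):
    """Build synthetic (source, target) pairs by word-by-word dictionary lookup.

    tgt_to_src is a hypothetical bilingual dictionary {target_word: source_word};
    words it does not cover are copied unchanged into the synthetic source.
    """
    pairs = []
    for tgt in target_sentences:
        synthetic_src = " ".join(tgt_to_src.get(w, w) for w in tgt.split())
        pairs.append((synthetic_src, tgt))  # synthetic input, real target
    return pairs

en_to_de = {"the": "das", "house": "haus", "red": "rote"}
print(dictionary_back_translate(["the red house", "the blue house"], en_to_de))
# [('das rote haus', 'the red house'), ('das blue haus', 'the blue house')]
```

Because the lookup is word by word, the synthetic source in this sketch keeps the target sentence's word order; the pairs would then be mixed with the real parallel data exactly as back-translated pairs are.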

Categories: Computation and Language

Sockeye: A Toolkit for Neural Machine Translation

Felix Hieber, Tobias Domhan, Michael Denkowski (2017)

We describe Sockeye (version 1.12), an open-source sequence-to-sequence toolkit for Neural Machine Translation (NMT). Sockeye is a production-ready framework for training and applying models as well as an experimental platform for researchers. Written in Python and built on MXNet, the toolkit offers scalable training and inference for the three most prominent encoder-decoder architectures: attentional recurrent neural networks, self-attentional transformers, and fully convolutional networks. Sockeye also supports a wide range of optimizers, normalization and regularization techniques, and inference improvements from current NMT literature. Users can easily run standard training recipes, explore different model settings, and incorporate new ideas. In this paper, we highlight Sockeye's features and benchmark it against other NMT toolkits on two language arcs from the 2017 Conference on Machine Translation (WMT): English-German and Latvian-English. We report competitive BLEU scores across all three architectures, including an overall best score for Sockeye's transformer implementation. To facilitate further comparison, we release all system outputs and training scripts used in our experiments. The Sockeye toolkit is free software released under the Apache 2.0 license.

Categories: Computation and Language; Machine Learning

Controllable Dual Skew Divergence Loss for Neural Machine Translation

Zuchao Li, Hai Zhao, Yingting Wu (2019)

In sequence prediction tasks like neural machine translation, training with cross-entropy loss often leads to models that overgeneralize and plunge into local optima. In this paper, we propose an extended loss function called dual skew divergence (DSD) that integrates two symmetric terms on KL divergences with a balanced weight. We empirically discovered that such a balanced weight plays a crucial role in applying the proposed DSD loss to deep models. Thus we eventually develop a controllable DSD loss for general-purpose scenarios. Our experiments indicate that switching to the DSD loss after the convergence of ML training helps models escape local optima and stimulates stable performance improvements. Our evaluations on the WMT 2014 English-German and English-French translation tasks demonstrate that the proposed loss, as a general and convenient means for NMT training, indeed brings performance improvements compared to strong baselines.
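
The abstract does not give the loss formula. As a hedged illustration only, the sketch below assumes the classic skew divergence, KL(p || alpha*q + (1 - alpha)*p), and combines the two directions with a balancing weight `beta`. The exact form, the values of `alpha` and `beta`, and the controllable weighting scheme used by Li et al. are assumptions, not taken from the paper.

```python
import math

def kl(p, q):
    # KL(p || q) for discrete distributions given as equal-length lists.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def skew_kl(p, q, alpha):
    # Skewed KL: compare p against a mixture of q and p. For alpha < 1 the
    # mixture is positive wherever p is, so the value stays finite even
    # where q assigns zero probability.
    mix = [alpha * qi + (1 - alpha) * pi for pi, qi in zip(p, q)]
    return kl(p, mix)

def dual_skew_divergence(p, q, alpha=0.99, beta=0.5):
    # Assumed form: two symmetric skewed KL terms balanced by beta.
    return beta * skew_kl(p, q, alpha) + (1 - beta) * skew_kl(q, p, alpha)

print(dual_skew_divergence([1.0, 0.0], [0.0, 1.0]))  # about 4.605 (ln 100 for alpha=0.99)
```

With plain KL the example above would be infinite, since each distribution puts all its mass where the other has none; the skewing keeps the loss finite, and `beta` is the kind of balancing weight the abstract says is crucial.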

Categories: Computation and Language

Reciprocal Supervised Learning Improves Neural Machine Translation

Minkai Xu, Mingxuan Wang, Zhouhan Lin (2020)

Despite the recent success in image classification, self-training has achieved only limited gains on structured prediction tasks such as neural machine translation (NMT). This is mainly due to the compositionality of the target space, where far-away prediction hypotheses lead to the notorious reinforced mistake problem. In this paper, we revisit the utilization of multiple diverse models and present a simple yet effective approach named Reciprocal-Supervised Learning (RSL). RSL first exploits individual models to generate pseudo-parallel data, and then cooperatively trains each model on the combined synthetic corpus. RSL leverages the fact that differently parameterized models have different inductive biases, and better predictions can be made by jointly exploiting the agreement among them. Unlike previous knowledge distillation methods built upon a much stronger teacher, RSL is capable of boosting the accuracy of one model by introducing other comparable or even weaker models. RSL can also be viewed as a more efficient alternative to ensembling. Extensive experiments demonstrate the superior performance of RSL on several benchmarks by significant margins.
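
The two RSL steps described above (each model generates pseudo-parallel data, then every model is retrained on the pooled corpus) can be sketched as follows. Models and trainers are hypothetical plain callables standing in for real NMT systems; filtering, weighting, and the number of rounds are not covered by the abstract and are omitted.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]
Model = Callable[[str], str]             # hypothetical: translates one sentence
Trainer = Callable[[List[Pair]], Model]  # hypothetical: trains a fresh model on a corpus

def build_combined_corpus(models: List[Model], sources: List[str]) -> List[Pair]:
    # Step 1: every model translates the same monolingual sources,
    # and all pseudo-parallel pairs are pooled into one corpus.
    corpus: List[Pair] = []
    for model in models:
        corpus.extend((src, model(src)) for src in sources)
    return corpus

def reciprocal_round(models: List[Model], trainers: List[Trainer],
                     parallel: List[Pair], sources: List[str]) -> List[Model]:
    # Step 2: each model is retrained on real data plus the pooled synthetic
    # data, so it also learns from its peers' (possibly weaker) predictions.
    synthetic = build_combined_corpus(models, sources)
    return [train(parallel + synthetic) for train in trainers]
```

Unlike knowledge distillation, no model in this loop is designated as the teacher: every model both contributes to and learns from the same pooled corpus.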

Categories: Computation and Language; Artificial Intelligence

Explicit Reordering for Neural Machine Translation

Kehai Chen, Rui Wang, Masao Utiyama (2020)

In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency, which makes the Transformer-based NMT achieve state-of-the-art results for various translation tasks. However, Transformer-based NMT only adds representations of positions sequentially to word vectors in the input sentence and does not explicitly consider reordering information in this sentence. In this paper, we first empirically investigate the relationship between source reordering information and translation performance. The empirical findings show that the source input with the target order learned from the bilingual parallel dataset can substantially improve translation performance. Thus, we propose a novel reordering method to explicitly model this reordering information for the Transformer-based NMT. The empirical results on the WMT14 English-to-German, WAT ASPEC Japanese-to-English, and WMT17 Chinese-to-English translation tasks show the effectiveness of the proposed approach.
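
The empirical finding above, that source input arranged in target order helps translation, can be illustrated by permuting source tokens with word alignments. This is a hypothetical pre-ordering sketch that assumes alignments given as `(src_index, tgt_index)` pairs, for example from an external word aligner. It is not the paper's proposed method, which models reordering information inside the Transformer.

```python
def reorder_source(src_tokens, alignment):
    """Permute source tokens into (approximate) target-side order.

    alignment: hypothetical list of (src_index, tgt_index) links.
    Unaligned tokens inherit the key of the preceding token so they
    stay attached to it.
    """
    links = {}
    for s, t in alignment:
        links.setdefault(s, []).append(t)
    keys, prev = [], -1.0
    for i in range(len(src_tokens)):
        if i in links:
            prev = sum(links[i]) / len(links[i])  # mean aligned target position
        keys.append(prev)
    # Sort by target position; ties keep the original source order.
    order = sorted(range(len(src_tokens)), key=lambda i: (keys[i], i))
    return [src_tokens[i] for i in order]

src = "ich habe das buch gelesen".split()  # target: "i have read the book"
print(reorder_source(src, [(0, 0), (1, 1), (2, 3), (3, 4), (4, 2)]))
# ['ich', 'habe', 'gelesen', 'das', 'buch']
```

The German verb "gelesen" moves next to the auxiliary, mirroring the English "have read"; this is the kind of target-order signal the abstract reports as helpful.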

Categories: Computation and Language; Machine Learning