
Multi-Domain Adaptation in Neural Machine Translation Through Multidimensional Tagging


Publication date: 2021
Language of the research: English
 Created by Shamra Editor





Production NMT systems typically need to serve niche domains that are not covered by adequately large and readily available parallel corpora. As a result, practitioners often fine-tune general-purpose models to each of the domains their organisation caters to. The number of domains, however, can often become large, which, in combination with the number of languages that need serving, can lead to an unscalable fleet of models to be developed and maintained. We propose Multi-Dimensional Tagging (MDT), a method for fine-tuning a single NMT model on several domains simultaneously, thus drastically reducing development and maintenance costs. We run experiments in which a single MDT model compares favourably to a set of state-of-the-art (SOTA) specialist models, even when evaluated on the domain those baselines have been fine-tuned on. Besides BLEU, we report human evaluation results. MDT models are now live at Booking.com, powering an MT engine that serves millions of translations a day in over 40 different languages.
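
The abstract does not say how the tags are encoded. The following is a minimal sketch, assuming the common approach of prepending one pseudo-token per tag dimension to each source sentence and then fine-tuning a single model on the merged data. The dimension names (`domain`, `content_type`), the tag format, and the helper names `tag_source` and `build_mixed_corpus` are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Mapping, Sequence, Tuple

# Hypothetical tag dimensions; the abstract does not list the real ones.
TAG_DIMENSIONS: Sequence[str] = ("domain", "content_type")

SentencePair = Tuple[str, str]


def tag_source(sentence: str, tags: Mapping[str, str]) -> str:
    """Prepend one pseudo-token per tag dimension to a source sentence."""
    missing = [dim for dim in TAG_DIMENSIONS if dim not in tags]
    if missing:
        raise ValueError(f"missing tag dimensions: {missing}")
    # Fixed dimension order keeps the prefix layout consistent across examples.
    prefix = " ".join(f"<{dim}:{tags[dim]}>" for dim in TAG_DIMENSIONS)
    return f"{prefix} {sentence}"


def build_mixed_corpus(
    corpora: Iterable[Tuple[Mapping[str, str], Iterable[SentencePair]]],
) -> List[SentencePair]:
    """Merge several domain corpora into one tagged fine-tuning set."""
    mixed: List[SentencePair] = []
    for tags, pairs in corpora:
        for source, target in pairs:
            # Only the source side is tagged; the target stays unchanged.
            mixed.append((tag_source(source, tags), target))
    return mixed
```

Under this assumption, the same tags would be supplied at inference time to steer the single model towards the requested domain, and the tag tokens would need to be reserved in the subword vocabulary so that they are not split.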



References used
https://aclanthology.org/

Read More

We study the problem of domain adaptation in Neural Machine Translation (NMT) when domain-specific data cannot be shared due to confidentiality or copyright issues. As a first step, we propose to fragment data into phrase pairs and use a random sample to fine-tune a generic NMT model instead of the full sentences. Despite the loss of long segments for the sake of confidentiality protection, we find that NMT quality can considerably benefit from this adaptation, and that further gains can be obtained with a simple tagging technique.
Domain adaptation is widely used in practical applications of neural machine translation, where it aims to achieve good performance on both general-domain and in-domain data. However, the existing methods for domain adaptation usually suffer from catastrophic forgetting, large domain divergence, and model explosion. To address these three problems, we propose a "divide and conquer" method which is based on the importance of neurons or parameters for the translation model. In this method, we first prune the model and only keep the important neurons or parameters, making them responsible for both general-domain and in-domain translation. Then we further train the pruned model, supervised by the original whole model, with knowledge distillation. Last, we expand the model to the original size and fine-tune the added parameters for the in-domain translation. We conducted experiments on different language pairs and domains, and the results show that our method can achieve significant improvements compared with several strong baselines.
Neural machine translation based on bilingual text with limited training data suffers from a lack of lexical diversity, which lowers the rare word translation accuracy and reduces the generalizability of the translation system. In this work, we utilise the multiple captions from the Multi-30K dataset to increase the lexical diversity, aided by the cross-lingual transfer of information among the languages in a multilingual setup. In this multilingual and multimodal setting, the inclusion of the visual features boosts the translation quality by a significant margin. An empirical study affirms that our proposed multimodal approach achieves a substantial gain in terms of the automatic score and shows robustness in handling rare word translation in the context of English to/from Hindi and Telugu translation tasks.
Machine translation systems are vulnerable to domain mismatch, especially in a low-resource scenario. Out-of-domain translations are often of poor quality and prone to hallucinations, due to exposure bias and the decoder acting as a language model. We adopt two approaches to alleviate this problem: lexical shortlisting restricted by IBM statistical alignments, and hypothesis reranking based on similarity. The methods are computationally cheap and show success on low-resource out-of-domain test sets. However, the methods lose their advantage when there is sufficient data or too great a domain mismatch. This is due both to the IBM model losing its advantage over the implicitly learned neural alignment and to issues with subword segmentation of unseen words.
In this paper, we describe our MiSS system that participated in the WMT21 news translation task. We mainly participated in the evaluation of the three translation directions of the English-Chinese and Japanese-English translation tasks. In the systems submitted, we primarily considered wider networks, deeper networks, relative positional encoding, and dynamic convolutional networks in terms of model structure, while in terms of training, we investigated contrastive learning-reinforced domain adaptation, self-supervised training, and optimization objective switching training methods. According to the final evaluation results, a deeper, wider, and stronger network can improve translation performance in general, yet our data domain adaptation method can improve performance even more. In addition, we found that switching to our proposed objective during the fine-tuning phase, using relatively small domain-related data, can effectively improve the stability of the model's convergence and achieve better optimal performance.
