Using Confidential Data for Domain Adaptation of Neural Machine Translation

Published by the Association for Computational Linguistics in 2021, in the field of Artificial Intelligence. The research language is English.

Abstract in English

We study the problem of domain adaptation in Neural Machine Translation (NMT) when domain-specific data cannot be shared due to confidentiality or copyright issues. As a first step, we propose to fragment the data into phrase pairs and to fine-tune a generic NMT model on a random sample of these pairs instead of on the full sentences. Although long segments are lost for the sake of confidentiality protection, we find that NMT quality can benefit considerably from this adaptation, and that further gains can be obtained with a simple tagging technique.
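The fragmentation idea can be illustrated with a minimal sketch. The abstract does not say how phrase pairs are extracted or what the tagging technique looks like, so everything below is an assumption: the sketch uses the standard alignment-consistency criterion from phrase-based MT (without extension to unaligned boundary words), takes word alignments as given, and prepends a hypothetical `<frag>` token to the source side. The function names, the `max_len` limit, and the toy sentence pair are illustrative only.

```python
import random


def extract_phrase_pairs(src, tgt, alignment, max_len=4):
    """Return phrase pairs consistent with a word alignment.

    src, tgt: lists of tokens.
    alignment: set of (src_index, tgt_index) links (assumed to be given).
    max_len: assumed cap on phrase length, applied to both sides.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            tgt_pos = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency check: no target word inside [j1, j2] may be
            # linked to a source word outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


def sample_fragments(pairs, k, tag="<frag>", seed=0):
    """Draw a random sample of phrase pairs and tag the source side.

    The tag token and its placement are hypothetical, not the paper's.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(pairs), min(k, len(pairs)))
    return [(f"{tag} {s}", t) for s, t in chosen]


# Toy confidential sentence pair with a hand-written alignment.
src = "the patient received treatment".split()
tgt = "der Patient erhielt eine Behandlung".split()
alignment = {(0, 0), (1, 1), (2, 2), (3, 4)}

pairs = extract_phrase_pairs(src, tgt, alignment)
fragments = sample_fragments(pairs, k=3)
for source, target in fragments:
    print(source, "|||", target)
```

With the length cap in place, the full sentence pair never appears among the extracted fragments, and only a random subset of the short fragments would be used as fine-tuning data. How well such a cap and sample size protect confidentiality in practice is a separate question that this sketch does not address.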

References used

https://aclanthology.org/
