Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution

Published by the Association for Computational Linguistics in 2021, in the field of Artificial Intelligence. The research is written in English.

Abstract in English

In this paper, we investigate the driving factors behind concatenation, a simple but effective data augmentation method for low-resource neural machine translation. Our experiments suggest that discourse context is unlikely to be the cause of the roughly +1 BLEU improvement that concatenation yields across four language pairs. Instead, we demonstrate that the improvement comes from three other factors unrelated to discourse: context diversity, length diversity, and (to a lesser extent) position shifting.
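
The abstract does not spell out the augmentation procedure, so the following is only a minimal sketch of one plausible reading: new training examples are formed by joining two randomly drawn sentence pairs, source with source and target with target. The function name `concat_augment`, the space separator, the `seed` parameter, and the toy German-English corpus are all illustrative assumptions, not details taken from the paper.

```python
import random


def concat_augment(pairs, num_new, seed=0, sep=" "):
    """Build extra training pairs by joining two randomly drawn sentence pairs.

    Hypothetical helper: the separator and the uniform random pairing are
    assumptions made for illustration only.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two sentence pairs to concatenate")
    rng = random.Random(seed)  # local RNG so results are reproducible
    augmented = []
    for _ in range(num_new):
        # Draw two distinct examples; they need not be adjacent in any document.
        (src_a, tgt_a), (src_b, tgt_b) = rng.sample(pairs, 2)
        # Join sources and targets in the same order so they stay aligned.
        augmented.append((src_a + sep + src_b, tgt_a + sep + tgt_b))
    return augmented


# Toy parallel corpus (made-up data, for illustration only).
corpus = [
    ("guten Morgen", "good morning"),
    ("danke schön", "thank you"),
    ("wie geht's", "how are you"),
]

# Original data plus concatenated examples.
train = corpus + concat_augment(corpus, num_new=3)
```

Under these assumptions, each added example is longer than its parts, places the second sentence at later token positions, and surrounds each sentence with unrelated text, which loosely corresponds to the length diversity, position shifting, and context diversity named in the abstract.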

References used

https://aclanthology.org/
