في هذه الورقة، نحقق في عوامل القيادة وراء التسلسل، وهي طريقة بسيطة ولكنها فعالة من البيانات للترجمة الآلية العصبية منخفضة الموارد.تشير تجاربنا إلى أن سياق الخطاب غير مرجح هو سبب تحسين تسلسل بلو من قبل حوالي +1 عبر أربع أزواج لغوية.بدلا من ذلك، نوضح أن التحسن يأتي من ثلاثة عوامل أخرى لا علاقة لها بالحبال: تنوع السياق، وتنوع الطول، و (إلى حد أقل) يتحول الموقف.
In this paper, we investigate the driving factors behind concatenation, a simple but effective data augmentation method for low-resource neural machine translation. Our experiments suggest that discourse context is unlikely the cause for concatenation improving BLEU by about +1 across four language pairs. Instead, we demonstrate that the improvement comes from three other factors unrelated to discourse: context diversity, length diversity, and (to a lesser extent) position shifting.
References used
https://aclanthology.org/
Recently, neural machine translation is widely used for its high translation accuracy, but it is also known to show poor performance at long sentence translation. Besides, this tendency appears prominently for low resource languages. We assume that t
We propose a data augmentation method for neural machine translation. It works by interpreting language models and phrasal alignment causally. Specifically, it creates augmented parallel translation corpora by generating (path-specific) counterfactua
Data augmentation, which refers to manipulating the inputs (e.g., adding random noise,masking specific parts) to enlarge the dataset,has been widely adopted in machine learning. Most data augmentation techniques operate on a single input, which limit
For Japanese-to-English translation, zero pronouns in Japanese pose a challenge, since the model needs to infer and produce the corresponding pronoun in the target side of the English sentence. However, although fully resolving zero pronouns often ne
In this paper we explore a very simple neural approach to mapping orthography to phonetic transcription in a low-resource context. The basic idea is to start from a baseline system and focus all efforts on data augmentation. We will see that some techniques work, but others do not.