
Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution

Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English

Created by: Shamra Editor

Abstract

In this paper, we investigate the driving factors behind concatenation, a simple but effective data augmentation method for low-resource neural machine translation. Our experiments suggest that discourse context is unlikely to be the reason concatenation improves BLEU by about +1 across four language pairs. Instead, we demonstrate that the improvement comes from three other factors unrelated to discourse: context diversity, length diversity, and (to a lesser extent) position shifting.
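
The abstract does not spell out the exact procedure, so the following is only a minimal Python sketch of concatenation augmentation under stated assumptions: each example is a (source, target) string pair, two pairs are joined with a single space on both sides, and every original pair receives exactly one partner. The `pairing` flag is hypothetical; it contrasts consecutive pairing, which keeps discourse context, with random pairing, which discards it while still producing varied contexts and lengths. Given the abstract's finding that the gain does not come from discourse, the random variant would not be expected to lose the benefit.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def concat_augment(corpus: List[Pair], pairing: str = "random", seed: int = 0) -> List[Pair]:
    """Return the original corpus followed by one concatenated pair per original pair.

    Hypothetical sketch, not the authors' code.
    pairing="consecutive": join sentence i with sentence i+1 (preserves discourse context).
    pairing="random": join sentence i with a randomly chosen other sentence (no discourse link).
    """
    if pairing not in ("consecutive", "random"):
        raise ValueError(f"unknown pairing: {pairing!r}")
    n = len(corpus)
    if n < 2:
        raise ValueError("need at least two sentence pairs to concatenate")
    rng = random.Random(seed)
    augmented: List[Pair] = []
    for i, (src, tgt) in enumerate(corpus):
        if pairing == "consecutive":
            j = (i + 1) % n  # wrap around so the last sentence also gets a partner
        else:
            j = rng.randrange(n - 1)  # draw from the n-1 other sentences...
            if j >= i:
                j += 1  # ...by skipping over index i
        src2, tgt2 = corpus[j]
        # Source and target are joined in the same order, so the new pair stays aligned.
        augmented.append((f"{src} {src2}", f"{tgt} {tgt2}"))
    return corpus + augmented


toy = [("guten morgen", "good morning"), ("wie geht es", "how are you"), ("danke", "thanks")]
for src, tgt in concat_augment(toy, pairing="random", seed=0)[len(toy):]:
    print(src, "|||", tgt)
```

Because partners are drawn at random, the same sentence shows up next to different neighbours and in longer examples of varying length, which is the kind of context and length diversity the abstract credits for the gain.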

Related research

Sentence Concatenation Approach to Data Augmentation for Neural Machine Translation

Association for Computational Linguistics, 2021 (article)

Recently, neural machine translation has been widely used for its high translation accuracy, but it is also known to perform poorly on long sentences. This tendency is especially pronounced for low-resource languages. We assume that these problems are caused by the scarcity of long sentences in the training data. Therefore, we propose a data augmentation method for handling long sentences. Our method is simple: we use only the given parallel corpora as training data and generate long sentences by concatenating two sentences. Our experiments confirm that, despite its simplicity, the proposed data augmentation improves the translation of long sentences. Moreover, the proposed method improves translation quality further when combined with back-translation.
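
The abstract leaves open how the two sentences are chosen. The sketch below assumes adjacent sentences, whitespace tokenization, and a hypothetical 6-token threshold for a "long" sentence. It simply counts how many training examples clear that threshold before and after augmentation, which is the scarcity the method is meant to address. The helper names and the toy corpus are illustrative, not from the paper.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def concat_adjacent(corpus: List[Pair]) -> List[Pair]:
    """Hypothetical helper: build one longer pair from each pair of neighbouring sentences."""
    return [
        (f"{corpus[i][0]} {corpus[i + 1][0]}", f"{corpus[i][1]} {corpus[i + 1][1]}")
        for i in range(len(corpus) - 1)
    ]


def count_long(corpus: List[Pair], min_tokens: int) -> int:
    """Count pairs whose source side has at least `min_tokens` whitespace tokens."""
    return sum(1 for src, _ in corpus if len(src.split()) >= min_tokens)


train = [
    ("ich bin hier", "i am here"),
    ("das ist gut", "that is good"),
    ("wir gehen heute nach hause", "we are going home today"),
    ("danke", "thanks"),
]
augmented = train + concat_adjacent(train)  # originals are kept alongside the new long pairs
print(count_long(train, 6), count_long(augmented, 6))  # prints: 0 3
```

No new text is needed: the long examples come entirely from the existing parallel corpus, in line with the abstract's "we only use given parallel corpora".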

Counterfactual Data Augmentation for Neural Machine Translation

Association for Computational Linguistics, 2021 (article)

We propose a data augmentation method for neural machine translation. It works by interpreting language models and phrasal alignment causally. Specifically, it creates augmented parallel translation corpora by generating (path-specific) counterfactual aligned phrases. We generate these by sampling new source phrases from a masked language model, then sampling an aligned counterfactual target phrase by noting that a translation language model can be interpreted as a Gumbel-Max Structural Causal Model (Oberst and Sontag, 2019). Compared to previous work, our method takes both context and alignment into account to maintain the symmetry between source and target sequences. Experiments on IWSLT'15 English → Vietnamese, WMT'17 English → German, WMT'18 English → Turkish, and WMT'19 robust English → French show that the method can improve the performance of translation, back-translation, and translation robustness.
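
The Gumbel-Max counterfactual step cited above can be illustrated on a single categorical choice. The sketch below is a generic construction, not the authors' implementation: it infers Gumbel noise consistent with the observed choice by top-down truncated-Gumbel sampling, then replays that same noise under new logits. The vocabulary and probabilities are hypothetical stand-ins for a translation language model's scores over candidate target phrases.

```python
import numpy as np


def posterior_gumbel_noise(logits: np.ndarray, observed: int, rng: np.random.Generator) -> np.ndarray:
    """Sample Gumbel noise G such that argmax(logits + G) == observed.

    Top-down sampling: the winning perturbed value is Gumbel(logsumexp(logits)); every
    other perturbed value is a Gumbel(logits[i]) sample truncated to lie below the winner.
    """
    top = np.logaddexp.reduce(logits) + rng.gumbel()
    unconstrained = logits + rng.gumbel(size=logits.shape)
    # Truncated Gumbel: -log(exp(-top) + exp(-x)) is always below `top`.
    perturbed = -np.logaddexp(-top, -unconstrained)
    perturbed[observed] = top
    return perturbed - logits  # exogenous noise, reusable under new logits


def counterfactual_choice(logits: np.ndarray, observed: int, new_logits: np.ndarray,
                          rng: np.random.Generator) -> int:
    """What would have been chosen under `new_logits` with the same exogenous noise?"""
    noise = posterior_gumbel_noise(logits, observed, rng)
    return int(np.argmax(new_logits + noise))


rng = np.random.default_rng(0)
vocab = ["Haus", "Gebäude", "Heim"]    # hypothetical target-phrase candidates
old_logits = np.log([0.6, 0.3, 0.1])   # hypothetical scores given the original source phrase
new_logits = np.log([0.2, 0.5, 0.3])   # hypothetical scores after the source phrase is swapped
observed = 0                           # index of the target phrase actually seen in the corpus
print(vocab[counterfactual_choice(old_logits, observed, new_logits, rng)])
```

A useful sanity check: if the new logits equal the old ones, the replayed noise always reproduces the observed choice, which is what makes the sampled phrase a counterfactual of the observed one rather than an independent draw.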

mixSeq: A Simple Data Augmentation Method for Neural Machine Translation

Association for Computational Linguistics, 2021 (article)

Data augmentation, which refers to manipulating the inputs (e.g., adding random noise or masking specific parts) to enlarge the dataset, has been widely adopted in machine learning. Most data augmentation techniques operate on a single input, which limits the diversity of the training corpus. In this paper, we propose mixSeq, a simple yet effective data augmentation technique for neural machine translation that operates on multiple inputs and their corresponding targets. Specifically, we randomly select two input sequences, concatenate them into a longer input, concatenate their corresponding target sequences into an enlarged target, and train models on the augmented dataset. Experiments on nine machine translation tasks demonstrate that this simple method improves the baselines by a non-trivial margin. Our method can also be combined with single-input data augmentation methods to obtain further improvements.

Zero-pronoun Data Augmentation for Japanese-to-English Translation

Association for Computational Linguistics, 2021 (article)

For Japanese-to-English translation, zero pronouns in Japanese pose a challenge, since the model needs to infer and produce the corresponding pronoun on the target (English) side. Although fully resolving zero pronouns often requires discourse context, in some cases the local context within a sentence provides clues for inferring the zero pronoun. In this study, we propose a data augmentation method that provides additional training signals for the translation model to learn correlations between local context and zero pronouns. Machine translation experiments in the conversational domain show that the proposed method significantly improves the accuracy of zero-pronoun translation.

Data augmentation for low-resource grapheme-to-phoneme mapping

Association for Computational Linguistics, 2021 (article)

In this paper, we explore a very simple neural approach to mapping orthography to phonetic transcription in a low-resource context. The basic idea is to start from a baseline system and focus all efforts on data augmentation. We show that some techniques work, while others do not.
