Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Syntax-Based Attention Masking for Neural Machine Translation

اخفاء الاهتمام القائم على بناء الجملة للترجمة الآلية العصبية

725 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

النصوص الفلسفية syntax-based attention masking masking for neural اخفاء الاهتمام بناء على بناء الجملة اخفاء للآلام العصبية صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We present a simple method for extending transformers to source-side trees. We define a number of masks that limit self-attention based on relationships among tree nodes, and we allow each attention head to learn which mask or masks to use. On translation from English to various low-resource languages, and translation in both directions between English and German, our method always improves over simple linearization of the source-side parse tree and almost always improves over a sequence-to-sequence baseline, by up to +2.1 BLEU.

References used

https://aclanthology.org/

rate research

Recurrent Attention for Neural Machine Translation

669 - Association for Computation Linguistics 2021 مقالة

Recent research questions the importance of the dot-product self-attention in Transformer models and shows that most attention heads learn simple positional patterns. In this paper, we push further in this research line and propose a novel substitute mechanism for self-attention: Recurrent AtteNtion (RAN) . RAN directly learns attention weights without any token-to-token interaction and further improves their capacity by layer-to-layer interaction. Across an extensive set of experiments on 10 machine translation tasks, we find that RAN models are competitive and outperform their Transformer counterpart in certain scenarios, with fewer parameters and inference time. Particularly, when apply RAN to the decoder of Transformer, there brings consistent improvements by about +0.5 BLEU on 6 translation tasks and +1.0 BLEU on Turkish-English translation task. In addition, we conduct extensive analysis on the attention weights of RAN to confirm their reasonableness. Our RAN is a promising alternative to build more effective and efficient NMT models.

تخصيص كبير recurrent attention الانتباه المتكرر صناعة حمض الفوسفور

Modeling Concentrated Cross-Attention for Neural Machine Translation with Gaussian Mixture Model

728 - Association for Computation Linguistics 2021 مقالة

Cross-attention is an important component of neural machine translation (NMT), which is always realized by dot-product attention in previous methods. However, dot-product attention only considers the pair-wise correlation between words, resulting in dispersion when dealing with long sentences and neglect of source neighboring relationships. Inspired by linguistics, the above issues are caused by ignoring a type of cross-attention, called concentrated attention, which focuses on several central words and then spreads around them. In this work, we apply Gaussian Mixture Model (GMM) to model the concentrated attention in cross-attention. Experiments and analyses we conducted on three datasets show that the proposed method outperforms the baseline and has significant improvement on alignment quality, N-gram accuracy, and long sentence translation.

الرسوم البيانية المعارف التقليدية خليط غاوسي نموذج صناعة حمض الفوسفور

Scheduled Sampling Based on Decoding Steps for Neural Machine Translation

593 - Association for Computation Linguistics 2021 مقالة

Scheduled sampling is widely used to mitigate the exposure bias problem for neural machine translation. Its core motivation is to simulate the inference scene during training by replacing ground-truth tokens with predicted tokens, thus bridging the g ap between training and inference. However, vanilla scheduled sampling is merely based on training steps and equally treats all decoding steps. Namely, it simulates an inference scene with uniform error rates, which disobeys the real inference scene, where larger decoding steps usually have higher error rates due to error accumulations. To alleviate the above discrepancy, we propose scheduled sampling methods based on decoding steps, increasing the selection chance of predicted tokens with the growth of decoding steps. Consequently, we can more realistically simulate the inference scene during training, thus better bridging the gap between training and inference. Moreover, we investigate scheduled sampling based on both training steps and decoding steps for further improvements. Experimentally, our approaches significantly outperform the Transformer baseline and vanilla scheduled sampling on three large-scale WMT tasks. Additionally, our approaches also generalize well to the text summarization task on two popular benchmarks.

متعدد رئيس الانتباه مؤخرا صناعة حمض الفوسفور

Sentence Concatenation Approach to Data Augmentation for Neural Machine Translation

658 - Association for Computation Linguistics 2021 مقالة

Recently, neural machine translation is widely used for its high translation accuracy, but it is also known to show poor performance at long sentence translation. Besides, this tendency appears prominently for low resource languages. We assume that t hese problems are caused by long sentences being few in the train data. Therefore, we propose a data augmentation method for handling long sentences. Our method is simple; we only use given parallel corpora as train data and generate long sentences by concatenating two sentences. Based on our experiments, we confirm improvements in long sentence translation by proposed data augmentation despite the simplicity. Moreover, the proposed method improves translation quality more when combined with back-translation.

sentence concatenation approach concatenation approach نهج تسلسل الجملة نهج التسلسل صناعة حمض الفوسفور

Continual Learning for Neural Machine Translation

742 - Association for Computation Linguistics 2021 مقالة

Neural machine translation (NMT) models are data-driven and require large-scale training corpus. In practical applications, NMT models are usually trained on a general domain corpus and then fine-tuned by continuing training on the in-domain corpus. However, this bears the risk of catastrophic forgetting that the performance on the general domain is decreased drastically. In this work, we propose a new continual learning framework for NMT models. We consider a scenario where the training is comprised of multiple stages and propose a dynamic knowledge distillation technique to alleviate the problem of catastrophic forgetting systematically. We also find that the bias exists in the output linear projection when fine-tuning on the in-domain corpus, and propose a bias-correction module to eliminate the bias. We conduct experiments on three representative settings of NMT application. Experimental results show that the proposed method achieves superior performance compared to baseline models in all settings.

آلة ذات مستوى المستند صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Syntax-Based Attention Masking for Neural Machine Translation

اخفاء الاهتمام القائم على بناء الجملة للترجمة الآلية العصبية

Ask ChatGPT about the research

Read More

suggested questions