A Dynamic Head Importance Computation Mechanism for Neural Machine Translation

Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract

Multi-head attention, which runs several attention heads in parallel, improves the performance of the Transformer model in applications such as Neural Machine Translation (NMT) and text classification. In a multi-head attention mechanism, different heads attend to different parts of the input. A limitation, however, is that multiple heads might attend to the same part of the input, making some heads redundant, so the model's resources are under-utilized. One approach to avoid this is to prune the least important heads based on some importance score. In this work, we focus on designing a Dynamic Head Importance Computation Mechanism (DHICM) to dynamically calculate the importance of a head with respect to the input. Our insight is to design an additional attention layer together with multi-head attention, and to use the outputs of the multi-head attention along with the input to compute the importance of each head. Additionally, we add an extra loss function that prevents the model from assigning the same score to all heads, in order to identify the more important heads and improve performance. We analyzed the performance of DHICM for NMT with different languages. Experiments on different datasets show that DHICM outperforms the traditional Transformer-based approach by a large margin, especially when less training data is available.
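The abstract does not give formulas, so the following is only a minimal sketch of the idea it describes, for a single position: an extra attention step scores each head's output against the input, and a penalty discourages identical scores across heads. The projection matrices `W_q` and `W_k`, the scaled dot-product scoring, and the use of entropy as the extra loss are all assumptions made for illustration, not the paper's actual formulation.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [dot(row, v) for row in W]


def head_importance(x_t, head_outputs_t, W_q, W_k):
    """Importance of each head at one position (hypothetical scoring).

    x_t: input vector; head_outputs_t: one output vector per head.
    W_q, W_k: assumed projections for the extra attention layer.
    """
    q = matvec(W_q, x_t)                       # query comes from the input
    scale = math.sqrt(len(q))
    scores = [dot(q, matvec(W_k, h)) / scale   # keys come from head outputs
              for h in head_outputs_t]
    return softmax(scores)                     # importances sum to 1


def combine_heads(head_outputs_t, importance):
    """Importance-weighted sum of the head outputs."""
    d = len(head_outputs_t[0])
    return [sum(w * h[i] for w, h in zip(importance, head_outputs_t))
            for i in range(d)]


def entropy(p):
    """Entropy of the importance distribution; it is largest when all heads
    get the same score, so adding it to the loss (an assumed choice)
    penalizes uniform importances."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


if __name__ == "__main__":
    random.seed(0)
    d_model, d_head, d_att, n_heads = 8, 4, 4, 3

    def rand(r, c):
        return [[random.gauss(0, 1) for _ in range(c)] for _ in range(r)]

    x_t = rand(1, d_model)[0]
    heads = rand(n_heads, d_head)
    imp = head_importance(x_t, heads, rand(d_att, d_model), rand(d_att, d_head))
    print("importance per head:", imp)
    print("combined output:", combine_heads(heads, imp))
    print("extra loss term (entropy):", entropy(imp))
```

Because the importances are recomputed from each input, a head can matter for one sentence and be nearly ignored for another, which is what distinguishes this from pruning heads with a fixed score.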

References used

https://aclanthology.org/

Continual Learning for Neural Machine Translation

Association for Computational Linguistics, 2021 (article)

Neural machine translation (NMT) models are data-driven and require a large-scale training corpus. In practical applications, NMT models are usually trained on a general-domain corpus and then fine-tuned by continuing training on an in-domain corpus. However, this bears the risk of catastrophic forgetting, in which performance on the general domain decreases drastically. In this work, we propose a new continual learning framework for NMT models. We consider a scenario where training comprises multiple stages and propose a dynamic knowledge distillation technique to alleviate the problem of catastrophic forgetting systematically. We also find that a bias exists in the output linear projection when fine-tuning on the in-domain corpus, and propose a bias-correction module to eliminate it. We conduct experiments on three representative settings of NMT application. Experimental results show that the proposed method achieves superior performance compared to baseline models in all settings.

Recurrent Attention for Neural Machine Translation

Association for Computational Linguistics, 2021 (article)

Recent research questions the importance of dot-product self-attention in Transformer models and shows that most attention heads learn simple positional patterns. In this paper, we push further along this line of research and propose a novel substitute mechanism for self-attention: Recurrent AtteNtion (RAN). RAN directly learns attention weights without any token-to-token interaction and further improves their capacity through layer-to-layer interaction. Across an extensive set of experiments on 10 machine translation tasks, we find that RAN models are competitive and outperform their Transformer counterparts in certain scenarios, with fewer parameters and less inference time. In particular, applying RAN to the decoder of the Transformer brings consistent improvements of about +0.5 BLEU on 6 translation tasks and +1.0 BLEU on the Turkish-English translation task. In addition, we conduct extensive analysis of the attention weights of RAN to confirm their reasonableness. Our RAN is a promising alternative for building more effective and efficient NMT models.

Smart-Start Decoding for Neural Machine Translation

Association for Computational Linguistics, 2021 (article)

Most current neural machine translation models adopt a monotonic decoding order, either left-to-right or right-to-left. In this work, we propose a novel method, called Smart-Start decoding, that breaks the limitation of these decoding orders. More specifically, our method first predicts a median word. It then decodes the words on the right side of the median word before generating the words on the left. We evaluate the proposed Smart-Start decoding method on three datasets. Experimental results show that the proposed method can significantly outperform strong baseline models.
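The decoding order described above can be illustrated with a small sketch. It only shows how a target sentence might be rearranged around a chosen start word and then restored; the function names are hypothetical, and the assumption that the left part is generated left-to-right is ours, since the abstract does not specify it.

```python
def smart_start_order(tokens, start):
    """Rearrange a target sentence into the generation order:
    the start word and everything to its right, then the left part."""
    if not 0 <= start < len(tokens):
        raise ValueError("start must be a valid token index")
    return tokens[start:] + tokens[:start]


def restore_order(generated, start):
    """Undo smart_start_order: move the left part back to the front."""
    right_len = len(generated) - start
    return generated[right_len:] + generated[:right_len]


if __name__ == "__main__":
    sentence = ["the", "cat", "sat", "on", "the", "mat"]
    order = smart_start_order(sentence, start=2)
    print(order)                    # ['sat', 'on', 'the', 'mat', 'the', 'cat']
    print(restore_order(order, 2))  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

In a real system the start position would be predicted by the model rather than passed in by hand.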

Counterfactual Data Augmentation for Neural Machine Translation

Association for Computational Linguistics, 2021 (article)

We propose a data augmentation method for neural machine translation. It works by interpreting language models and phrasal alignment causally. Specifically, it creates augmented parallel translation corpora by generating (path-specific) counterfactual aligned phrases. We generate these by sampling new source phrases from a masked language model, then sampling an aligned counterfactual target phrase by noting that a translation language model can be interpreted as a Gumbel-Max Structural Causal Model (Oberst and Sontag, 2019). Compared to previous work, our method takes both context and alignment into account to maintain the symmetry between source and target sequences. Experiments on IWSLT'15 English → Vietnamese, WMT'17 English → German, WMT'18 English → Turkish, and WMT'19 robust English → French show that the method can improve the performance of translation, backtranslation, and translation robustness.

Phrase-level Active Learning for Neural Machine Translation

Association for Computational Linguistics, 2021 (article)

Neural machine translation (NMT) is sensitive to domain shift. In this paper, we address this problem in an active learning setting where we can spend a given budget on translating in-domain data, and gradually fine-tune a pre-trained out-of-domain NMT model on the newly translated data. Existing active learning methods for NMT usually select sentences based on uncertainty scores, but these methods require costly translation of full sentences even when only one or two key phrases within the sentence are informative. To address this limitation, we re-examine previous work from the phrase-based machine translation (PBMT) era that selected individual phrases rather than full sentences. However, while incorporating these phrases into PBMT systems was relatively simple, it is less trivial for NMT systems, which need to be trained on full sequences to capture larger structural properties of sentences unique to the new domain. To overcome these hurdles, we propose to select both full sentences and individual phrases from unlabelled data in the new domain for routing to human translators. In a German-English translation task, our active learning approach achieves consistent improvements over uncertainty-based sentence selection methods, improving by up to 1.2 BLEU over strong active learning baselines.
