We present a simple method for extending transformers to source-side trees. We define a number of masks that limit self-attention based on relationships among tree nodes, and we allow each attention head to learn which mask or masks to use. On translation from English to various low-resource languages, and translation in both directions between English and German, our method always improves over simple linearization of the source-side parse tree and almost always improves over a sequence-to-sequence baseline, by up to +2.1 BLEU.
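The abstract does not say which masks are used or how a head "learns which mask or masks to use". The sketch below is a minimal illustration under stated assumptions, not the authors' implementation:

- The tree is given as a parent array (the root has parent `-1`).
- Four masks are built: `ancestor`, `descendant`, `sibling` and an unrestricted `all` mask. This mask set is hypothetical.
- Each head holds one learnable gate logit per mask. It mixes the per-mask masked-softmax attention distributions with softmax-normalized gate weights. This mixing scheme is also an assumption.
- All function names (`build_masks`, `tree_head_attention`, `gate_logits`) are illustrative.

```python
import math

def ancestors(parent, i):
    """Return the set of proper ancestors of node i (root has parent -1)."""
    out = set()
    p = parent[i]
    while p != -1:
        out.add(p)
        p = parent[p]
    return out

def build_masks(parent):
    """Hypothetical mask set: masks[name][i][j] is True if query node i may attend to key node j.
    Every mask allows a node to attend to itself, so no row is ever empty."""
    n = len(parent)
    anc = [ancestors(parent, i) for i in range(n)]
    return {
        "ancestor": [[j == i or j in anc[i] for j in range(n)] for i in range(n)],
        "descendant": [[j == i or i in anc[j] for j in range(n)] for i in range(n)],
        # Nodes sharing a parent (includes i itself; the root is only its own sibling).
        "sibling": [[parent[i] == parent[j] for j in range(n)] for i in range(n)],
        "all": [[True] * n for _ in range(n)],
    }

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]

def masked_softmax(scores, allowed):
    """Softmax restricted to allowed positions; disallowed positions get probability 0."""
    m = max(s for s, a in zip(scores, allowed) if a)
    exps = [math.exp(s - m) if a else 0.0 for s, a in zip(scores, allowed)]
    z = sum(exps)
    return [e / z for e in exps]

def tree_head_attention(scores, masks, gate_logits):
    """One attention head that softly selects among tree masks.

    scores: n x n raw query-key scores (e.g. q_i . k_j / sqrt(d)).
    gate_logits: one learnable scalar per mask for this head (assumed parameterization).
    Returns an n x n attention matrix whose rows are convex mixtures of
    the per-mask attention distributions, so each row sums to 1.
    """
    names = sorted(masks)
    gates = softmax([gate_logits[name] for name in names])
    n = len(scores)
    attn = [[0.0] * n for _ in range(n)]
    for g, name in zip(gates, names):
        for i in range(n):
            row = masked_softmax(scores[i], masks[name][i])
            for j in range(n):
                attn[i][j] += g * row[j]
    return attn

# Usage: node 0 is the root, nodes 1 and 2 are its children, nodes 3 and 4 are children of 1.
parent = [-1, 0, 0, 1, 1]
masks = build_masks(parent)
scores = [[0.0] * 5 for _ in range(5)]  # uniform scores, for illustration only
# A head whose gate strongly prefers the "ancestor" mask.
gate_logits = {"ancestor": 10.0, "descendant": 0.0, "sibling": 0.0, "all": 0.0}
attn = tree_head_attention(scores, masks, gate_logits)
```

With the gate concentrated on `ancestor`, node 3 attends almost entirely to itself and its ancestors 1 and 0, roughly a third each. With equal gate logits the head would instead blend all four views of the tree. A soft gate keeps the choice of mask differentiable, which is one plausible way a head could learn it. The paper may use a different mechanism.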
Recent research questions the importance of the dot-product self-attention in Transformer models and shows that most attention heads learn simple positional patterns. In this paper, we push further in this research line and propose a novel substitute
Cross-attention is an important component of neural machine translation (NMT), which is always realized by dot-product attention in previous methods. However, dot-product attention only considers the pair-wise correlation between words, resulting in
Scheduled sampling is widely used to mitigate the exposure bias problem for neural machine translation. Its core motivation is to simulate the inference scene during training by replacing ground-truth tokens with predicted tokens, thus bridging the g
Recently, neural machine translation is widely used for its high translation accuracy, but it is also known to show poor performance at long sentence translation. Besides, this tendency appears prominently for low resource languages. We assume that t
Neural machine translation (NMT) models are data-driven and require large-scale training corpus. In practical applications, NMT models are usually trained on a general domain corpus and then fine-tuned by continuing training on the in-domain corpus.