Cross-attention is an important component of neural machine translation (NMT), and previous methods have consistently implemented it as dot-product attention. However, dot-product attention considers only the pairwise correlation between words, so attention becomes dispersed on long sentences and relationships between neighboring source words are neglected. Drawing on linguistics, we attribute these issues to ignoring a type of cross-attention, called concentrated attention, which focuses on several central words and then spreads around them. In this work, we apply a Gaussian Mixture Model (GMM) to model concentrated attention within cross-attention. Experiments and analyses on three datasets show that the proposed method outperforms the baseline and yields significant improvements in alignment quality, N-gram accuracy, and long-sentence translation.
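To make the idea of concentrated attention concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each Gaussian component is centered on a source position, that the mixture density is evaluated at every source position and renormalized, and that the result is interpolated with ordinary dot-product attention using a hypothetical coefficient `lam`. The abstract does not say how the mixture parameters are obtained or how the two attention types are combined, so the function names, the fixed `means`, `stds`, and `weights` arguments, and the interpolation itself are all illustrative assumptions.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def gmm_attention(src_len, means, stds, weights):
    """Concentrated attention: a Gaussian mixture evaluated at source
    positions 0..src_len-1 and renormalized to sum to 1.

    means, stds, weights: one entry per mixture component (assumed inputs;
    a real model would presumably predict them from the decoder state).
    """
    pos = np.arange(src_len, dtype=float)[None, :]        # (1, src_len)
    means = np.asarray(means, dtype=float)[:, None]       # (K, 1)
    stds = np.asarray(stds, dtype=float)[:, None]         # (K, 1)
    weights = np.asarray(weights, dtype=float)[:, None]   # (K, 1)
    dens = np.exp(-0.5 * ((pos - means) / stds) ** 2) / (stds * np.sqrt(2.0 * np.pi))
    mix = (weights * dens).sum(axis=0)                    # (src_len,)
    return mix / mix.sum()


def combined_attention(query, keys, means, stds, weights, lam=0.5):
    """Interpolate scaled dot-product attention with the GMM-based
    concentrated attention. `lam` is a hypothetical mixing coefficient."""
    scores = keys @ query / np.sqrt(query.shape[0])       # (src_len,)
    dot = softmax(scores)
    conc = gmm_attention(keys.shape[0], means, stds, weights)
    return (1.0 - lam) * dot + lam * conc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = rng.normal(size=8)            # one decoder query vector
    keys = rng.normal(size=(10, 8))       # ten encoder states
    # Two central words, at source positions 2 and 7 (illustrative values).
    attn = combined_attention(query, keys, means=[2, 7], stds=[1.0, 1.5], weights=[0.6, 0.4])
    print(attn.round(3), attn.sum())
```

In this sketch the Gaussian term places most of its mass on the chosen central positions and their immediate neighbors, which is one way to counteract the dispersion that pure dot-product attention can show on long sentences.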
References used
https://aclanthology.org/