This work proposes an extensive analysis of the Transformer architecture in the Neural Machine Translation (NMT) setting. Focusing on the encoder-decoder attention mechanism, we prove that attention weights systematically make alignment errors by relying mainly on uninformative tokens from the source sequence. However, we observe that NMT models assign attention to these tokens in order to regulate the contribution of the two contexts, the source and the prefix of the target sequence, to the prediction. We provide evidence of the influence of wrong alignments on model behavior, demonstrating that the encoder-decoder attention mechanism is well suited as an interpretability method for NMT. Finally, based on our analysis, we propose methods that largely reduce the word alignment error rate compared to alignments standardly induced from attention weights.
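For context, the "standard induced alignments" referred to above are typically obtained by linking each target token to the source token that receives the highest encoder-decoder attention weight. The sketch below illustrates that argmax procedure on a toy attention matrix; it is a minimal illustration under that assumption, with a hypothetical helper name `induce_alignments`, not the authors' exact method:

```python
import numpy as np

def induce_alignments(attn):
    """Induce word alignments from a cross-attention matrix.

    attn: array of shape (target_len, source_len) holding the
          encoder-decoder attention weights for one sentence pair
          (e.g. averaged over the heads of a chosen decoder layer).
    Returns a list of (target_pos, source_pos) links, aligning each
    target token to its most-attended source token (argmax).
    """
    return [(t, int(np.argmax(row))) for t, row in enumerate(attn)]

# Toy example: 3 target tokens attending over 4 source tokens.
attn = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.1, 0.1, 0.1, 0.7],  # mass on the final source token, which is often
])                         # an uninformative one (e.g. EOS), causing the
                           # alignment errors the abstract describes
print(induce_alignments(attn))  # [(0, 0), (1, 2), (2, 3)]
```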