In this work, we conduct a comprehensive investigation of one of the centerpieces of modern machine translation systems: the encoder-decoder attention mechanism. Motivated by the concept of first-order alignments, we extend the (cross-)attention mechanism by a recurrent connection, allowing direct access to previous attention/alignment decisions. We propose several ways to include such a recurrence in the attention mechanism. Verifying their performance across different translation tasks, we conclude that these extensions and dependencies are not beneficial for the translation performance of the Transformer architecture.
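The abstract only sketches the idea of a recurrent connection in the cross-attention, so the snippet below is a minimal, hypothetical illustration rather than the paper's actual formulation: single-head dot-product attention decoded step by step, with the previous step's attention distribution fed back into the current step's scores. The function name `recurrent_cross_attention`, the additive coupling weight `w_prev`, and the single-head NumPy setup are assumptions made for illustration; the paper proposes several such variants that are not specified here.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def recurrent_cross_attention(queries, keys, values, w_prev=1.0):
    """Single-head cross-attention decoded step by step, where the score
    for target step i also sees the attention weights of step i-1
    (a first-order, alignment-like dependency)."""
    d = queries.shape[-1]
    prev_alpha = np.zeros(keys.shape[0])          # no previous alignment at step 0
    contexts, alphas = [], []
    for q in queries:                             # one decoder position at a time
        scores = keys @ q / np.sqrt(d)            # standard scaled dot-product scores
        scores = scores + w_prev * prev_alpha     # recurrent connection to alpha_{i-1}
        alpha = softmax(scores)
        contexts.append(alpha @ values)
        alphas.append(alpha)
        prev_alpha = alpha
    return np.stack(contexts), np.stack(alphas)

# Toy example: 4 target steps attending over 5 source positions, model dim 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=s) for s in [(4, 8), (5, 8), (5, 8)])
ctx, att = recurrent_cross_attention(Q, K, V)
print(ctx.shape, att.shape)                       # (4, 8) (4, 5)
```

In this sketch the recurrence is a simple additive bias from the previous attention vector; other couplings (e.g. feeding the previous context vector into the query) would fit the same description in the abstract.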