
Recurrent Attention for the Transformer


Publication date: 2021
Research language: English





In this work, we conduct a comprehensive investigation of one of the centerpieces of modern machine translation systems: the encoder-decoder attention mechanism. Motivated by the concept of first-order alignments, we extend the (cross-)attention mechanism with a recurrent connection, allowing direct access to previous attention/alignment decisions. We propose several ways to include such a recurrence in the attention mechanism. Verifying their performance across different translation tasks, we conclude that these extensions and dependencies are not beneficial for the translation performance of the Transformer architecture.
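Conceptually, a first-order recurrent connection conditions the attention energies at target step t on the alignment chosen at step t-1. The sketch below is a minimal illustration of this idea, not any of the specific variants proposed in the paper; the additive bias term and the scalar weight w_prev are assumptions made for clarity.

```python
import torch

def recurrent_cross_attention(queries, keys, values, w_prev):
    """One illustrative variant: condition step-t attention scores on the
    step-(t-1) attention distribution via a learned recurrent term.

    queries: (T_tgt, d) decoder states
    keys:    (T_src, d) encoder states
    values:  (T_src, d) encoder states
    w_prev:  scalar tensor weighting the recurrent term (assumed form)
    """
    d = queries.size(-1)
    contexts = []
    prev_attn = torch.zeros(keys.size(0))        # alignment from the previous step
    for q in queries:                            # decode step by step
        scores = keys @ q / d ** 0.5             # standard scaled dot-product energies
        scores = scores + w_prev * prev_attn     # recurrent connection to previous alignment
        attn = torch.softmax(scores, dim=-1)
        contexts.append(attn @ values)
        prev_attn = attn                         # expose this step's decision to the next one
    return torch.stack(contexts)

# toy usage
torch.manual_seed(0)
enc = torch.randn(5, 8)      # 5 source positions, model dimension 8
dec = torch.randn(3, 8)      # 3 target positions
out = recurrent_cross_attention(dec, enc, enc, torch.tensor(0.5))
print(out.shape)             # torch.Size([3, 8])
```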



Related research

Recent research questions the importance of the dot-product self-attention in Transformer models and shows that most attention heads learn simple positional patterns. In this paper, we push further along this research line and propose a novel substitute mechanism for self-attention: Recurrent AtteNtion (RAN). RAN directly learns attention weights without any token-to-token interaction and further improves their capacity through layer-to-layer interaction. Across an extensive set of experiments on 10 machine translation tasks, we find that RAN models are competitive and outperform their Transformer counterparts in certain scenarios, with fewer parameters and less inference time. In particular, applying RAN to the decoder of the Transformer brings consistent improvements of about +0.5 BLEU on 6 translation tasks and +1.0 BLEU on the Turkish-English translation task. In addition, we conduct extensive analysis of the attention weights of RAN to confirm that they are reasonable. RAN is a promising alternative for building more effective and efficient NMT models.
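To make the idea concrete, the following sketch shows one plausible reading of a RAN-style layer: attention weights come from learned positional parameters rather than query-key dot products, and consecutive layers interact by gating the new weights with those of the previous layer. The class name, the gating rule, and all hyperparameters are assumptions for illustration, not the paper's actual design.

```python
import torch
import torch.nn as nn

class LearnedPositionalAttention(nn.Module):
    """Illustrative stand-in for a RAN-style layer: attention weights are
    learned positional parameters (no query-key dot product) and are mixed
    with the previous layer's weights through a simple gate
    (layer-to-layer interaction)."""

    def __init__(self, max_len, d_model):
        super().__init__()
        self.pos_scores = nn.Parameter(torch.zeros(max_len, max_len))  # token-free scores
        self.gate = nn.Parameter(torch.tensor(0.5))                    # mix with previous layer
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x, prev_attn=None):
        n = x.size(0)
        attn = torch.softmax(self.pos_scores[:n, :n], dim=-1)
        if prev_attn is not None:                 # recurrence over layers, not tokens
            g = torch.sigmoid(self.gate)
            attn = g * attn + (1 - g) * prev_attn
        return attn @ self.value(x), attn

# stack two layers, passing attention weights from layer to layer
x = torch.randn(6, 16)
layer1, layer2 = LearnedPositionalAttention(32, 16), LearnedPositionalAttention(32, 16)
h, a = layer1(x)
h, a = layer2(h, prev_attn=a)
print(h.shape, a.shape)   # torch.Size([6, 16]) torch.Size([6, 6])
```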
Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced monotonic attention behavior via specialized attention functions or pretraining. In this work, we introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it on several sequence-to-sequence tasks: grapheme-to-phoneme conversion, morphological inflection, transliteration, and dialect normalization. Experiments show that we can achieve largely monotonic behavior. Performance is mixed, with larger gains on top of RNN baselines. General monotonicity does not benefit Transformer multi-head attention; however, we see isolated improvements when only a subset of heads is biased towards monotonic behavior.
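One simple way to express such a loss on top of an unmodified attention mechanism is to penalize backward jumps in the expected source position attended to at consecutive target steps. The function below is an illustrative formulation under that assumption; the cited paper's exact loss may differ.

```python
import torch

def monotonicity_loss(attn):
    """Illustrative monotonicity penalty on an attention matrix of shape
    (T_tgt, T_src): penalize decreases in the expected source position
    between consecutive target steps. One plausible formulation, not
    necessarily the loss used in the cited paper."""
    positions = torch.arange(attn.size(1), dtype=attn.dtype)
    expected = attn @ positions                      # expected source index per target step
    backward_jumps = torch.relu(expected[:-1] - expected[1:])
    return backward_jumps.mean()

# usage: add the penalty to the task loss with a small weight
attn = torch.softmax(torch.randn(4, 7), dim=-1)      # rows sum to 1 over source positions
print(monotonicity_loss(attn))
```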
To identify the bacterial agents in meningitis resulting from basal skull fractures, their antibiotic sensitivity, and the benefit of the Pneumo 23 vaccine in preventing the occurrence of meningitis in these cases.
Self-supervised learning has recently attracted considerable attention in the NLP community for its ability to learn discriminative features using a contrastive objective. This paper investigates whether contrastive learning can be extended to Transformer attention to tackle the Winograd Schema Challenge. To this end, we propose a novel self-supervised framework, leveraging a contrastive loss directly at the level of self-attention. Experimental analysis of our attention-based models on multiple datasets demonstrates superior commonsense reasoning capabilities. The proposed approach outperforms all comparable unsupervised approaches while occasionally surpassing supervised ones.
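As a rough intuition for a contrastive loss applied at the attention level, one can compare the attention mass a pronoun token assigns to the correct candidate versus a distractor and penalize the model when the distractor wins. The margin-based toy objective below is a hypothetical simplification of such a framework, not the cited paper's actual loss.

```python
import torch
import torch.nn.functional as F

def attention_contrastive_loss(attn_row, correct_idx, wrong_idx, margin=0.1):
    """Toy contrastive objective at the attention level: given the
    self-attention row of a pronoun token, encourage more mass on the
    correct candidate than on the distractor (illustrative only)."""
    pos = attn_row[correct_idx]
    neg = attn_row[wrong_idx]
    return F.relu(margin - (pos - neg))

# toy usage: attention the pronoun pays to 8 tokens; candidates at positions 2 and 5
attn_row = torch.softmax(torch.randn(8), dim=-1)
print(attention_contrastive_loss(attn_row, correct_idx=2, wrong_idx=5))
```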
The Transformer translation model is based on the multi-head attention mechanism, which can be parallelized easily. The multi-head attention network performs the scaled dot-product attention function in parallel, empowering the model by jointly attending to information from different representation subspaces at different positions. In this paper, we present an approach to learning a hard retrieval attention where an attention head only attends to one token in the sentence rather than all tokens. The matrix multiplication between attention probabilities and the value sequence in the standard scaled dot-product attention can thus be replaced by a simple and efficient retrieval operation. We show that our hard retrieval attention mechanism is 1.43 times faster in decoding, while preserving translation quality on a wide range of machine translation tasks when used in the decoder self- and cross-attention networks.
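The core of the hard retrieval idea can be shown in a few lines: take the argmax of the scaled dot-product scores and index the value sequence directly instead of multiplying a probability matrix by it. The sketch below illustrates only the inference-time lookup; how the discrete choice is trained is omitted, and the function name is an assumption.

```python
import torch

def hard_retrieval_attention(q, keys, values):
    """Sketch of hard retrieval attention: each query attends to exactly one
    token, so the probabilities-times-values matrix multiplication becomes an
    index lookup. Training tricks for the discrete argmax are omitted."""
    scores = q @ keys.t() / q.size(-1) ** 0.5   # (T_q, T_k) scaled dot-product energies
    idx = scores.argmax(dim=-1)                 # one retrieved position per query
    return values[idx]                          # gather instead of attn @ values

# toy usage
q = torch.randn(3, 8)
k = torch.randn(5, 8)
v = torch.randn(5, 8)
print(hard_retrieval_attention(q, k, v).shape)  # torch.Size([3, 8])
```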
