
Mask Attention Networks: Rethinking and Strengthen Transformer


Publication date: 2021
Language: English
Created by: Shamra Editor





Transformer is an attention-based neural network whose layers consist of two sublayers, namely, the Self-Attention Network (SAN) and the Feed-Forward Network (FFN). Existing research has explored enhancing the two sublayers separately to improve Transformer's capability for text representation. In this paper, we present a novel understanding of SAN and FFN as Mask Attention Networks (MANs) and show that they are two special cases of MANs with static mask matrices. However, their static mask matrices limit the capability for localness modeling in text representation learning. We therefore introduce a new layer named Dynamic Mask Attention Network (DMAN), with a learnable mask matrix that is able to model localness adaptively. To incorporate the advantages of DMAN, SAN, and FFN, we propose a sequential layered structure that combines the three types of layers. Extensive experiments on various tasks, including neural machine translation and text summarization, demonstrate that our model outperforms the original Transformer.
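To make the MAN view concrete, below is a minimal single-head PyTorch sketch of the idea: SAN corresponds to an all-ones mask, FFN to an identity mask, and DMAN predicts a soft mask from the input itself. The `mask_gate` parameterization and the renormalization step are illustrative assumptions, not the paper's exact formulation.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicMaskAttention(nn.Module):
    """Single-head attention whose mask is predicted from the inputs.

    Sketch of the MAN framing: SAN = all-ones mask, FFN = identity mask,
    DMAN = learnable mask. The sigmoid gate below is an assumed
    parameterization, not the authors' exact one.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.mask_gate = nn.Linear(d_model, d_model)  # predicts the soft mask
        self.scale = math.sqrt(d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / self.scale  # (batch, seq, seq)
        # Learnable soft mask in (0, 1), one value per query/key pair.
        gate = torch.sigmoid(self.mask_gate(x) @ k.transpose(-2, -1) / self.scale)
        attn = F.softmax(scores, dim=-1) * gate  # mask the attention weights
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)  # renormalize
        return attn @ v

x = torch.randn(2, 10, 64)
out = DynamicMaskAttention(64)(x)  # (2, 10, 64)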



References used
https://aclanthology.org/

Read More

We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation: annotators seek to create examples that a target model will misclassify, but that another person will not. In this paper, we argue that Dynabench addresses a critical need in our community: contemporary models quickly achieve outstanding performance on benchmark tasks but nonetheless fail on simple challenge examples and falter in real-world scenarios. With Dynabench, dataset creation, model development, and model assessment can directly inform each other, leading to more robust and informative benchmarks. We report on four initial NLP tasks, illustrating these concepts and highlighting the promise of the platform, and address potential objections to dynamic benchmarking as a new standard for the field.
Supplementary Training on Intermediate Labeled-data Tasks (STILT) is a widely applied technique that first fine-tunes a pretrained language model on an intermediate task before fine-tuning on the target task of interest. While STILT is able to further improve the performance of pretrained language models, it is still unclear why and when it works. Previous research shows that intermediate tasks involving complex inference, such as commonsense reasoning, work especially well for RoBERTa-large. In this paper, we discover that the improvement from an intermediate task can be orthogonal to whether it involves reasoning or other complex skills: a simple real-fake discrimination task synthesized by GPT2 can benefit diverse target tasks. We conduct extensive experiments to study the impact of different factors on STILT. These findings suggest rethinking the role of intermediate fine-tuning in the STILT pipeline.
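The STILT recipe itself is simply two fine-tuning stages in sequence. The sketch below uses a toy classifier and synthetic loaders as stand-ins for a pretrained language model and real datasets; only the two-stage structure is the point.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins: in practice the model is a pretrained LM (e.g. RoBERTa)
# and the loaders hold the intermediate and target datasets.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

def make_loader(n: int = 64) -> DataLoader:
    x, y = torch.randn(n, 16), torch.randint(0, 2, (n,))
    return DataLoader(TensorDataset(x, y), batch_size=8)

def finetune(model: nn.Module, loader: DataLoader, epochs: int = 2, lr: float = 1e-3) -> nn.Module:
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            loss = nn.functional.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# STILT: first the intermediate task, then the target task of interest.
model = finetune(model, make_loader())  # stage 1: intermediate task
model = finetune(model, make_loader())  # stage 2: target task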
We often use perturbations to regularize neural models. For neural encoder-decoders, previous studies applied scheduled sampling (Bengio et al., 2015) and adversarial perturbations (Sato et al., 2019) as perturbations, but these methods require considerable computational time. This study therefore addresses the question of whether such approaches are efficient enough in terms of training time. We compare several perturbations in sequence-to-sequence problems with respect to computational time. Experimental results show that simple techniques such as word dropout (Gal and Ghahramani, 2016) and random replacement of input tokens achieve scores comparable to (or better than) the recently proposed perturbations, even though these simple methods are faster.
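The two simple perturbations the study highlights are straightforward to reproduce. A minimal sketch operating on batches of token ids, where `unk_id` and `vocab_size` are assumed placeholders for the reader's own vocabulary:

import torch

def word_dropout(tokens: torch.Tensor, p: float = 0.1, unk_id: int = 0) -> torch.Tensor:
    """Replace each token with UNK with probability p (word dropout)."""
    drop = torch.rand_like(tokens, dtype=torch.float) < p
    return tokens.masked_fill(drop, unk_id)

def random_replacement(tokens: torch.Tensor, p: float = 0.1, vocab_size: int = 32000) -> torch.Tensor:
    """Replace each token with a uniformly sampled token with probability p."""
    swap = torch.rand_like(tokens, dtype=torch.float) < p
    random_tokens = torch.randint_like(tokens, vocab_size)
    return torch.where(swap, random_tokens, tokens)

# Example: perturb a batch of token ids before feeding the encoder.
batch = torch.randint(1, 32000, (4, 12))
noisy = random_replacement(word_dropout(batch), p=0.1)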
Transformer-based pre-trained language models have significantly improved the performance of various natural language processing (NLP) tasks in recent years. While effective and prevalent, these models are usually prohibitively large for resource-limited deployment scenarios. A thread of research has thus been working on applying network pruning techniques under the pretrain-then-finetune paradigm widely adopted in NLP. However, the existing pruning results on benchmark transformers, such as BERT, are not as remarkable as the pruning results in the literature on convolutional neural networks (CNNs). In particular, common wisdom in CNN pruning holds that sparse pruning compresses a model more than reducing the number of channels and layers does, while existing work on sparse pruning of BERT yields inferior results to small-dense counterparts such as TinyBERT. In this work, we aim to fill this gap by studying how knowledge is transferred and lost during the pre-training, fine-tuning, and pruning process, and by proposing a knowledge-aware sparse pruning process that achieves significantly better results than the existing literature. We show for the first time that sparse pruning compresses a BERT model significantly more than reducing its number of channels and layers. Experiments on multiple datasets of the GLUE benchmark show that our method outperforms the leading competitors with a 20-times weight/FLOPs compression and negligible loss in prediction accuracy.
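The paper's knowledge-aware method is more involved, but the sparse-versus-structured distinction it builds on is easy to illustrate. A baseline sketch of plain unstructured magnitude pruning, which zeroes individual entries while keeping the matrix shape (in contrast to dropping whole channels or layers); the layer size and sparsity level are arbitrary:

import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.8) -> torch.Tensor:
    """Zero out the smallest-magnitude weights (unstructured sparse pruning)."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold  # keep only large-magnitude entries
    return weight * mask

w = torch.randn(768, 768)
w_sparse = magnitude_prune(w, sparsity=0.8)  # ~80% of entries are now zero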
Existing sarcasm detection systems focus on exploiting linguistic markers, context, or user-level priors. However, social studies suggest that the relationship between the author and the audience can be equally relevant to sarcasm usage and interpretation. In this work, we propose a framework that jointly leverages (1) user context from their historical tweets and (2) social information from the user's conversational neighborhood in an interaction graph, to contextualize the interpretation of the post. We use graph attention networks (GAT) over users and tweets in a conversation thread, combined with dense user-history representations. Apart from achieving state-of-the-art results on the recently published dataset of 19K Twitter users with 30K labeled tweets, plus 10M unlabeled tweets as context, our results indicate that the model contributes more to interpreting the sarcastic intentions of an author than to predicting how the sarcasm is perceived by others.
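For reference, the GAT building block the framework relies on can be sketched as a single-head graph attention layer over node features and an adjacency matrix. This is a generic sketch of GAT, not the authors' full model with user-history representations:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention layer (Velickovic et al., 2018)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.w = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h, adj):  # h: (n, in_dim), adj: (n, n) with 0/1 entries
        z = self.w(h)  # (n, out_dim)
        n = z.size(0)
        # Concatenate every pair (z_i, z_j) to score edge i -> j.
        pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                           z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))  # raw attention scores
        e = e.masked_fill(adj == 0, float("-inf"))   # attend only to neighbors
        alpha = torch.softmax(e, dim=-1)
        return alpha @ z

h = torch.randn(5, 16)  # 5 nodes (users/tweets in a thread)
adj = (torch.rand(5, 5) > 0.5).float()
adj.fill_diagonal_(1)   # self-loops avoid empty neighborhoods
out = GATLayer(16, 8)(h, adj)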
