
On Attention Redundancy: A Comprehensive Study



Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: attention redundancy


Abstract


The multi-layer multi-head self-attention mechanism is widely applied in modern neural language models. Attention redundancy has been observed among attention heads but has not been deeply studied in the literature. Using the BERT-base model as an example, this paper provides a comprehensive study of attention redundancy, which is helpful for model interpretation and model compression. We analyze attention redundancy with the Five Ws and How. (What) We define and focus the study on redundancy matrices generated from the pre-trained and fine-tuned BERT-base model for GLUE datasets. (How) We use both token-based and sentence-based distance functions to measure the redundancy. (Where) Clear and similar redundancy patterns (cluster structure) are observed among attention heads. (When) Redundancy patterns are similar in both the pre-training and fine-tuning phases. (Who) We discover that redundancy patterns are task-agnostic. Similar redundancy patterns exist even for randomly generated token sequences. (Why) We also evaluate the influence of pre-training dropout ratios on attention redundancy. Based on the phase-independent and task-agnostic attention redundancy patterns, we propose a simple zero-shot pruning method as a case study. Experiments on fine-tuning GLUE tasks verify its effectiveness. The comprehensive analyses of attention redundancy make model understanding and zero-shot model pruning promising.
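To make the idea of a redundancy matrix concrete, here is a minimal sketch in Python with NumPy. The abstract does not name the distance functions or the pruning rule, so everything specific here is an assumption for illustration: the Jensen-Shannon distance stands in for a token-based distance, the random attention maps stand in for real BERT-base attention, and `select_heads` is a hypothetical greedy rule, not the paper's zero-shot pruning method.

```python
import numpy as np


def random_attention(num_heads, seq_len, rng):
    """Hypothetical stand-in for real attention maps: each row is a softmax distribution."""
    logits = rng.normal(size=(num_heads, seq_len, seq_len))
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)


def js_distance(p, q, eps=1e-12):
    """Jensen-Shannon distance (base 2) along the last axis; values lie in [0, 1]."""
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * (np.log2(p + eps) - np.log2(m + eps)), axis=-1)
    kl_qm = np.sum(q * (np.log2(q + eps) - np.log2(m + eps)), axis=-1)
    js_div = 0.5 * (kl_pm + kl_qm)
    # Clip guards against tiny negative values caused by floating-point error.
    return np.sqrt(np.clip(js_div, 0.0, 1.0))


def redundancy_matrix(attn):
    """attn has shape (heads, tokens, tokens).

    Returns a symmetric (heads, heads) matrix whose entry (i, j) is the
    per-token JS distance between heads i and j, averaged over tokens.
    Small values mean the two heads attend in nearly the same way.
    """
    num_heads = attn.shape[0]
    dist = np.zeros((num_heads, num_heads))
    for i in range(num_heads):
        for j in range(i + 1, num_heads):
            d = js_distance(attn[i], attn[j]).mean()
            dist[i, j] = dist[j, i] = d
    return dist


def select_heads(dist, threshold):
    """Hypothetical greedy rule: keep a head only if it is farther than
    `threshold` from every head that has already been kept."""
    kept = []
    for h in range(dist.shape[0]):
        if all(dist[h, k] > threshold for k in kept):
            kept.append(h)
    return kept


# Small demonstration on synthetic data (assumed sizes, not BERT-base's).
demo_rng = np.random.default_rng(0)
demo_attn = random_attention(num_heads=4, seq_len=5, rng=demo_rng)
demo_attn[1] = demo_attn[0]  # make head 1 an exact copy of head 0
demo_dist = redundancy_matrix(demo_attn)
print(np.round(demo_dist, 3))
print("kept heads:", select_heads(demo_dist, threshold=0.01))
```

Because the rule only consults the distance matrix, it needs no task labels or further training, which is the sense in which such a selection could be called zero-shot. With real models, the attention maps would come from the model's per-head outputs rather than from `random_attention`.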

References used

https://aclanthology.org/


Contemporary NLP Modeling in Six Comprehensive Programming Assignments

Association for Computational Linguistics, 2021 (article)

We present a series of programming assignments, adaptable to a range of experience levels from advanced undergraduate to PhD, to teach students design and implementation of modern NLP systems. These assignments build from the ground up and emphasize full-stack understanding of machine learning models: initially, students implement inference and gradient computation by hand, then use PyTorch to build nearly state-of-the-art neural networks using current best practices. Topics are chosen to cover a wide range of modeling and inference techniques that one might encounter, ranging from linear models suitable for industry applications to state-of-the-art deep learning models used in NLP research. The assignments are customizable, with constrained options to guide less experienced students or open-ended options giving advanced students freedom to explore. All of them can be deployed in a fully autogradable fashion, and have collectively been tested on over 300 students across several semesters.

Keywords: comprehensive programming assignments, contemporary NLP modeling, comprehensive programming

On the Difficulty of Segmenting Words with Attention

Association for Computational Linguistics, 2021 (article)

Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks. Previous papers have suggested that for sequence-to-sequence models trained on tasks such as speech translation or speech recognition, attention can be used to locate and segment the words. We show, however, that even on monolingual data this approach is brittle. In our experiments with different input types, data sizes, and segmentation algorithms, only models trained to predict phones from words succeed in the task. Models trained to predict words from either phones or speech (i.e., the opposite direction needed to generalize to new data) yield much worse results, suggesting that attention-based segmentation is only useful in limited scenarios.

Keywords: difficulty of segmenting, segmenting words, difficulty

Neural Attention-Aware Hierarchical Topic Model

Association for Computational Linguistics, 2021 (article)

Neural topic models (NTMs) apply deep neural networks to topic modelling. Despite their success, NTMs generally ignore two important aspects: (1) only document-level word count information is utilized for the training, while more fine-grained sentence-level information is ignored, and (2) external semantic knowledge regarding documents, sentences and words is not exploited for the training. To address these issues, we propose a variational autoencoder (VAE) NTM model that jointly reconstructs the sentence and document word counts using combinations of bag-of-words (BoW) topical embeddings and pre-trained semantic embeddings. The pre-trained embeddings are first transformed into a common latent topical space to align their semantics with the BoW embeddings. Our model also features hierarchical KL divergence to leverage embeddings of each document to regularize those of their sentences, paying more attention to semantically relevant sentences. Both quantitative and qualitative experiments have shown the efficacy of our model in (1) lowering the reconstruction errors at both the sentence and document levels, and (2) discovering more coherent topics from real-world datasets.

Keywords: attention-aware hierarchical topic, neural attention-aware hierarchical

Syntax-Based Attention Masking for Neural Machine Translation

Association for Computational Linguistics, 2021 (article)

We present a simple method for extending transformers to source-side trees. We define a number of masks that limit self-attention based on relationships among tree nodes, and we allow each attention head to learn which mask or masks to use. On translation from English to various low-resource languages, and translation in both directions between English and German, our method always improves over simple linearization of the source-side parse tree and almost always improves over a sequence-to-sequence baseline, by up to +2.1 BLEU.

Keywords: syntax-based attention masking, masking for neural
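As a rough illustration of what a mask that limits self-attention based on relationships among tree nodes could look like, the sketch below builds one such mask in Python with NumPy. The abstract does not list the masks the authors define, so the ancestor relation, the parent-array encoding of the tree, and the function names are all assumptions. The per-head learning of which mask to use is not modelled here.

```python
import numpy as np


def ancestor_mask(parents):
    """Build a boolean mask from a parent array (assumed tree encoding).

    parents[i] is the index of node i's parent, or -1 for the root.
    mask[i, j] is True when j is node i itself or one of its ancestors.
    """
    n = len(parents)
    mask = np.eye(n, dtype=bool)
    for i in range(n):
        j = parents[i]
        while j != -1:
            mask[i, j] = True
            j = parents[j]
    return mask


def masked_softmax(scores, mask):
    """Softmax over the last axis, with disallowed positions forced to zero weight."""
    masked = np.where(mask, scores, -np.inf)
    # Every row keeps its diagonal entry, so the row maximum is always finite.
    masked = masked - masked.max(axis=-1, keepdims=True)
    exp = np.exp(masked)
    return exp / exp.sum(axis=-1, keepdims=True)


# Toy tree: node 0 is the root, nodes 1 and 2 are its children, node 3 is a child of node 1.
toy_parents = [-1, 0, 0, 1]
toy_mask = ancestor_mask(toy_parents)
toy_scores = np.random.default_rng(0).normal(size=(4, 4))
toy_weights = masked_softmax(toy_scores, toy_mask)
print(toy_mask.astype(int))
print(np.round(toy_weights, 3))
```

Other relations (descendants, siblings, same-constituent) would follow the same pattern: compute a boolean matrix from the tree, then apply it to the attention scores before the softmax.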

Template-aware Attention Model for Earnings Call Report Generation

Association for Computational Linguistics, 2021 (article)

Earnings calls are among the important resources that investors and analysts use to update their price targets. Firms usually publish corresponding transcripts soon after earnings events. However, raw transcripts are often too long and lack a coherent structure. To enhance clarity, analysts write well-structured reports for some important earnings call events by analyzing them, which requires time and effort. In this paper, we propose TATSum (Template-Aware aTtention model for Summarization), a generalized neural summarization approach for structured report generation, and evaluate its performance in the earnings call domain. We build a large corpus with thousands of transcripts and reports using historical earnings events. We first generate a candidate set of reports from the corpus as potential soft templates which do not impose actual rules on the output. Then, we employ an encoder model with margin-ranking loss to rank the candidate set and select the best quality template. Finally, the transcript and the selected soft template are used as input in a seq2seq framework for report generation. Empirical results on the earnings call dataset show that our model significantly outperforms state-of-the-art models in terms of informativeness and structure.

Keywords: earnings call, template-aware attention model, report generation
