
Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks

Added by Rumen Dangovski
Publication date: 2020
Language: English





The attention mechanism is a key component of the neural revolution in Natural Language Processing (NLP). As the size of attention-based models has been scaling with the available computational resources, a number of pruning techniques have been developed to detect and to exploit sparseness in such models in order to make them more efficient. The majority of such efforts have focused on looking for attention patterns and then hard-coding them to achieve sparseness, or on pruning the weights of the attention mechanisms based on statistical information from the training data. Here, we marry these two lines of research by proposing Attention Pruning (AP): a novel pruning framework that collects observations about the attention patterns in a fixed dataset and then induces a global sparseness mask for the model. This can save 90% of the attention computation for language modelling and about 50% for machine translation and for solving GLUE tasks, while maintaining the quality of the results. Moreover, using our method, we discovered important distinctions between self- and cross-attention patterns, which could guide future NLP research in attention-based modelling. Our framework can in principle speed up any model that uses an attention mechanism, thus helping develop better models for existing or for new NLP applications. Our implementation is available at https://github.com/irugina/AP.
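The recipe is concrete enough to sketch. Below is a minimal illustration of the two steps described above, written against a hypothetical PyTorch model that can return its attention probabilities; the return_attention flag, function names, and tensor shapes are assumptions for illustration, not the authors' implementation (see the linked repository for that).

import torch

@torch.no_grad()
def collect_attention_stats(model, dataloader, num_batches=100):
    # Average attention probabilities over a fixed dataset.
    running_sum, count = None, 0
    for i, batch in enumerate(dataloader):
        if i >= num_batches:
            break
        _, attn = model(batch, return_attention=True)  # hypothetical flag; attn: (batch, heads, query, key)
        attn = attn.mean(dim=0)                        # average over the batch dimension
        running_sum = attn if running_sum is None else running_sum + attn
        count += 1
    return running_sum / count                         # (heads, query, key)

def induce_global_mask(mean_attention, keep_fraction=0.1):
    # Keep only the globally most-attended positions; keep_fraction=0.1
    # corresponds to pruning roughly 90% of the attention computation.
    flat = mean_attention.flatten()
    k = max(1, int(keep_fraction * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    return (mean_attention >= threshold).float()       # binary sparseness mask

# At inference time the mask would be applied to the raw attention scores
# before the softmax, e.g.:
#     scores = scores.masked_fill(mask == 0, float("-inf"))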


Related research

Jiajie Zou, Nai Ding (2021)
Attention is a key mechanism for information selection in both biological brains and many state-of-the-art deep neural networks (DNNs). Here, we investigate whether humans and DNNs allocate attention in comparable ways when reading a text passage to subsequently answer a specific question. We analyze three transformer-based DNNs that reach human-level performance when trained to perform the reading comprehension task. We find that the DNN attention distribution quantitatively resembles the human attention distribution measured by fixation times. Human readers fixate longer on words that are more relevant to the question-answering task, demonstrating that attention is modulated by top-down reading goals, on top of lower-level visual and text features of the stimulus. Further analyses reveal that the attention weights in DNNs are also influenced by both top-down reading goals and lower-level stimulus features, with the shallow layers more strongly influenced by lower-level text features and the deep layers attending more to task-relevant words. Additionally, the deep layers' attention to task-relevant words gradually emerges when pre-trained DNN models are fine-tuned to perform the reading comprehension task, which coincides with the improvement in task performance. These results demonstrate that DNNs can evolve human-like attention distributions through task optimization, which suggests that human attention during goal-directed reading comprehension is a consequence of task optimization.
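As a rough illustration of this kind of comparison, the sketch below correlates per-word DNN attention weights with human fixation times using a rank correlation; the word alignment and all numbers are made up for illustration and are not taken from the paper.

import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-word values for one passage, aligned word by word.
dnn_attention = np.array([0.02, 0.10, 0.31, 0.05, 0.52])  # attention weights from one DNN layer
fixation_ms   = np.array([180, 220, 450, 190, 610])       # human fixation times in milliseconds

rho, p_value = spearmanr(dnn_attention, fixation_ms)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")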
We introduce a tree-structured attention neural network for sentences and small phrases and apply it to the problem of sentiment classification. Our model extends current recursive models by incorporating structural information around a node of a syntactic tree using both bottom-up and top-down information propagation. The model also uses structural attention to identify the most salient representations during the construction of the syntactic tree. To our knowledge, the proposed models achieve state-of-the-art performance on the Stanford Sentiment Treebank dataset.
Deep neural networks have achieved state-of-the-art results in various vision and/or language tasks. Despite the use of large training datasets, most models are trained by iterating over single input-output pairs, discarding the remaining examples for the current prediction. In this work, we actively exploit the training data, using the information from nearest training examples to aid the prediction both during training and testing. Specifically, our approach uses the target of the most similar training example to initialize the memory state of an LSTM model, or to guide attention mechanisms. We apply this approach to image captioning and sentiment analysis, respectively through image and text retrieval. Results confirm the effectiveness of the proposed approach for the two tasks, on the widely used Flickr8 and IMDB datasets. Our code is publicly available at http://github.com/RitaRamo/retrieval-augmentation-nn.
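A minimal sketch of the retrieval-augmented idea, assuming precomputed feature vectors and target embeddings for the training set; the cosine-similarity retrieval, names, and shapes are illustrative assumptions rather than the paper's exact architecture.

import torch
import torch.nn.functional as F

def retrieve_nearest_target(query_feat, train_feats, train_target_embs):
    # Return the target embedding of the training example most similar to the query.
    sims = F.cosine_similarity(query_feat.unsqueeze(0), train_feats, dim=1)
    return train_target_embs[sims.argmax()]

hidden_size = 256
lstm = torch.nn.LSTM(input_size=300, hidden_size=hidden_size, batch_first=True)

query_feat = torch.randn(512)                        # e.g. an image feature for captioning
train_feats = torch.randn(1000, 512)                 # features of the training examples
train_target_embs = torch.randn(1000, hidden_size)   # embeddings of the training targets

# Initialize the LSTM memory state with the retrieved target embedding.
c0 = retrieve_nearest_target(query_feat, train_feats, train_target_embs).view(1, 1, -1)
h0 = torch.zeros_like(c0)
tokens = torch.randn(1, 12, 300)                     # dummy input token embeddings
outputs, _ = lstm(tokens, (h0, c0))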
Tapas Nayak (2021)
Relation extraction from text is an important task for automatic knowledge base population. In this thesis, we first propose a syntax-focused multi-factor attention network model for finding the relation between two entities. Next, we propose two joint entity and relation extraction frameworks based on encoder-decoder architecture. Finally, we propose a hierarchical entity graph convolutional network for relation extraction across documents.
We present an operational component of a real-world patient triage system. Given a specific patient presentation, the system is able to assess the level of medical urgency and issue the most appropriate recommendation in terms of best point of care and time to treat. We use an attention-based convolutional neural network architecture trained on 600,000 doctor notes in German. We compare two approaches, one that uses the full text of the medical notes and one that uses only a selected list of medical entities extracted from the text. These approaches achieve 79% and 66% precision, respectively, but at a confidence threshold of 0.6, precision increases to 85% and 75%, respectively. In addition, a method to detect warning symptoms is implemented to render the classification task transparent from a medical perspective. The method is based on learned attention scores and is automatically validated on the same data.