Attention Weights in Transformer NMT Fail Aligning Words Between Sequences but Largely Explain Model Predictions

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Language: English

Created by Shamra Editor

Abstract

This work proposes an extensive analysis of the Transformer architecture in the Neural Machine Translation (NMT) setting. Focusing on the encoder-decoder attention mechanism, we prove that attention weights systematically make alignment errors by relying mainly on uninformative tokens from the source sequence. However, we observe that NMT models assign attention to these tokens to regulate the contribution in the prediction of the two contexts, the source and the prefix of the target sequence. We provide evidence about the influence of wrong alignments on the model behavior, demonstrating that the encoder-decoder attention mechanism is well suited as an interpretability method for NMT. Finally, based on our analysis, we propose methods that largely reduce the word alignment error rate compared to standard induced alignments from attention weights.
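The abstract does not spell out how alignments are induced from attention weights or how they are scored. As a rough illustration only, the sketch below assumes the common baseline of linking each target token to the source token with the highest attention weight, and scores the result with the usual alignment error rate (AER) formula. The attention matrix, the gold links, and the function names are made up for the example and are not taken from the paper.

```python
def induce_alignment(attention):
    """Link each target position to its highest-weighted source position.

    attention[t][s] is the weight that target token t puts on source token s
    (hypothetical values; a real model would supply these per layer and head).
    Returns a set of (source_index, target_index) pairs.
    """
    links = set()
    for t, row in enumerate(attention):
        s = max(range(len(row)), key=row.__getitem__)
        links.add((s, t))
    return links


def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), where S is a subset of P."""
    possible = possible | sure
    hits = len(predicted & sure) + len(predicted & possible)
    return 1.0 - hits / (len(predicted) + len(sure))


# Toy case: 3 target tokens, 4 source tokens; the last source token stands in
# for an uninformative token such as an end-of-sentence marker.
attention = [
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.15, 0.10, 0.70],  # most weight falls on the uninformative token
    [0.10, 0.10, 0.60, 0.20],
]
sure = {(0, 0), (1, 1), (2, 2)}  # assumed gold alignment

predicted = induce_alignment(attention)
aer = alignment_error_rate(predicted, sure, possible=set())
print(sorted(predicted), round(aer, 3))
```

In this toy case the second target token is linked to the uninformative source token instead of its gold counterpart, which is the kind of error the abstract describes.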

References used

https://aclanthology.org/

Explaining Decision-Tree Predictions by Addressing Potential Conflicts between Predictions and Plausible Expectations

Association for Computational Linguistics, 2021 (article)

We offer an approach to explain Decision Tree (DT) predictions by addressing potential conflicts between aspects of these predictions and plausible expectations licensed by background information. We define four types of conflicts, operationalize their identification, and specify explanatory schemas that address them. Our human evaluation focused on the effect of explanations on users' understanding of a DT's reasoning and their willingness to act on its predictions. The results show that (1) explanations that address potential conflicts are considered at least as good as baseline explanations that just follow a DT path; and (2) the conflict-based explanations are deemed especially valuable when users' expectations disagree with the DT's predictions.

Explaining Neural Network Predictions on Sentence Pairs via Learning Word-Group Masks

Association for Computational Linguistics, 2021 (article)

Explaining neural network models is important for increasing their trustworthiness in real-world applications. Most existing methods generate post-hoc explanations for neural network models by identifying individual feature attributions or detecting interactions between adjacent features. However, for models with text pairs as inputs (e.g., paraphrase identification), existing methods are not sufficient to capture feature interactions between two texts and their simple extension of computing all word-pair interactions between two texts is computationally inefficient. In this work, we propose the Group Mask (GMASK) method to implicitly detect word correlations by grouping correlated words from the input text pair together and measure their contribution to the corresponding NLP tasks as a whole. The proposed method is evaluated with two different model architectures (decomposable attention model and BERT) across four datasets, including natural language inference and paraphrase identification tasks. Experiments show the effectiveness of GMASK in providing faithful explanations to these models.

On the Difficulty of Segmenting Words with Attention

Association for Computational Linguistics, 2021 (article)

Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks. Previous papers have suggested that for sequence-to-sequence models trained on tasks such as speech translation or speech recognition, attention can be used to locate and segment the words. We show, however, that even on monolingual data this approach is brittle. In our experiments with different input types, data sizes, and segmentation algorithms, only models trained to predict phones from words succeed in the task. Models trained to predict words from either phones or speech (i.e., the opposite direction needed to generalize to new data), yield much worse results, suggesting that attention-based segmentation is only useful in limited scenarios.

Neural Attention-Aware Hierarchical Topic Model

Association for Computational Linguistics, 2021 (article)

Neural topic models (NTMs) apply deep neural networks to topic modelling. Despite their success, NTMs generally ignore two important aspects: (1) only document-level word count information is utilized for the training, while more fine-grained sentence-level information is ignored, and (2) external semantic knowledge regarding documents, sentences and words is not exploited for the training. To address these issues, we propose a variational autoencoder (VAE) NTM model that jointly reconstructs the sentence and document word counts using combinations of bag-of-words (BoW) topical embeddings and pre-trained semantic embeddings. The pre-trained embeddings are first transformed into a common latent topical space to align their semantics with the BoW embeddings. Our model also features hierarchical KL divergence to leverage embeddings of each document to regularize those of their sentences, paying more attention to semantically relevant sentences. Both quantitative and qualitative experiments have shown the efficacy of our model in (1) lowering the reconstruction errors at both the sentence and document levels, and (2) discovering more coherent topics from real-world datasets.

Template-aware Attention Model for Earnings Call Report Generation

Association for Computational Linguistics, 2021 (article)

Earnings calls are among the important resources investors and analysts use to update their price targets. Firms usually publish corresponding transcripts soon after earnings events. However, raw transcripts are often too long and lack a coherent structure. To enhance the clarity, analysts write well-structured reports for some important earnings call events by analyzing them, requiring time and effort. In this paper, we propose TATSum (Template-Aware aTtention model for Summarization), a generalized neural summarization approach for structured report generation, and evaluate its performance in the earnings call domain. We build a large corpus with thousands of transcripts and reports using historical earnings events. We first generate a candidate set of reports from the corpus as potential soft templates which do not impose actual rules on the output. Then, we employ an encoder model with margin-ranking loss to rank the candidate set and select the best quality template. Finally, the transcript and the selected soft template are used as input in a seq2seq framework for report generation. Empirical results on the earnings call dataset show that our model significantly outperforms state-of-the-art models in terms of informativeness and structure.
