"Average" Approximates "First Principal Component"? An Empirical Analysis on Representations from Neural Language Models

Added by Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Contextualized representations based on neural language models have furthered the state of the art in various NLP tasks. Despite their great success, the nature of such representations remains a mystery. In this paper, we present an empirical property of these representations: "average" approximates "first principal component". Specifically, experiments show that the average of these representations shares almost the same direction as the first principal component of the matrix whose columns are these representations. We believe this explains why the average representation is always a simple yet strong baseline. Our further examinations show that this property also holds in more challenging scenarios, for example, when the representations are from a model right after its random initialization. Therefore, we conjecture that this property is intrinsic to the distribution of representations and not necessarily related to the input structure. We observe that these representations empirically follow a normal distribution for each dimension, and by assuming this is true, we demonstrate that the empirical property can in fact be derived mathematically.
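The property described in the abstract can be illustrated with a small sketch. This is not the paper's experiment: the data below is synthetic, with each dimension drawn independently from a normal distribution around its own non-zero mean (an assumption made here for illustration). The first principal component is read as the top left singular vector of the uncentered matrix whose columns are the representations, which is also an assumption about how the abstract's wording should be interpreted. All function names are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def first_principal_direction(columns, iterations=100, seed=0):
    """Top left singular vector of the uncentered matrix X whose columns are
    the given vectors, found by power iteration on X X^T."""
    rng = random.Random(seed)
    dim = len(columns[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iterations):
        # weights = X^T v: one projection coefficient per column
        weights = [sum(c * x for c, x in zip(col, v)) for col in columns]
        # v = X weights, then renormalize to unit length
        v = [sum(w * col[i] for w, col in zip(weights, columns)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def synthetic_representations(dim=16, count=200, noise=0.5, seed=1):
    """Assumed stand-in for model representations: every dimension is normal
    around its own mean. Not data from any real language model."""
    rng = random.Random(seed)
    centers = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [[rng.gauss(m, noise) for m in centers] for _ in range(count)]


def average_vs_first_pc(columns):
    """Absolute cosine between the average vector and the first principal
    direction (the sign of a singular vector is arbitrary)."""
    dim = len(columns[0])
    average = [sum(col[i] for col in columns) / len(columns) for i in range(dim)]
    return abs(cosine(average, first_principal_direction(columns)))


if __name__ == "__main__":
    reps = synthetic_representations()
    print(f"|cos(average, first PC)| = {average_vs_first_pc(reps):.4f}")
```

Under these assumptions the printed value should be close to 1, because the uncentered second-moment matrix is dominated by the outer product of the mean vector with itself. If the representations were mean-centered first, the average would be zero and the comparison would no longer be meaningful.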

References used

https://aclanthology.org/

Exploring Neural Language Models via Analysis of Local and Global Self-Attention Spaces

280 - Association for Computational Linguistics 2021 article

Large pretrained language models using the transformer neural network architecture are becoming a dominant methodology for many natural language processing tasks, such as question answering, text classification, word sense disambiguation, text completion and machine translation. Commonly comprising hundreds of millions of parameters, these models offer state-of-the-art performance, but at the expense of interpretability. The attention mechanism is the main component of transformer networks. We present AttViz, a method for exploration of self-attention in transformer networks, which can help in explanation and debugging of the trained models by showing associations between text tokens in an input sequence. We show that existing deep learning pipelines can be explored with AttViz, which offers novel visualizations of the attention heads and their aggregations. We implemented the proposed methods in an online toolkit and an offline library. Using examples from news analysis, we demonstrate how AttViz can be used to inspect and potentially better understand what a model has learned.

Towards Incremental Transformers: An Empirical Analysis of Transformer Models for Incremental NLU

372 - Association for Computational Linguistics 2021 article

Incremental processing allows interactive systems to respond based on partial inputs, which is a desirable property e.g. in dialogue agents. The currently popular Transformer architecture inherently processes sequences as a whole, abstracting away the notion of time. Recent work attempts to apply Transformers incrementally via restart-incrementality by repeatedly feeding, to an unchanged model, increasingly longer input prefixes to produce partial outputs. However, this approach is computationally costly and does not scale efficiently for long sequences. In parallel, we witness efforts to make Transformers more efficient, e.g. the Linear Transformer (LT) with a recurrence mechanism. In this work, we examine the feasibility of LT for incremental NLU in English. Our results show that the recurrent LT model has better incremental performance and faster inference speed compared to the standard Transformer and LT with restart-incrementality, at the cost of part of the non-incremental (full sequence) quality. We show that the performance drop can be mitigated by training the model to wait for right context before committing to an output and that training with input prefixes is beneficial for delivering correct partial outputs.
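The restart-incrementality idea mentioned in this abstract, feeding ever longer prefixes to an unchanged model, can be sketched as follows. This is only an illustration: `toy_tagger` is a hypothetical stand-in for a real sequence labeller, and the function names are not taken from the paper.

```python
from typing import Callable, List, Sequence


def restart_incremental(
    model: Callable[[Sequence[str]], List[str]],
    tokens: Sequence[str],
) -> List[List[str]]:
    """Run the same unchanged model on tokens[:1], tokens[:2], ... and
    collect one partial output per prefix."""
    return [model(tokens[:end]) for end in range(1, len(tokens) + 1)]


def restart_cost(length: int) -> int:
    """Total number of tokens the model reads across all restarts:
    1 + 2 + ... + length."""
    return length * (length + 1) // 2


def toy_tagger(prefix: Sequence[str]) -> List[str]:
    # Hypothetical model: tags each token by whether it starts with a capital.
    return ["CAP" if token[:1].isupper() else "LOW" for token in prefix]


if __name__ == "__main__":
    sentence = ["Book", "a", "flight", "to", "Oslo"]
    for partial in restart_incremental(toy_tagger, sentence):
        print(partial)
    print("tokens read in total:", restart_cost(len(sentence)))
```

The number of tokens read grows quadratically with sequence length, which is one way to see why the abstract calls this approach computationally costly for long sequences.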

Text Counterfactuals via Latent Optimization and Shapley-Guided Search

330 - Association for Computational Linguistics 2021 article

We study the problem of generating counterfactual text for a classifier as a means for understanding and debugging classification. Given a textual input and a classification model, we aim to minimally alter the text to change the model's prediction. White-box approaches have been successfully applied to similar problems in vision where one can directly optimize the continuous input. Optimization-based approaches become difficult in the language domain due to the discrete nature of text. We bypass this issue by directly optimizing in the latent space and leveraging a language model to generate candidate modifications from optimized latent representations. We additionally use Shapley values to estimate the combinatoric effect of multiple changes. We then use these estimates to guide a beam search for the final counterfactual text. We achieve favorable performance compared to recent white-box and black-box baselines using human and automatic evaluations. Ablation studies show that both latent optimization and the use of Shapley values improve success rate and the quality of the generated counterfactuals.

An Accurate Algorithm for Negentropy Approximation-Based Independent Component Analysis

942 - Al-Baath University 2016 research paper

In this paper, we propose a new accurate and fast converging independent component analysis algorithm.

Independent Component Analysis (ICA), Blind Source Separation (BSS), Negentropy, Higher-Order Cumulants (HO-Cumulants)

Evaluating the Robustness of Neural Language Models to Input Perturbations

406 - Association for Computational Linguistics 2021 article

High-performance neural language models have obtained state-of-the-art results on a wide range of Natural Language Processing (NLP) tasks. However, results for common benchmark datasets often do not reflect model reliability and robustness when applied to noisy, real-world data. In this study, we design and implement various types of character-level and word-level perturbation methods to simulate realistic scenarios in which input texts may be slightly noisy or different from the data distribution on which NLP systems were trained. Conducting comprehensive experiments on different NLP tasks, we investigate the ability of high-performance language models such as BERT, XLNet, RoBERTa, and ELMo in handling different types of input perturbations. The results suggest that language models are sensitive to input perturbations and their performance can decrease even when small changes are introduced. We highlight that models need to be further improved and that current benchmarks are not reflecting model robustness well. We argue that evaluations on perturbed inputs should routinely complement widely-used benchmarks in order to yield a more realistic understanding of NLP systems' robustness.
