New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Sociolectal Analysis of Pretrained Language Models

التحليل الاجتماعي لنماذج اللغة المحددة مسبقا

419 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

لغة ملثم ومقرها المحول صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Using data from English cloze tests, in which subjects also self-reported their gender, age, education, and race, we examine performance differences of pretrained language models across demographic groups, defined by these (protected) attributes. We demonstrate wide performance gaps across demographic groups and show that pretrained language models systematically disfavor young non-white male speakers; i.e., not only do pretrained language models learn social biases (stereotypical associations) -- pretrained language models also learn sociolectal biases, learning to speak more like some than like others. We show, however, that, with the exception of BERT models, larger pretrained language models reduce some the performance gaps between majority and minority groups.

References used

https://aclanthology.org/

rate research

Incorporating Residual and Normalization Layers into Analysis of Masked Language Models

297 - Association for Computation Linguistics 2021 مقالة

Transformer architecture has become ubiquitous in the natural language processing field. To interpret the Transformer-based models, their attention patterns have been extensively analyzed. However, the Transformer architecture is not only composed of the multi-head attention; other components can also contribute to Transformers' progressive performance. In this study, we extended the scope of the analysis of Transformers from solely the attention patterns to the whole attention block, i.e., multi-head attention, residual connection, and layer normalization. Our analysis of Transformer-based masked language models shows that the token-to-token interaction performed via attention has less impact on the intermediate representations than previously assumed. These results provide new intuitive explanations of existing reports; for example, discarding the learned attention patterns tends not to adversely affect the performance. The codes of our experiments are publicly available.

نماذج اللغة غامضة transformer-based masked language لغة ملثم ومقرها المحول صناعة حمض الفوسفور

Discourse Probing of Pretrained Language Models

386 - Association for Computation Linguistics 2021 مقالة

Existing work on probing of pretrained language models (LMs) has predominantly focused on sentence-level syntactic tasks. In this paper, we introduce document-level discourse probing to evaluate the ability of pretrained LMs to capture document-level relations. We experiment with 7 pretrained LMs, 4 languages, and 7 discourse probing tasks, and find BART to be overall the best model at capturing discourse --- but only in its encoder, with BERT performing surprisingly well as the baseline model. Across the different models, there are substantial differences in which layers best capture discourse information, and large disparities between models.

تثق صناعة حمض الفوسفور

Generating Datasets with Pretrained Language Models

340 - Association for Computation Linguistics 2021 مقالة

To obtain high-quality sentence embeddings from pretrained language models (PLMs), they must either be augmented with additional pretraining objectives or finetuned on a large set of labeled text pairs. While the latter approach typically outperforms the former, it requires great human effort to generate suitable datasets of sufficient size. In this paper, we show how PLMs can be leveraged to obtain high-quality sentence embeddings without the need for labeled data, finetuning or modifications to the pretraining objective: We utilize the generative abilities of large and high-performing PLMs to generate entire datasets of labeled text pairs from scratch, which we then use for finetuning much smaller and more efficient models. Our fully unsupervised approach outperforms strong baselines on several semantic textual similarity datasets.

محاذاة الإجراءات صناعة حمض الفوسفور

Unsupervised Paraphrasing with Pretrained Language Models

398 - Association for Computation Linguistics 2021 مقالة

Paraphrase generation has benefited extensively from recent progress in the designing of training objectives and model architectures. However, previous explorations have largely focused on supervised methods, which require a large amount of labeled d ata that is costly to collect. To address this drawback, we adopt a transfer learning approach and propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting. Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking (DB). To enforce a surface form dissimilar from the input, whenever the language model emits a token contained in the source sequence, DB prevents the model from outputting the subsequent source token for the next generation step. We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair (QQP) and the ParaNMT datasets and is robust to domain shift between the two datasets of distinct distributions. We also demonstrate that our model transfers to paraphrasing in other languages without any additional finetuning.

مكافأة تقليد مختلفة paraphrasing with pretrained إعادة صياغة مع الاحاد صناعة حمض الفوسفور

Measuring and Improving Consistency in Pretrained Language Models

399 - Association for Computation Linguistics 2021 مقالة

Abstract Consistency of a model---that is, the invariance of its behavior under meaning-preserving alternations in its input---is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge? To this end, we create ParaRel?, a high-quality resource of cloze-style query English paraphrases. It contains a total of 328 paraphrases for 38 relations. Using ParaRel?, we show that the consistency of all PLMs we experiment with is poor--- though with high variance between relations. Our analysis of the representational spaces of PLMs suggests that they have a poor structure and are currently not suitable for representing knowledge robustly. Finally, we propose a method for improving model consistency and experimentally demonstrate its effectiveness.1

pretrained language models pretrained language نماذج اللغة المحددة مسبقا اللغة المحددة صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Sociolectal Analysis of Pretrained Language Models

التحليل الاجتماعي لنماذج اللغة المحددة مسبقا

Ask ChatGPT about the research

Read More

suggested questions