How does BERT process disfluency?

Added by Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Language: English

Abstract

Natural conversations are filled with disfluencies. This study investigates whether and how BERT understands disfluency with three experiments: (1) a behavioural study using a downstream task, (2) an analysis of sentence embeddings and (3) an analysis of the attention mechanism on disfluency. The behavioural study shows that, without fine-tuning on disfluent data, BERT does not suffer significant performance loss when presented with disfluent rather than fluent inputs (exp1). Analysis of sentence embeddings of disfluent and fluent sentence pairs reveals that the deeper the layer, the more similar their representations (exp2). This indicates that deep layers of BERT become relatively invariant to disfluency. We pinpoint attention as a potential mechanism that could explain this phenomenon (exp3). Overall, the study suggests that BERT has knowledge of disfluency structure. We emphasise the potential of using BERT to understand natural utterances without disfluency removal.
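The layer-wise comparison in exp2 can be illustrated with a minimal sketch. It assumes mean pooling over token vectors and cosine similarity, which may differ from the pooling and similarity measure used in the study. The function names and the toy vectors are hypothetical, chosen only to make the example run. They are not BERT outputs and not results from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> Vector:
    """Average token vectors into one sentence vector (assumed pooling choice)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(tok[d] for tok in token_vectors) / n for d in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def layerwise_similarity(
    fluent_layers: Sequence[Sequence[Sequence[float]]],
    disfluent_layers: Sequence[Sequence[Sequence[float]]],
) -> List[float]:
    """Return one similarity score per layer for a fluent/disfluent sentence pair.

    Each argument is indexed as [layer][token][dimension]. The two sentences
    may have different token counts, since pooling removes the token axis.
    """
    return [
        cosine(mean_pool(fluent), mean_pool(disfluent))
        for fluent, disfluent in zip(fluent_layers, disfluent_layers)
    ]


# Toy, made-up hidden states: 2 layers, 2-dimensional vectors.
# The disfluent sentence has one extra (repeated) token.
fluent_layers = [
    [[1.0, 0.0], [0.0, 1.0]],  # layer 0
    [[1.0, 1.0], [1.0, 1.0]],  # layer 1
]
disfluent_layers = [
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],  # layer 0
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],  # layer 1
]

similarities = layerwise_similarity(fluent_layers, disfluent_layers)
for layer, score in enumerate(similarities):
    print(f"layer {layer}: cosine similarity = {score:.4f}")
```

With real data, the nested lists would be replaced by the per-layer hidden states of a fluent sentence and its disfluent counterpart, and the scores would be averaged over many sentence pairs before comparing layers.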

References used

https://aclanthology.org/

Does BERT Understand Idioms? A Probing-Based Empirical Study of BERT Encodings of Idioms

Association for Computational Linguistics, 2021 (article)

Understanding idioms is important in NLP. In this paper, we study to what extent a pre-trained BERT model can encode the meaning of a potentially idiomatic expression (PIE) in a certain context. We make use of a few existing datasets and perform two probing tasks: PIE usage classification and idiom paraphrase identification. Our experimental results suggest that BERT can indeed separate the literal and idiomatic usages of a PIE with high accuracy. It is also able to encode the idiomatic meaning of a PIE to some extent.

How Does Counterfactually Augmented Data Impact Models for Social Computing Constructs?

Association for Computational Linguistics, 2021 (article)

As NLP models are increasingly deployed in socially situated settings such as online abusive content detection, it is crucial to ensure that these models are robust. One way of improving model robustness is to generate counterfactually augmented data (CAD) for training models that can better learn to distinguish between core features and data artifacts. While models trained on this type of data have shown promising out-of-domain generalizability, it is still unclear what the sources of such improvements are. We investigate the benefits of CAD for social NLP models by focusing on three social computing constructs --- sentiment, sexism, and hate speech. Assessing the performance of models trained with and without CAD across different types of datasets, we find that while models trained on CAD show lower in-domain performance, they generalize better out-of-domain. We unpack this apparent discrepancy using machine explanations and find that CAD reduces model reliance on spurious features. Leveraging a novel typology of CAD to analyze their relationship with model performance, we find that CAD which acts on the construct directly or a diverse set of CAD leads to higher performance.

How Does Fine-tuning Affect the Geometry of Embedding Space: A Case Study on Isotropy

Association for Computational Linguistics, 2021 (article)

It is widely accepted that fine-tuning pre-trained language models usually brings about performance improvements in downstream tasks. However, there are limited studies on the reasons behind this effectiveness, particularly from the viewpoint of structural changes in the embedding space. Trying to fill this gap, in this paper, we analyze the extent to which the isotropy of the embedding space changes after fine-tuning. We demonstrate that, even though isotropy is a desirable geometrical property, fine-tuning does not necessarily result in isotropy enhancements. Moreover, local structures in pre-trained contextual word representations (CWRs), such as those encoding token types or frequency, undergo a massive change during fine-tuning. Our experiments show dramatic growth in the number of elongated directions in the embedding space, which, in contrast to pre-trained CWRs, carry the essential linguistic knowledge in the fine-tuned embedding space, making existing isotropy enhancement methods ineffective.

How (Non-)Optimal is the Lexicon?

Association for Computational Linguistics, 2021 (article)

The mapping of lexical meanings to wordforms is a major feature of natural languages. While usage pressures might assign short words to frequent meanings (Zipf's law of abbreviation), the need for a productive and open-ended vocabulary, local constraints on sequences of symbols, and various other factors all shape the lexicons of the world's languages. Despite their importance in shaping lexical structure, the relative contributions of these factors have not been fully quantified. Taking a coding-theoretic view of the lexicon and making use of a novel generative statistical model, we define upper bounds for the compressibility of the lexicon under various constraints. Examining corpora from 7 typologically diverse languages, we use those upper bounds to quantify the lexicon's optimality and to explore the relative costs of major constraints on natural codes. We find that (compositional) morphology and graphotactics can sufficiently account for most of the complexity of natural codes, as measured by code length.

jurBERT: A Romanian BERT Model for Legal Judgement Prediction

Association for Computational Linguistics, 2021 (article)

Transformer-based models have become the de facto standard in the field of Natural Language Processing (NLP). By leveraging large unlabeled text corpora, they enable efficient transfer learning leading to state-of-the-art results on numerous NLP tasks. Nevertheless, for low resource languages and highly specialized tasks, transformer models tend to lag behind more classical approaches (e.g. SVM, LSTM) due to the lack of aforementioned corpora. In this paper we focus on the legal domain and we introduce a Romanian BERT model pre-trained on a large specialized corpus. Our model outperforms several strong baselines for legal judgement prediction on two different corpora consisting of cases from trials involving banks in Romania.
