What Taggers Fail to Learn, Parsers Need the Most

Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Language: English

Created by: Shamra Editor

Keywords: neural UPOS taggers, UPOS tags

Abstract

We present an error analysis of neural UPOS taggers to evaluate why using gold tags has such a large positive contribution to parsing performance while using predicted UPOS either harms performance or offers a negligible improvement. We also evaluate what neural dependency parsers implicitly learn about word types and how this relates to the errors taggers make, to explain the minimal impact using predicted tags has on parsers. We then mask UPOS tags based on errors made by taggers to tease away the contribution of UPOS tags that taggers succeed and fail to classify correctly and the impact of tagging errors.
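The masking step described in the abstract can be illustrated with a minimal sketch. It assumes sentence-level lists of gold and predicted UPOS tags; the function name `mask_tags`, the `keep_correct` flag, and the `_` placeholder are hypothetical choices made for illustration and are not taken from the paper.

```python
from typing import List

# Hypothetical placeholder for a masked tag; the paper's actual symbol is not stated here.
MASK = "_"


def mask_tags(gold: List[str], predicted: List[str], keep_correct: bool = True) -> List[str]:
    """Return the gold tags with some positions replaced by MASK.

    keep_correct=True  keeps gold tags where the tagger was right and masks the rest.
    keep_correct=False keeps gold tags where the tagger was wrong and masks the rest.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return [
        g if (g == p) == keep_correct else MASK
        for g, p in zip(gold, predicted)
    ]


# Toy sentence: the tagger mislabels the third and fifth tokens.
gold = ["DET", "NOUN", "VERB", "ADP", "PROPN"]
predicted = ["DET", "NOUN", "NOUN", "ADP", "NOUN"]

only_correct = mask_tags(gold, predicted, keep_correct=True)
only_errors = mask_tags(gold, predicted, keep_correct=False)
print(only_correct)
print(only_errors)
```

Feeding a parser each masked variant in turn would, under these assumptions, separate the contribution of tags the tagger gets right from that of tags it gets wrong.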

References used

https://aclanthology.org/

How much pretraining data do language models need to learn syntax?

Association for Computational Linguistics, 2021 (article)

Transformers-based pretrained language models achieve outstanding results in many well-known NLU benchmarks. However, while pretraining methods are very convenient, they are expensive in terms of time and resources. This calls for a study of the impact of pretraining data size on the knowledge of the models. We explore this impact on the syntactic capabilities of RoBERTa, using models trained on incremental sizes of raw text data. First, we use syntactic structural probes to determine whether models pretrained on more data encode a higher amount of syntactic information. Second, we perform a targeted syntactic evaluation to analyze the impact of pretraining data size on the syntactic generalization performance of the models. Third, we compare the performance of the different models on three downstream applications: part-of-speech tagging, dependency parsing and paraphrase identification. We complement our study with an analysis of the cost-benefit trade-off of training such models. Our experiments show that while models pretrained on more data encode more syntactic knowledge and perform better on downstream applications, they do not always offer a better performance across the different syntactic phenomena and come at a higher financial and environmental cost.

Keywords: learn syntax, pretraining data size

Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you?

Association for Computational Linguistics, 2021 (article)

In this paper, we investigate what types of stereotypical information are captured by pretrained language models. We present the first dataset comprising stereotypical attributes of a range of social groups and propose a method to elicit stereotypes encoded by pretrained language models in an unsupervised fashion. Moreover, we link the emergent stereotypes to their manifestation as basic emotions as a means to study their emotional effects in a more generalized manner. To demonstrate how our methods can be used to analyze emotion and stereotype shifts due to linguistic experience, we use fine-tuning on news sources as a case study. Our experiments expose how attitudes towards different social groups vary across models and how quickly emotions and stereotypes can shift at the fine-tuning stage.

Keywords: language models learn

Generic resources are what you need: Style transfer tasks without task-specific parallel training data

Association for Computational Linguistics, 2021 (article)

Style transfer aims to rewrite a source text in a different target style while preserving its content. We propose a novel approach to this task that leverages generic resources, and without using any task-specific parallel (source-target) data outperforms existing unsupervised approaches on the two most popular style transfer tasks: formality transfer and polarity swap. In practice, we adopt a multi-step procedure which builds on a generic pre-trained sequence-to-sequence model (BART). First, we strengthen the model's ability to rewrite by further pre-training BART on both an existing collection of generic paraphrases, as well as on synthetic pairs created using a general-purpose lexical resource. Second, through an iterative back-translation approach, we train two models, each in a transfer direction, so that they can provide each other with synthetically generated pairs, dynamically in the training process. Lastly, we let our best resulting model generate static synthetic pairs to be used in a supervised training regime. Besides methodology and state-of-the-art results, a core contribution of this work is a reflection on the nature of the two tasks we address, and how their differences are highlighted by their response to our approach.

Keywords: style transfer tasks, style transfer aims

Learning to Learn to be Right for the Right Reasons

Association for Computational Linguistics, 2021 (article)

Improving model generalization on held-out data is one of the core objectives in commonsense reasoning. Recent work has shown that models trained on the dataset with superficial cues tend to perform well on the easy test set with superficial cues but perform poorly on the hard test set without superficial cues. Previous approaches have resorted to manual methods of encouraging models not to overfit to superficial cues. While some of the methods have improved performance on hard instances, they also lead to degraded performance on easy instances. Here, we propose to explicitly learn a model that does well on both the easy test set with superficial cues and the hard test set without superficial cues. Using a meta-learning objective, we learn such a model that improves performance on both the easy test set and the hard test set. By evaluating our models on Choice of Plausible Alternatives (COPA) and Commonsense Explanation, we show that our proposed method leads to improved performance on both the easy test set and the hard test set upon which we observe up to 16.5 percentage points improvement over the baseline.

Keywords: easy test set, hard test set

What does BERT Learn from Arabic Machine Reading Comprehension Datasets?

Association for Computational Linguistics, 2021 (article)

In machine reading comprehension tasks, a model must extract an answer from the available context given a question and a passage. Recently, transformer-based pre-trained language models have achieved state-of-the-art performance in several natural language processing tasks. However, it is unclear whether such performance reflects true language understanding. In this paper, we propose adversarial examples to probe an Arabic pre-trained language model (AraBERT), leading to a significant performance drop over four Arabic machine reading comprehension datasets. We present a layer-wise analysis for the transformer's hidden states to offer insights into how AraBERT reasons to derive an answer. The experiments indicate that AraBERT relies on superficial cues and keyword matching rather than text understanding. Furthermore, hidden state visualization demonstrates that prediction errors can be recognized from vector representations in earlier layers.

Keywords: machine reading comprehension, BERT learn, reading comprehension datasets
