On the Effects of Transformer Size on In- and Out-of-Domain Calibration

Publication date: 2021
Research language: English

Large, pre-trained transformer language models, which are pervasive in natural language processing tasks, are notoriously expensive to train. To reduce the cost of training such large models, prior work has developed smaller, more compact models that achieve a significant speedup in training time while maintaining accuracy competitive with the original models on downstream tasks. Though these smaller pre-trained models have been widely adopted by the community, it is not known how well they are calibrated compared to their larger counterparts. In this paper, focusing on a wide range of tasks, we thoroughly investigate the calibration properties of pre-trained transformers as a function of their size. We demonstrate that when evaluated in-domain, smaller models are able to achieve competitive, and often better, calibration compared to larger models, while achieving a significant speedup in training time. Post-hoc calibration techniques further reduce calibration error for all models in-domain. However, when evaluated out-of-domain, larger models tend to be better calibrated, and label smoothing is instead an effective strategy for calibrating models in this setting.
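For context, a minimal sketch of how calibration is typically quantified and adjusted: expected calibration error (ECE) bins predictions by confidence and measures the gap between confidence and accuracy, and temperature scaling is a standard post-hoc calibration technique of the kind the abstract mentions. This NumPy sketch is illustrative, not the paper's evaluation code.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bin predictions by confidence, then average the gap between
    per-bin accuracy and per-bin confidence, weighted by bin size."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
    return ece

def temperature_scale(logits, T):
    """Post-hoc calibration: divide logits by a scalar temperature
    (fitted on held-out data), then renormalize with a softmax."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
```

In practice the temperature T is fitted by minimizing negative log-likelihood on a validation set; T > 1 softens overconfident predictions.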



Related research

We revisit the topic of translation direction in the data used for training neural machine translation systems, focusing on a real-world scenario with known translation direction and imbalances in translation direction: the Canadian Hansard. According to automatic metrics, we observe that using parallel data that was produced in the "matching" translation direction (authentic source and translationese target) improves translation quality. In cases of data imbalance in terms of translation direction, we find that tagging the translation direction can close the performance gap. We perform a human evaluation that differs slightly from the automatic metrics but nevertheless confirms that, for this French-English dataset known to contain high-quality translations, authentic or tagged mixed source improves over translationese source for training.
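As an illustration of the tagging strategy mentioned above, here is a minimal sketch that prepends a direction pseudo-token to each source sentence before training. The tag strings and example data are hypothetical, not the paper's exact scheme.

```python
# Sketch: mark translation direction with a pseudo-token prepended to
# the source side, so the NMT model can condition on it during training.
# The <orig>/<trans> tags are illustrative, not the paper's tokens.

def tag_source(src_sentence: str, authentic_source: bool) -> str:
    tag = "<orig>" if authentic_source else "<trans>"
    return f"{tag} {src_sentence}"

corpus = [
    ("Le comité se réunit demain.", True),    # authentic French source
    ("La réunion a été productive.", False),  # translationese source
]
tagged = [tag_source(sent, authentic) for sent, authentic in corpus]
print(tagged)
```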
The research aims to estimate the effect of sample size on the power of the statistical test (t) for one sample, two related samples, and two independent samples, and on the power of the one-way analysis of variance test (F) for comparing means. The descriptive method was used: samples of different sizes (300 items) were generated with the PASS 14 program, while ensuring that the data satisfied the assumptions required by the (t) and (F) tests, namely random sampling, an appropriate level of measurement, normal distribution, and homogeneity of variance.
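To give a sense of how the power of these tests grows with sample size, here is a minimal sketch using statsmodels rather than PASS 14 (a commercial package). The effect size and significance level are illustrative assumptions, not values from the study.

```python
# Sketch: statistical power of t and F tests as a function of sample
# size, via statsmodels. A medium effect size (0.5) and alpha = 0.05
# are assumed for illustration.
from statsmodels.stats.power import TTestPower, TTestIndPower, FTestAnovaPower

effect_size, alpha = 0.5, 0.05
for n in (10, 30, 100, 300):
    one_sample = TTestPower().power(effect_size, nobs=n, alpha=alpha)
    two_sample = TTestIndPower().power(effect_size, nobs1=n, alpha=alpha)
    anova = FTestAnovaPower().power(effect_size, nobs=n, alpha=alpha, k_groups=3)
    print(f"n={n:4d}  t(one-sample)={one_sample:.2f}  "
          f"t(independent)={two_sample:.2f}  F(ANOVA)={anova:.2f}")
```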
Multilingual language models achieve impressive zero-shot accuracies in many languages in complex tasks such as Natural Language Inference (NLI). Examples in NLI (and equivalent complex tasks) often pertain to various types of sub-tasks, requiring different kinds of reasoning. Certain types of reasoning have proven to be more difficult to learn in a monolingual context, and in the crosslingual context, similar observations may shed light on zero-shot transfer efficiency and few-shot sample selection. Hence, to investigate the effects of types of reasoning on transfer performance, we propose a category-annotated multilingual NLI dataset and discuss the challenges to scale monolingual annotations to multiple languages. We statistically observe interesting effects that the confluence of reasoning types and language similarities have on transfer performance.
Pre-trained language models perform well on a variety of linguistic tasks that require symbolic reasoning, raising the question of whether such models implicitly represent abstract symbols and rules. We investigate this question using the case study of BERT's performance on English subject-verb agreement. Unlike prior work, we train multiple instances of BERT from scratch, allowing us to perform a series of controlled interventions at pre-training time. We show that BERT often generalizes well to subject-verb pairs that never occurred in training, suggesting a degree of rule-governed behavior. We also find, however, that performance is heavily influenced by word frequency, with experiments showing that both the absolute frequency of a verb form, as well as the frequency relative to the alternate inflection, are causally implicated in the predictions BERT makes at inference time. Closer analysis of these frequency effects reveals that BERT's behavior is consistent with a system that correctly applies the SVA rule in general but struggles to overcome strong training priors and to estimate agreement features (singular vs. plural) on infrequent lexical items.
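A minimal sketch of the kind of agreement probe this line of work uses: compare BERT's masked-language-model scores for the singular versus plural verb form at a masked position. The paper trains its own BERT instances from scratch; the off-the-shelf bert-base-uncased checkpoint below is purely illustrative.

```python
# Sketch of a subject-verb agreement probe: which verb inflection does
# BERT prefer at the masked position? Uses bert-base-uncased for
# illustration only; the paper evaluates models trained from scratch.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def verb_preference(sentence: str, singular: str, plural: str) -> str:
    inputs = tok(sentence, return_tensors="pt")
    mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    sing_id, plur_id = tok.convert_tokens_to_ids([singular, plural])
    return singular if logits[sing_id] > logits[plur_id] else plural

# Agreement across a distractor noun between subject and verb:
print(verb_preference("The keys to the cabinet [MASK] on the table.", "is", "are"))
```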
Transformer models are expensive to fine-tune, slow for inference, and have large storage requirements. Recent approaches tackle these shortcomings by training smaller models, by dynamically reducing the model size, and by training lightweight adapters. In this paper, we propose AdapterDrop, removing adapters from lower transformer layers during training and inference, which incorporates concepts from all three directions. We show that AdapterDrop can dynamically reduce the computational overhead when performing inference over multiple tasks simultaneously, with minimal decrease in task performance. We further prune adapters from AdapterFusion, which improves inference efficiency while fully maintaining task performance.
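A conceptual sketch of the AdapterDrop idea follows: skip adapter modules in the lowest layers at inference time. This toy PyTorch code is not the authors' implementation (which builds on full transformer layers and the AdapterHub framework); the layer structure and dimensions are illustrative.

```python
# Conceptual sketch of AdapterDrop: bottleneck adapters that can be
# skipped in the lowest N layers. Illustrative PyTorch only.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

class LayerWithAdapter(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.ffn = nn.Linear(dim, dim)  # stand-in for a transformer layer
        self.adapter = Adapter(dim)

    def forward(self, x, use_adapter: bool = True):
        h = torch.relu(self.ffn(x))
        return self.adapter(h) if use_adapter else h

def forward_with_drop(layers, x, drop_first_n: int):
    # AdapterDrop: bypass adapters in the lowest `drop_first_n` layers.
    for i, layer in enumerate(layers):
        x = layer(x, use_adapter=(i >= drop_first_n))
    return x

layers = nn.ModuleList(LayerWithAdapter(16) for _ in range(6))
out = forward_with_drop(layers, torch.randn(2, 16), drop_first_n=3)
print(out.shape)  # torch.Size([2, 16])
```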
