
Revisiting Simple Neural Probabilistic Language Models

Publication date: 2021
Language: English

Recent progress in language modeling has been driven not only by advances in neural architectures, but also by hardware and optimization improvements. In this paper, we revisit the neural probabilistic language model (NPLM) of Bengio et al. (2003), which simply concatenates word embeddings within a fixed window and passes the result through a feed-forward network to predict the next word. When scaled up to modern hardware, this model (despite its many limitations) performs much better than expected on word-level language model benchmarks. Our analysis reveals that the NPLM achieves lower perplexity than a baseline Transformer with short input contexts but struggles to handle long-term dependencies. Inspired by this result, we modify the Transformer by replacing its first self-attention layer with the NPLM's local concatenation layer, which results in small but consistent perplexity decreases across three word-level language modeling datasets.
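
As a rough sketch of the architecture being revisited (not the authors' code), the NPLM-style "local concatenation" block can be written in a few lines of PyTorch: at each position, the embeddings of the last few tokens are concatenated and passed through a small feed-forward network, and in the paper's hybrid model a layer of this kind replaces the Transformer's first self-attention layer. The class name, window size, and dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LocalConcatLayer(nn.Module):
    """NPLM-style block: at each position, concatenate the embeddings of the
    current and previous `window - 1` tokens and project them with a small
    feed-forward network. Names and sizes are illustrative, not the paper's."""

    def __init__(self, d_model: int = 512, window: int = 5):
        super().__init__()
        self.window = window
        self.ff = nn.Sequential(
            nn.Linear(window * d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token embeddings.
        batch, seq_len, d_model = x.shape
        # Left-pad so every position sees exactly `window` vectors, all of
        # them from the current or earlier positions (causal).
        padded = torch.cat(
            [x.new_zeros(batch, self.window - 1, d_model), x], dim=1
        )
        # Sliding windows over the sequence: (batch, seq_len, d_model, window).
        win = padded.unfold(dimension=1, size=self.window, step=1)
        # Reorder to (batch, seq_len, window, d_model) and flatten the window,
        # i.e. concatenate the embeddings as in the NPLM.
        win = win.permute(0, 1, 3, 2).contiguous()
        return self.ff(win.view(batch, seq_len, self.window * d_model))


# Example: shapes are preserved, so the block can sit where an attention layer would.
layer = LocalConcatLayer(d_model=512, window=5)
out = layer(torch.randn(2, 128, 512))   # -> (2, 128, 512)
```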



References used
https://aclanthology.org/
Related research

Policy gradient algorithms have found wide adoption in NLP, but have recently become subject to criticism, doubting their suitability for NMT. Choshen et al. (2020) identify multiple weaknesses and suspect that their success is determined by the shape of output distributions rather than the reward. In this paper, we revisit these claims and study them under a wider range of configurations. Our experiments on in-domain and cross-domain adaptation reveal the importance of exploration and reward scaling, and provide empirical counter-evidence to these claims.
Pretrained language models have served as the backbone for many state-of-the-art NLP results. These models are large and expensive to train. Recent work suggests that continued pretraining on task-specific data is worth the effort as pretraining leads to improved performance on downstream tasks. We explore alternatives to full-scale task-specific pretraining of language models through the use of adapter modules, a parameter-efficient approach to transfer learning. We find that adapter-based pretraining is able to achieve comparable results to task-specific pretraining while using a fraction of the overall trainable parameters. We further explore direct use of adapters without pretraining and find that the direct fine-tuning performs mostly on par with pretrained adapter models, contradicting previously proposed benefits of continual pretraining in full pretraining fine-tuning strategies. Lastly, we perform an ablation study on task-adaptive pretraining to investigate how different hyperparameter settings can change the effectiveness of the pretraining.
We present Hidden-State Optimization (HSO), a gradient-based method for improving the performance of transformer language models at inference time. Similar to dynamic evaluation (Krause et al., 2018), HSO computes the gradient of the log-probability the language model assigns to an evaluation text, but uses it to update the cached hidden states rather than the model parameters. We test HSO with pretrained Transformer-XL and GPT-2 language models, finding improvement on the WikiText-103 and PG-19 datasets in terms of perplexity, especially when evaluating a model outside of its training distribution. We also demonstrate downstream applicability by showing gains in the recently developed prompt-based few-shot evaluation setting, again with no extra parameters or training data. (A hedged code sketch of this hidden-state update idea follows the list below.)
Emotion Classification is the task of automatically associating a text with a human emotion. State-of-the-art models are usually learned using annotated corpora or rely on hand-crafted affective lexicons. We present an emotion classification model that does not require a large annotated corpus to be competitive. We experiment with pretrained language models in both a zero-shot and few-shot configuration. We build several such models and consider them as biased, noisy annotators, whose individual performance is poor. We aggregate the predictions of these models using a Bayesian method originally developed for modelling crowdsourced annotations. Next, we show that the resulting system performs better than the strongest individual model. Finally, we show that when trained on few labelled data, our systems outperform fully-supervised models.
When building machine translation systems, one often needs to make the best out of heterogeneous sets of parallel data in training, and to robustly handle inputs from unexpected domains in testing. This multi-domain scenario has attracted a lot of recent work that falls under the general umbrella of transfer learning. In this study, we revisit multi-domain machine translation, with the aim to formulate the motivations for developing such systems and the associated expectations with respect to performance. Our experiments with a large sample of multi-domain systems show that most of these expectations are hardly met and suggest that further work is needed to better analyze the current behaviour of multi-domain systems and to make them fully deliver on their promises.
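
As an illustration of the hidden-state update described in the HSO entry above, here is a minimal PyTorch sketch. It assumes a hypothetical `model(chunk_ids, cache)` interface that returns the total log-probability of a text chunk together with an updated cache of hidden-state tensors; the learning rate, step count, and interface are assumptions for illustration, not the authors' implementation.

```python
import torch

def hidden_state_optimization(model, cache, chunk_ids, lr=1e-3, steps=1):
    """HSO-style update (sketch): take gradient steps on the cached hidden
    states so the model assigns higher log-probability to the current
    evaluation chunk, while the model parameters stay frozen.
    Assumes `model(chunk_ids, cache)` -> (scalar log-prob, new cache);
    this interface is hypothetical."""
    for p in model.parameters():                 # never update the weights
        p.requires_grad_(False)
    # Treat the cached states themselves as the optimization variables.
    cache = [h.detach().clone().requires_grad_(True) for h in cache]
    opt = torch.optim.SGD(cache, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        log_prob, _ = model(chunk_ids, cache)    # log p(chunk | cache)
        (-log_prob).backward()                   # ascend the log-probability
        opt.step()
    # Recompute the cache that gets carried forward to the next chunk.
    with torch.no_grad():
        _, next_cache = model(chunk_ids, [h.detach() for h in cache])
    return next_cache
```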
