
SCRIPT: Self-Critic PreTraining of Transformers


Publication date: 2021
Language: English





We introduce Self-CRItic Pretraining Transformers (SCRIPT) for representation learning of text. The popular masked language modeling (MLM) pretraining methods like BERT replace some tokens with [MASK] and an encoder is trained to recover them, while ELECTRA trains a discriminator to detect replaced tokens proposed by a generator. In contrast, we train a language model as in MLM and further derive a discriminator or critic on top of the encoder without using any additional parameters. That is, the model itself is a critic. SCRIPT combines MLM training and discriminative training for learning rich representations and compute- and sample-efficiency. We demonstrate improved sample-efficiency in pretraining and enhanced representations evidenced by improved downstream task performance on GLUE and SQuAD over strong baselines. Also, the self-critic scores can be directly used as pseudo-log-likelihood for efficient scoring.
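
The parameter-free critic can be pictured directly on top of the MLM head. Below is a minimal, hypothetical sketch in which the critic score of a position is simply the encoder's own log-probability of the token observed there; the function name self_critic_scores and the toy stand-in model are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def self_critic_scores(model, input_ids):
    # model maps token ids of shape (batch, seq_len) to per-position vocabulary logits.
    with torch.no_grad():
        logits = model(input_ids)                  # (batch, seq_len, vocab)
    log_probs = F.log_softmax(logits, dim=-1)
    # The "critic" signal: the model's own log-probability of each observed token,
    # so no parameters beyond the MLM encoder itself are introduced.
    return log_probs.gather(-1, input_ids.unsqueeze(-1)).squeeze(-1)  # (batch, seq_len)

# Toy usage with a random stand-in model over a tiny vocabulary.
vocab_size = 32
toy_model = lambda ids: torch.randn(ids.size(0), ids.size(1), vocab_size)
ids = torch.randint(0, vocab_size, (2, 6))
token_scores = self_critic_scores(toy_model, ids)
print(token_scores.sum(dim=-1))  # summed per sequence: a pseudo-log-likelihood-style score

Summing the per-token scores in this way is one reading of how the self-critic scores could serve as a pseudo-log-likelihood for efficient scoring.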



Related research

Scripts capture commonsense knowledge about everyday activities and their participants. Script knowledge proved useful in a number of NLP tasks, such as referent prediction, discourse classification, and story generation. A crucial step for the exploitation of script knowledge is script parsing, the task of tagging a text with the events and participants from a certain activity. This task is challenging: it requires information both about the ways events and participants are usually uttered in surface language as well as the order in which they occur in the world. We show how to do accurate script parsing with a hierarchical sequence model and transfer learning. Our model improves the state of the art of event parsing by over 16 points F-score and, for the first time, accurately tags script participants.
Transformer models are permutation equivariant. To supply the order and type information of the input tokens, position and segment embeddings are usually added to the input. Recent works proposed variations of positional encodings with relative position encodings achieving better performance. Our analysis shows that the gain actually comes from moving positional information to attention layer from the input. Motivated by this, we introduce Decoupled Positional Attention for Transformers (DIET), a simple yet effective mechanism to encode position and segment information into the Transformer models. The proposed method has faster training and inference time, while achieving competitive performance on GLUE, XTREME and WMT benchmarks. We further generalize our method to long-range transformers and show performance gain.
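
As a rough illustration of moving position information out of the input and into the attention computation, the following minimal sketch adds a learned, content-independent bias over position pairs to the attention scores, while the token embeddings carry no positional signal. The bias parameterization is an assumption for illustration, not the exact DIET formulation.

import torch
import torch.nn.functional as F

def attention_with_position_bias(q, k, v, pos_bias):
    # q, k, v: (batch, heads, seq, dim); pos_bias: (heads, seq, seq), learned.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # content-content term
    scores = scores + pos_bias          # positional information enters at the attention layer
    return F.softmax(scores, dim=-1) @ v

batch, heads, seq, dim = 2, 4, 8, 16
q, k, v = (torch.randn(batch, heads, seq, dim) for _ in range(3))
pos_bias = torch.nn.Parameter(torch.zeros(heads, seq, seq))  # trained jointly with the model
print(attention_with_position_bias(q, k, v, pos_bias).shape)  # torch.Size([2, 4, 8, 16])
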
The Transformer and its variants have achieved great success in natural language processing. Since Transformer models are huge in size, serving them is a challenge for real industrial applications. This paper proposes a highly efficient inference library for models in the Transformer family. The library includes a series of GPU optimization techniques that both streamline the computation of Transformer layers and reduce the memory footprint, and it supports models trained with PyTorch and TensorFlow. Experimental results on standard machine translation benchmarks show up to a 14x speedup over TensorFlow and a 1.4x speedup over a concurrent CUDA implementation. The code will be released publicly after review.
The Transformer translation model is based on the multi-head attention mechanism, which can be parallelized easily. The multi-head attention network performs the scaled dot-product attention function in parallel, empowering the model by jointly attending to information from different representation subspaces at different positions. In this paper, we present an approach to learning a hard retrieval attention where an attention head only attends to one token in the sentence rather than all tokens. The matrix multiplication between attention probabilities and the value sequence in the standard scaled dot-product attention can thus be replaced by a simple and efficient retrieval operation. We show that our hard retrieval attention mechanism is 1.43 times faster in decoding, while preserving translation quality on a wide range of machine translation tasks when used in the decoder self- and cross-attention networks.
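
One way to picture the hard-retrieval mechanism described above: each query keeps only its highest-scoring key, so the probability-value matrix multiplication collapses into a gather. The sketch below is an illustrative reading of that idea (and, as written, is suited to inference rather than training, since argmax is not differentiable).

import torch

def hard_retrieval_attention(q, k, v):
    # q: (batch, heads, q_len, dim); k, v: (batch, heads, k_len, dim)
    scores = q @ k.transpose(-2, -1)     # (batch, heads, q_len, k_len)
    idx = scores.argmax(dim=-1)          # one selected key per query
    idx = idx.unsqueeze(-1).expand(-1, -1, -1, v.size(-1))
    return torch.gather(v, 2, idx)       # retrieve a single value vector per query

q = torch.randn(2, 4, 5, 16)
k = torch.randn(2, 4, 7, 16)
v = torch.randn(2, 4, 7, 16)
print(hard_retrieval_attention(q, k, v).shape)  # torch.Size([2, 4, 5, 16])
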
The outstanding performance of transformer-based language models on a great variety of NLP and NLU tasks has stimulated interest in exploring their inner workings. Recent research has focused primarily on higher-level and complex linguistic phenomena such as syntax, semantics, world knowledge and common sense. The majority of these studies are anglocentric, and little remains known about other languages, specifically their morphosyntactic properties. To this end, our work presents Morph Call, a suite of 46 probing tasks for four Indo-European languages of different morphology: Russian, French, English and German. We propose a new type of probing task based on the detection of guided sentence perturbations. We use a combination of neuron-, layer- and representation-level introspection techniques to analyze the morphosyntactic content of four multilingual transformers, including their understudied distilled versions. We also examine how fine-tuning on a POS-tagging task affects probing performance.
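
For readers unfamiliar with probing, the representation-level variant reduces to training a small classifier on frozen model outputs and reading its accuracy as evidence that a property is encoded. The sketch below uses synthetic stand-ins for the frozen representations and morphosyntactic labels; it illustrates the general recipe only, not the Morph Call tasks themselves.

import torch
import torch.nn as nn
import torch.nn.functional as F

def train_linear_probe(features, labels, num_classes, epochs=50, lr=0.1):
    # features: (n, dim) frozen representations; labels: (n,) integer class ids.
    probe = nn.Linear(features.size(1), num_classes)
    opt = torch.optim.SGD(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(probe(features), labels).backward()
        opt.step()
    accuracy = (probe(features).argmax(-1) == labels).float().mean().item()
    return probe, accuracy

features = torch.randn(200, 64)        # stand-in for frozen transformer representations
labels = torch.randint(0, 3, (200,))   # stand-in morphosyntactic labels (3 classes)
_, acc = train_linear_probe(features, labels, num_classes=3)
print(f"probe accuracy: {acc:.2f}")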

