ﻻ يوجد ملخص باللغة العربية
Despite considerable advancements with deep neural language models (LMs), neural text generation still suffers from degeneration: generated text is repetitive, generic, self-inconsistent, and lacking commonsense. The empirical analyses on sentence-level attention patterns reveal that neural text degeneration may be associated with insufficient learning of inductive biases by the attention mechanism. Our findings motivate on-the-fly attention modularization, a simple but effective method for injecting inductive biases into attention computation during inference. The resulting text produced by the language model with attention modularization can yield enhanced diversity and commonsense reasoning while maintaining fluency and coherence.
Sequence-to-Sequence (Seq2Seq) models have witnessed a notable success in generating natural conversational exchanges. Notwithstanding the syntactically well-formed responses generated by these neural network models, they are prone to be acontextual,
Transformer model with multi-head attention requires caching intermediate results for efficient inference in generation tasks. However, cache brings new memory-related costs and prevents leveraging larger batch size for faster speed. We propose memor
Relation extraction is the task of determining the relation between two entities in a sentence. Distantly-supervised models are popular for this task. However, sentences can be long and two entities can be located far from each other in a sentence. T
Neural networks have proven to be efficient for a number of practical applications ranging from image recognition to identifying phase transitions in quantum physics models. In this paper we investigate the application of neural networks to state cla
Large-scale pretrained language models have led to dramatic improvements in text generation. Impressive performance can be achieved by finetuning only on a small number of instances (few-shot setting). Nonetheless, almost all previous work simply app