Purely character-based language models (LMs) have been lagging in quality on large-scale datasets, and current state-of-the-art LMs rely on word tokenization. It has been assumed that injecting the prior knowledge of a tokenizer into the model is essential to achieving competitive results. In this paper, we show that, contrary to this conventional wisdom, tokenizer-free LMs with sufficient capacity can achieve competitive performance on a large-scale dataset. We train a vanilla transformer network with 40 self-attention layers on the One Billion Word (lm1b) benchmark and achieve a new state of the art for tokenizer-free LMs, pushing these models to be on par with their word-based counterparts.
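As a rough illustration of the setup this abstract describes, the sketch below is a minimal character-level (tokenizer-free) transformer LM in PyTorch. The layer count, widths, byte-level vocabulary, and sequence length are illustrative assumptions; the paper itself trains a far larger 40-layer model on lm1b.

```python
# Minimal sketch of a character-level (tokenizer-free) transformer LM in PyTorch.
# Hyperparameters below are illustrative assumptions, not the paper's 40-layer config.
import torch
import torch.nn as nn


class CharTransformerLM(nn.Module):
    def __init__(self, vocab_size=256, d_model=512, nhead=8,
                 num_layers=6, dim_feedforward=2048, max_len=512):
        super().__init__()
        self.char_embed = nn.Embedding(vocab_size, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        # x: (batch, seq) of byte/character ids
        seq_len = x.size(1)
        pos = torch.arange(seq_len, device=x.device)
        h = self.char_embed(x) + self.pos_embed(pos)
        # Causal mask: each position may only attend to earlier characters.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                     device=x.device), diagonal=1)
        h = self.encoder(h, mask=mask)
        return self.lm_head(h)  # (batch, seq, vocab) next-character logits


if __name__ == "__main__":
    model = CharTransformerLM()
    text = "Purely character-based language models"
    ids = torch.tensor([list(text.encode("utf-8"))])
    logits = model(ids)
    print(logits.shape)  # (1, seq_len, 256)
```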
Linked text representation is critical for many intelligent web applications, such as online advertisement and recommender systems. Recent breakthroughs on pretrained language models and graph neural networks facilitate the development of correspondi
We probe pre-trained transformer language models for bridging inference. We first investigate individual attention heads in BERT and observe that attention heads at higher layers prominently focus on bridging relations in comparison with the lower an
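For concreteness, the sketch below shows one hedged way to inspect per-head attention in BERT with the Hugging Face transformers library, reading off how much an anaphor token attends to a candidate antecedent. The example sentence, the chosen token pair, and the per-layer summary are assumptions for illustration, not the paper's exact probing protocol.

```python
# Hedged sketch: inspecting individual attention heads in BERT with
# Hugging Face transformers. The bridging pair below ("the door" -> "a house")
# is an illustrative example, not data from the paper.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

sentence = "I saw a house yesterday. The door was painted red."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: tuple of num_layers tensors, each (batch, heads, seq, seq)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
anaphor_idx = tokens.index("door")       # bridging anaphor
antecedent_idx = tokens.index("house")   # its antecedent

for layer, attn in enumerate(outputs.attentions):
    # Attention mass each head places on the antecedent when encoding the anaphor.
    scores = attn[0, :, anaphor_idx, antecedent_idx]
    print(f"layer {layer:2d}: max head attention to antecedent = {scores.max().item():.3f}")
```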
We address two challenges of probabilistic topic modelling in order to better estimate the probability of a word in a given context, i.e., P(word|context): (1) No Language Structure in Context: Probabilistic topic models ignore word order by summariz
Commonsense knowledge is crucial for artificial intelligence systems to understand natural language. Previous commonsense knowledge acquisition approaches typically rely on human annotations (for example, ATOMIC) or text generation models (for exampl
Understanding human language is one of the key themes of artificial intelligence. For language representation, the capacity to effectively model the linguistic knowledge in detail-riddled and lengthy texts while getting rid of the noise is es