Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Critical Thinking for Language Models

76 0 0.0 ( 0 )

Download Cite

Added by Gregor Betz

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Gregor Betz - Christian Voigt - Kyle Richardson

Computation and Language Artificial Intelligence

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

This paper takes a first step towards a critical thinking curriculum for neural auto-regressive language models. We introduce a synthetic corpus of deductively valid arguments, and generate artificial argumentative texts to train and evaluate GPT-2. Significant transfer learning effects can be observed: Training a model on three simple core schemes allows it to accurately complete conclusions of different, and more complex types of arguments, too. The language models generalize the core argument schemes in a correct way. Moreover, we obtain consistent and promising results for NLU benchmarks. In particular, pre-training on the argument schemes raises zero-shot accuracy on the GLUE diagnostics by up to 15 percentage points. The findings suggest that intermediary pre-training on texts that exemplify basic reasoning abilities (such as typically covered in critical thinking textbooks) might help language models to acquire a broad range of reasoning skills. The synthetic argumentative texts presented in this paper are a promising starting point for building such a critical thinking curriculum for language models.

rate research

Language Models for Lexical Inference in Context

120 - Martin Schmitt , Hinrich Schutze 2021

Lexical inference in context (LIiC) is the task of recognizing textual entailment between two very similar sentences, i.e., sentences that only differ in one expression. It can therefore be seen as a variant of the natural language inference task that is focused on lexical semantics. We formulate and evaluate the first approaches based on pretrained language models (LMs) for this task: (i) a few-shot NLI classifier, (ii) a relation induction approach based on handcrafted patterns expressing the semantics of lexical inference, and (iii) a variant of (ii) with patterns that were automatically extracted from a corpus. All our approaches outperform the previous state of the art, showing the potential of pretrained LMs for LIiC. In an extensive analysis, we investigate factors of success and failure of our three approaches.

Computation and Language Artificial Intelligence

Knowledge Inheritance for Pre-trained Language Models

114 - Yujia Qin , Yankai Lin , Jing Yi 2021

Recent explorations of large-scale pre-trained language models (PLMs) such as GPT-3 have revealed the power of PLMs with huge amounts of parameters, setting off a wave of training ever-larger PLMs. However, training a large-scale PLM requires tremendous amounts of computational resources, which is time-consuming and expensive. In addition, existing large-scale PLMs are mainly trained from scratch individually, ignoring the availability of many existing well-trained PLMs. To this end, we explore the question that how can previously trained PLMs benefit training larger PLMs in future. Specifically, we introduce a novel pre-training framework named knowledge inheritance (KI), which combines both self-learning and teacher-guided learning to efficiently train larger PLMs. Sufficient experimental results demonstrate the feasibility of our KI framework. We also conduct empirical analyses to explore the effects of teacher PLMs pre-training settings, including model architecture, pre-training data, etc. Finally, we show that KI can well support lifelong learning and knowledge transfer.

Computation and Language Artificial Intelligence Machine Learning

Bridging the Gap for Tokenizer-Free Language Models

124 - Dokook Choe , Rami Al-Rfou , Mandy Guo 2019

Purely character-based language models (LMs) have been lagging in quality on large scale datasets, and current state-of-the-art LMs rely on word tokenization. It has been assumed that injecting the prior knowledge of a tokenizer into the model is essential to achieving competitive results. In this paper, we show that contrary to this conventional wisdom, tokenizer-free LMs with sufficient capacity can achieve competitive performance on a large scale dataset. We train a vanilla transformer network with 40 self-attention layers on the One Billion Word (lm1b) benchmark and achieve a new state of the art for tokenizer-free LMs, pushing these models to be on par with their word-based counterparts.

Computation and Language Artificial Intelligence Information Retrieval

Pretrained Language Models for Text Generation: A Survey

220 - Junyi Li , Tianyi Tang , Wayne Xin Zhao 2021

Text generation has become one of the most important yet challenging tasks in natural language processing (NLP). The resurgence of deep learning has greatly advanced this field by neural generation models, especially the paradigm of pretrained language models (PLMs). In this paper, we present an overview of the major advances achieved in the topic of PLMs for text generation. As the preliminaries, we present the general task definition and briefly describe the mainstream architectures of PLMs for text generation. As the core content, we discuss how to adapt existing PLMs to model different input data and satisfy special properties in the generated text. We further summarize several important fine-tuning strategies for text generation. Finally, we present several future directions and conclude this paper. Our survey aims to provide text generation researchers a synthesis and pointer to related research.

Computation and Language Artificial Intelligence

Eliciting Knowledge from Language Models for Event Extraction

395 - Jiaju Lin , Jin Jian , Qin Chen 2021

Eliciting knowledge contained in language models via prompt-based learning has shown great potential in many natural language processing tasks, such as text classification and generation. Whereas, the applications for more complex tasks such as event extraction are less studied, since the design of prompt is not straightforward due to the complicated types and arguments. In this paper, we explore to elicit the knowledge from pre-trained language models for event trigger detection and argument extraction. Specifically, we present various joint trigger/argument prompt methods, which can elicit more complementary knowledge by modeling the interactions between different triggers or arguments. The experimental results on the benchmark dataset, namely ACE2005, show the great advantages of our proposed approach. In particular, our approach is superior to the recent advanced methods in the few-shot scenario where only a few samples are used for training.

Computation and Language Artificial Intelligence

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Critical Thinking for Language Models

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions