
Probing Across Time: What Does RoBERTa Know and When?

Added by Zeyu Liu
Publication date: 2021
Language: English





Models of language trained on very large corpora have been demonstrated useful for NLP. As fixed artifacts, they have become the object of intense study, with many researchers probing the extent to which they acquire and readily demonstrate linguistic abstractions, factual and commonsense knowledge, and reasoning abilities. Building on this line of work, we consider a new question: for the types of knowledge a language model learns, when during (pre)training are they acquired? We plot probing performance across iterations, using RoBERTa as a case study. Among our findings: linguistic knowledge is acquired fast, stably, and robustly across domains. Facts and commonsense are slower and more domain-sensitive. Reasoning abilities are, in general, not stably acquired. As new datasets, pretraining protocols, and probes emerge, we believe that probing-across-time analyses can help researchers understand the complex, intermingled learning that these models undergo and guide us toward more efficient approaches that accomplish the necessary learning faster.
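The probing-across-time setup can be sketched as a loop over pretraining checkpoints: at each checkpoint, freeze the model, extract features for a probing dataset, fit a lightweight probe, and record its accuracy. The sketch below is a minimal, self-contained illustration; the checkpoint names, the feature extractor, and the dataset are all hypothetical stand-ins (a real study would load saved RoBERTa checkpoints and encode real probing examples).

```python
# Sketch of a probing-across-time loop: fit a linear probe on frozen
# features from each pretraining checkpoint and track accuracy over steps.
# extract_features() is a stand-in for encoding text with a frozen model.
import numpy as np

rng = np.random.default_rng(0)

def extract_features(checkpoint, texts):
    """Stand-in for encoding `texts` with the frozen model at `checkpoint`.
    A real implementation would load the checkpoint and return hidden states."""
    seed = sum(map(ord, checkpoint))  # deterministic per-checkpoint seed
    return np.random.default_rng(seed).normal(size=(len(texts), 16))

def probe_accuracy(features, labels):
    """Fit a least-squares linear probe and report its training accuracy."""
    X = np.hstack([features, np.ones((len(features), 1))])  # add bias column
    w, *_ = np.linalg.lstsq(X, labels, rcond=None)
    preds = (X @ w > 0.5).astype(int)
    return float((preds == labels).mean())

texts = [f"example {i}" for i in range(100)]
labels = rng.integers(0, 2, size=100)

for step in [1000, 10000, 100000]:  # hypothetical pretraining iterations
    feats = extract_features(f"roberta-step-{step}", texts)
    acc = probe_accuracy(feats, labels)
    print(f"step {step}: probe accuracy {acc:.2f}")
```

Plotting `acc` against `step` for each probe type (linguistic, factual, commonsense, reasoning) yields the learning curves the abstract describes.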



Related research

Recent work has presented intriguing results examining the knowledge contained in language models (LMs) by having the LM fill in the blanks of prompts such as "Obama is a _ by profession." These prompts are usually manually created, and quite possibly sub-optimal; another prompt such as "Obama worked as a _" may result in more accurately predicting the correct profession. Because of this, given an inappropriate prompt, we might fail to retrieve facts that the LM does know, and thus any given prompt only provides a lower-bound estimate of the knowledge contained in an LM. In this paper, we attempt to more accurately estimate the knowledge contained in LMs by automatically discovering better prompts to use in this querying process. Specifically, we propose mining-based and paraphrasing-based methods to automatically generate high-quality and diverse prompts, as well as ensemble methods to combine answers from different prompts. Extensive experiments on the LAMA benchmark for extracting relational knowledge from LMs demonstrate that our methods can improve accuracy from 31.1% to 39.6%, providing a tighter lower bound on what LMs know. We have released the code and the resulting LM Prompt And Query Archive (LPAQA) at https://github.com/jzbjyb/LPAQA.
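The ensembling idea above can be illustrated with a small sketch: query the same fact through several paraphrased prompts and average the per-prompt answer distributions. The distributions below are made up for illustration; a real system would obtain them from a masked language model's fill-in-the-blank scores.

```python
# Sketch of combining answers from multiple paraphrased prompts by
# (weighted) averaging of answer distributions. Scores are illustrative,
# not real model outputs.
from collections import defaultdict

prompt_scores = {
    "Obama is a _ by profession.": {"politician": 0.45, "lawyer": 0.30},
    "Obama worked as a _ .":       {"lawyer": 0.50, "politician": 0.35},
    "Obama's profession is _ .":   {"politician": 0.55, "lawyer": 0.25},
}

def ensemble(prompt_scores, weights=None):
    """Combine per-prompt answer distributions; return best answer and scores."""
    weights = weights or {p: 1.0 for p in prompt_scores}
    total = sum(weights.values())
    combined = defaultdict(float)
    for prompt, scores in prompt_scores.items():
        for answer, p in scores.items():
            combined[answer] += weights[prompt] * p / total
    return max(combined, key=combined.get), dict(combined)

best, combined = ensemble(prompt_scores)
print(best)  # politician
```

A single prompt can under-report what the model knows; averaging over paraphrases tightens the lower bound, which is exactly the motivation stated in the abstract.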
Pranav Rajpurkar, Robin Jia, 2018
Extractive reading comprehension systems can often locate the correct answer to a question in a context document, but they also tend to make unreliable guesses on questions for which the correct answer is not stated in the context. Existing datasets either focus exclusively on answerable questions, or use automatically generated unanswerable questions that are easy to identify. To address these weaknesses, we present SQuAD 2.0, the latest version of the Stanford Question Answering Dataset (SQuAD). SQuAD 2.0 combines existing SQuAD data with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD 2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. SQuAD 2.0 is a challenging natural language understanding task for existing models: a strong neural system that gets 86% F1 on SQuAD 1.1 achieves only 66% F1 on SQuAD 2.0.
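The answer-or-abstain requirement can be sketched as the standard thresholding rule used by SQuAD 2.0-style systems: score candidate spans and a "no answer" (null) option, and abstain when the null score beats the best span by more than a tuned margin. The spans, scores, and threshold below are illustrative, not from any real model.

```python
# Sketch of the answer/abstain decision for SQuAD 2.0-style systems:
# abstain when the null ("no answer") score exceeds the best span score
# by more than a threshold tuned on a development set.

def predict(span_scores, null_score, threshold=0.0):
    """Return the best span, or None (abstain) if the null option wins."""
    best_span, best_score = max(span_scores.items(), key=lambda kv: kv[1])
    if null_score - best_score > threshold:
        return None  # no answer is supported by the paragraph
    return best_span

# Answerable case: a span clearly outscores the null option.
print(predict({"in 1912": 4.2, "the Titanic": 1.1}, null_score=0.5))  # in 1912
# Unanswerable case: the null score dominates, so the system abstains.
print(predict({"in 1912": 0.3}, null_score=3.0))  # None
```

The large F1 drop from SQuAD 1.1 to 2.0 reflects how hard it is to calibrate this abstain decision: a system strong at span extraction can still guess unreliably on unanswerable questions.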
Extracting temporal relationships over a range of scales is a hallmark of human perception and cognition -- and thus it is a critical feature of machine learning applied to real-world problems. Neural networks are either plagued by the exploding/vanishing gradient problem in recurrent neural networks (RNNs) or must adjust their parameters to learn the relevant time scales (e.g., in LSTMs). This paper introduces DeepSITH, a network comprising biologically-inspired Scale-Invariant Temporal History (SITH) modules in series with dense connections between layers. SITH modules respond to their inputs with a geometrically-spaced set of time constants, enabling the DeepSITH network to learn problems along a continuum of time-scales. We compare DeepSITH to LSTMs and other recent RNNs on several time series prediction and decoding tasks. DeepSITH achieves state-of-the-art performance on these problems.
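The key ingredient named above, a geometrically spaced set of time constants, can be sketched in a few lines. The specific range and count are illustrative; the point is that the ratio between adjacent constants is fixed, so the set covers a continuum of scales from fast to slow with equal relative resolution.

```python
# Sketch of the geometrically spaced time constants a SITH-style module
# uses to cover multiple time scales; the range and count are illustrative.
import numpy as np

def geometric_time_constants(tau_min=1.0, tau_max=100.0, n=8):
    """Return n time constants spaced geometrically from tau_min to tau_max."""
    return tau_min * (tau_max / tau_min) ** (np.arange(n) / (n - 1))

taus = geometric_time_constants()
print(np.round(taus, 2))  # adjacent constants share a fixed ratio
```

Because each constant is a fixed multiple of the previous one, a network stacking such modules can respond to patterns at short and long lags without tuning its parameters to any one scale, which is the contrast the abstract draws with LSTMs.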
Real-time, or quantitative, PCR typically starts from a very low concentration of initial DNA strands. Over successive cycles the numbers increase, first essentially by doubling, later predominantly in a linear way. Observation of the number of DNA molecules in the experiment becomes possible only when it is substantially larger than the initial number, by which point it may be affected by randomness in individual replication. Can the initial copy number still be determined? This is a classical problem and, indeed, a concrete special case of the general problem of determining the number of ancestors, mutants, or invaders of a population observed only later. We approach it through a generalised version of the branching process model introduced by Jagers and Klebaner, 2003, based on Michaelis-Menten type enzyme-kinetic considerations from Schnell and Mendoza, 1997. A crucial role is played by the Michaelis-Menten constant being large compared to initial copy numbers. Strangely, determination of the initial number turns out to be completely possible if the initial rate $v$ is one, i.e. all DNA strands replicate, but only partly so when $v<1$, and thus the initial rate, or probability of successful replication, is lower than one. Then the starting molecule number becomes hidden behind a veil of uncertainty. This is a special case of a hitherto unobserved general phenomenon in population growth processes, which will be addressed elsewhere.
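The early, near-doubling phase of this branching-process view can be sketched directly: each strand replicates independently with probability v per cycle. This minimal simulation omits the Michaelis-Menten saturation that produces the later linear phase; the parameter values are illustrative.

```python
# Sketch of the branching-process model of early PCR cycles: each strand
# copies with probability v per cycle. Saturation effects are omitted.
import random

def pcr_cycles(n0, v, cycles, rng):
    """Simulate `cycles` PCR rounds starting from n0 strands."""
    n = n0
    for _ in range(cycles):
        n += sum(1 for _ in range(n) if rng.random() < v)  # each strand copies w.p. v
    return n

rng = random.Random(42)
print(pcr_cycles(n0=10, v=1.0, cycles=10, rng=rng))  # v=1: exact doubling, 10 * 2**10 = 10240
print(pcr_cycles(n0=10, v=0.7, cycles=10, rng=rng))  # v<1: growth is random
```

The contrast the abstract describes is visible here: with v=1 the trajectory is deterministic, so the initial count can be read off exactly, while with v<1 the same final count is consistent with a range of starting numbers.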
String theory has transformed our understanding of geometry, topology and spacetime. Thus, for this special issue of Foundations of Physics commemorating Forty Years of String Theory, it seems appropriate to step back and ask what we do not understand. As I will discuss, time remains the least understood concept in physical theory. While we have made significant progress in understanding space, our understanding of time has not progressed much beyond the level of a century ago when Einstein introduced the idea of space-time as a combined entity. Thus, I will raise a series of open questions about time, and will review some of the progress that has been made as a roadmap for the future.