On the Limits of Minimal Pairs in Contrastive Evaluation

Added by Jannis Vamvas

Publication date: 2021

Field: Informatics Engineering

Language: English

Authors: Jannis Vamvas, Rico Sennrich

Computation and Language

Minimal sentence pairs are frequently used to analyze the behavior of language models. It is often assumed that model behavior on contrastive pairs is predictive of model behavior at large. We argue that two conditions are necessary for this assumption to hold: First, a tested hypothesis should be well-motivated, since experiments show that contrastive evaluation can lead to false positives. Second, test data should be chosen so as to minimize distributional discrepancy between evaluation time and deployment time. For a good approximation of deployment-time decoding, we recommend that minimal pairs be created based on machine-generated text, as opposed to human-written references. We present a contrastive evaluation suite for English-German MT that implements this recommendation.
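As a rough illustration of the contrastive evaluation the abstract refers to, the sketch below counts how often a scoring function prefers the correct variant of a minimal pair over its contrastive variant. Everything here is an assumption made for illustration: the names `contrastive_accuracy`, `toy_score`, and `PAIRS` are hypothetical, the toy scorer stands in for a real model's sequence score, and the example pairs are hand-written rather than taken from the authors' evaluation suite (which, per the abstract, builds pairs from machine-generated text).

```python
import re
from typing import Callable, Iterable, Tuple

# (source sentence, correct translation, contrastive translation)
MinimalPair = Tuple[str, str, str]


def contrastive_accuracy(
    pairs: Iterable[MinimalPair],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of pairs where the correct variant scores strictly higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no minimal pairs given")
    hits = sum(1 for src, good, bad in pairs if score(src, good) > score(src, bad))
    return hits / len(pairs)


def toy_score(source: str, target: str) -> float:
    """Hypothetical stand-in for a model score (NOT a real MT model).

    Returns 0.0 when negation in the English source matches negation
    in the German target, and -1.0 otherwise.
    """
    src_neg = "not" in re.findall(r"\w+", source.lower())
    tgt_neg = "nicht" in re.findall(r"\w+", target.lower())
    return 0.0 if src_neg == tgt_neg else -1.0


# Hand-written illustrative pairs; each contrastive variant differs
# from the correct one only in the presence of "nicht".
PAIRS = [
    ("I do not sleep.", "Ich schlafe nicht.", "Ich schlafe."),
    ("I sleep.", "Ich schlafe.", "Ich schlafe nicht."),
]

if __name__ == "__main__":
    print(contrastive_accuracy(PAIRS, toy_score))
```

With a real system, `score` would typically be replaced by the model's log-probability of the target given the source. A high accuracy on such pairs does not by itself show that the model avoids the error when decoding freely, which is the limitation the abstract points to.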

BLiMP: The Benchmark of Linguistic Minimal Pairs for English

Alex Warstadt, Alicia Parrish, Haokun Liu (2019)

We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs isolating specific contrasts in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars, and aggregate human agreement with the labels is 96.4%. We use it to evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs. We find that state-of-the-art models identify morphological contrasts reliably, but they struggle with semantic restrictions on the distribution of quantifiers and negative polarity items and subtle syntactic phenomena such as extraction islands.

Computation and Language

Limits on Spherical Coefficients in the Minimal-SME Photon Sector

W.J. Jessup, N.E. Russell (2016)

We place limits on spherical coefficients for Lorentz violation involving operators of dimension four in the photon sector of the minimal Standard-Model Extension. The bounds are deduced from existing experimental results with optical-cavity oscillators.

High Energy Physics - Phenomenology

On the Diversity and Limits of Human Explanations

Chenhao Tan (2021)

A growing effort in NLP aims to build datasets of human explanations. However, the term explanation encompasses a broad range of notions, each with different properties and ramifications. Our goal is to provide an overview of diverse types of explanations and human limitations, and discuss implications for collecting and using explanations in NLP. Inspired by prior work in psychology and cognitive sciences, we group existing human explanations in NLP into three categories: proximal mechanism, evidence, and procedure. These three types differ in nature and have implications for the resultant explanations. For instance, procedures are not considered explanations in psychology and connect with a rich body of work on learning from instructions. The diversity of explanations is further evidenced by proxy questions that are needed for annotators to interpret and answer open-ended why questions. Finally, explanations may require different, often deeper, understandings than predictions, which casts doubt on whether humans can provide useful explanations in some tasks.

Computation and Language, Artificial Intelligence, Computers and Society

Numerical Evaluation of the Bose-Ghost Propagator in Minimal Landau Gauge on the Lattice

Attilio Cucchieri, Tereza Mendes (2016)

We present numerical details of the evaluation of the so-called Bose-ghost propagator in lattice minimal Landau gauge, for the SU(2) case in four Euclidean dimensions. This quantity has been proposed as a carrier of the confining force in the Gribov-Zwanziger approach and, as such, its infrared behavior could be relevant for the understanding of color confinement in Yang-Mills theories. Also, its nonzero value can be interpreted as direct evidence of BRST-symmetry breaking, which is induced when restricting the functional measure to the first Gribov region Omega. Our simulations are done for lattice volumes up to 120^4 and for physical lattice extents up to 13.5 fm. We investigate the infinite-volume and continuum limits.

High Energy Physics - Lattice

On the Evaluation of Machine Translation for Terminology Consistency

Md Mahfuz ibn Alam, Antonios Anastasopoulos, Laurent Besacier (2021)

As neural machine translation (NMT) systems become an important part of professional translator pipelines, a growing body of work focuses on combining NMT with terminologies. In many scenarios, and particularly in cases of domain adaptation, one expects the MT output to adhere to the constraints provided by a terminology. In this work, we propose metrics to measure the consistency of MT output with regard to a domain terminology. We perform studies on the COVID-19 domain over 5 languages, also performing terminology-targeted human evaluation. We open-source the code for computing all proposed metrics: https://github.com/mahfuzibnalam/terminology_evaluation

Computation and Language