Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Do We Know What We Don't Know? Studying Unanswerable Questions beyond SQuAD 2.0

هل نعرف ما لا نعرفه؟دراسة أسئلة غير مرضية وراء فرقة 2.0

279 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

visit our facebook page

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Understanding when a text snippet does not provide a sought after information is an essential part of natural language utnderstanding. Recent work (SQuAD 2.0; Rajpurkar et al., 2018) has attempted to make some progress in this direction by enriching the SQuAD dataset for the Extractive QA task with unanswerable questions. However, as we show, the performance of a top system trained on SQuAD 2.0 drops considerably in out-of-domain scenarios, limiting its use in practical situations. In order to study this we build an out-of-domain corpus, focusing on simple event-based questions and distinguish between two types of IDK questions: competitive questions, where the context includes an entity of the same type as the expected answer, and simpler, non-competitive questions where there is no entity of the same type in the context. We find that SQuAD 2.0-based models fail even in the case of the simpler questions. We then analyze the similarities and differences between the IDK phenomenon in Extractive QA and the Recognizing Textual Entailments task (RTE; Dagan et al., 2013) and investigate the extent to which the latter can be used to improve the performance.

References used

https://aclanthology.org/

rate research

How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering

328 - Association for Computation Linguistics 2021 مقالة

Abstract Recent works have shown that language models (LM) capture different types of knowledge regarding facts or common sense. However, because no model is perfect, they still fail to provide appropriate answers in many cases. In this paper, we ask the question, How can we know when language models know, with confidence, the answer to a particular query?'' We examine this question from the point of view of calibration, the property of a probabilistic model's predicted probabilities actually being well correlated with the probabilities of correctness. We examine three strong generative models---T5, BART, and GPT-2---and study whether their probabilities on QA tasks are well calibrated, finding the answer is a relatively emphatic no. We then examine methods to calibrate such models to make their confidence scores correlate better with the likelihood of correctness through fine-tuning, post-hoc probability modification, or adjustment of the predicted outputs or inputs. Experiments on a diverse range of datasets demonstrate the effectiveness of our methods. We also perform analysis to study the strengths and limitations of these methods, shedding light on further improvements that may be made in methods for calibrating LMs. We have released the code at https://github.com/jzbjyb/lm-calibration.

الأهمية المرشد الإشراف عارضات ازياء صناعة حمض الفوسفور

Implicitly Abusive Language -- What does it actually look like and why are we not getting there?

737 - Association for Computation Linguistics 2021 مقالة

Abusive language detection is an emerging field in natural language processing which has received a large amount of attention recently. Still the success of automatic detection is limited. Particularly, the detection of implicitly abusive language, i .e. abusive language that is not conveyed by abusive words (e.g. dumbass or scum), is not working well. In this position paper, we explain why existing datasets make learning implicit abuse difficult and what needs to be changed in the design of such datasets. Arguing for a divide-and-conquer strategy, we present a list of subtypes of implicitly abusive language and formulate research tasks and questions for future research.

implicitly abusive language implicitly abusive لغة مسيئة ضمنيا مستنفئ ضمنيا صناعة حمض الفوسفور

Is that really a question? Going beyond factoid questions in NLP

343 - Association for Computation Linguistics 2021 مقالة

Research in NLP has mainly focused on factoid questions, with the goal of finding quick and reliable ways of matching a query to an answer. However, human discourse involves more than that: it contains non-canonical questions deployed to achieve spec ific communicative goals. In this paper, we investigate this under-studied aspect of NLP by introducing a targeted task, creating an appropriate corpus for the task and providing baseline models of diverse nature. With this, we are also able to generate useful insights on the task and open the way for future research in this direction.

بمناسبة من عالمي factoid questions أسئلة واقعية أسئلة صناعة حمض الفوسفور

Probing Across Time: What Does RoBERTa Know and When?

394 - Association for Computation Linguistics 2021 مقالة

Models of language trained on very large corpora have been demonstrated useful for natural language processing. As fixed artifacts, they have become the object of intense study, with many researchers probing'' the extent to which they acquire and rea dily demonstrate linguistic abstractions, factual and commonsense knowledge, and reasoning abilities. Recent work applied several probes to intermediate training stages to observe the developmental process of a large-scale model (Chiang et al., 2020). Following this effort, we systematically answer a question: for various types of knowledge a language model learns, when during (pre)training are they acquired? Using RoBERTa as a case study, we find: linguistic knowledge is acquired fast, stably, and robustly across domains. Facts and commonsense are slower and more domain-sensitive. Reasoning abilities are, in general, not stably acquired. As new datasets, pretraining protocols, and probes emerge, we believe that probing-across-time analyses can help researchers understand the complex, intermingled learning that these models undergo and guide us toward more efficient approaches that accomplish necessary learning faster.

probing across time التحقيق عبر الزمن زمن صناعة حمض الفوسفور

Sometimes We Want Ungrammatical Translations

286 - Association for Computation Linguistics 2021 مقالة

Rapid progress in Neural Machine Translation (NMT) systems over the last few years has focused primarily on improving translation quality, and as a secondary focus, improving robustness to perturbations (e.g. spelling). While performance and robustne ss are important objectives, by over-focusing on these, we risk overlooking other important properties. In this paper, we draw attention to the fact that for some applications, faithfulness to the original (input) text is important to preserve, even if it means introducing unusual language patterns in the (output) translation. We propose a simple, novel way to quantify whether an NMT system exhibits robustness or faithfulness, by focusing on the case of word-order perturbations. We explore a suite of functions to perturb the word order of source sentences without deleting or injecting tokens, and measure their effects on the target side. Across several experimental conditions, we observe a strong tendency towards robustness rather than faithfulness. These results allow us to better understand the trade-off between faithfulness and robustness in NMT, and opens up the possibility of developing systems where users have more autonomy and control in selecting which property is best suited for their use case.

ترجمة آلة متعددة الوجه ungrammatical translations improving translation quality ترجمة غير تعليمية صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Do We Know What We Don't Know? Studying Unanswerable Questions beyond SQuAD 2.0

هل نعرف ما لا نعرفه؟دراسة أسئلة غير مرضية وراء فرقة 2.0

Ask ChatGPT about the research

Read More

suggested questions