
Do Natural Language Explanations Represent Valid Logical Arguments? Verifying Entailment in Explainable NLI Gold Standards


Publication date: 2021
Language: English





An emerging line of research in Explainable NLP is the creation of datasets enriched with human-annotated explanations and rationales, used to build and evaluate models with step-wise inference and explanation generation capabilities. While human-annotated explanations are used as ground-truth for the inference, there is a lack of systematic assessment of their consistency and rigour. In an attempt to provide a critical quality assessment of Explanation Gold Standards (XGSs) for NLI, we propose a systematic annotation methodology, named Explanation Entailment Verification (EEV), to quantify the logical validity of human-annotated explanations. The application of EEV on three mainstream datasets reveals the surprising conclusion that a majority of the explanations, while appearing coherent on the surface, represent logically invalid arguments, ranging from being incomplete to containing clearly identifiable logical errors. This conclusion confirms that the inferential properties of explanations are still poorly formalised and understood, and that additional work on this line of research is necessary to improve the way Explanation Gold Standards are constructed.
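To make the EEV procedure concrete, the sketch below shows one way an annotation pass could tally entailment verdicts over gold explanations. The three-way verdict set (valid / incomplete / logical error) follows the failure modes mentioned in the abstract, but the data structures and the interactive prompt are illustrative assumptions, not the paper's actual annotation protocol.

```python
# Minimal sketch of an EEV-style verification pass, assuming a simplified
# three-way verdict (valid / incomplete / logical_error). The instance layout
# and the prompt are hypothetical placeholders, not the published schema.
from collections import Counter
from dataclasses import dataclass

@dataclass
class ExplanationInstance:
    premise_facts: list[str]   # annotated explanation sentences
    hypothesis: str            # statement the explanation should entail

def annotate_validity(instance: ExplanationInstance) -> str:
    """Ask a human annotator whether the explanation entails the hypothesis."""
    print("Premises:")
    for fact in instance.premise_facts:
        print(f"  - {fact}")
    print(f"Hypothesis: {instance.hypothesis}")
    return input("Verdict [valid/incomplete/logical_error]: ").strip()

def verify_gold_standard(instances: list[ExplanationInstance]) -> Counter:
    """Tally annotator verdicts over a sample of gold explanations."""
    return Counter(annotate_validity(inst) for inst in instances)
```

The tally returned by verify_gold_standard is what supports statements such as "a majority of the explanations represent logically invalid arguments".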



Related research

Although neural models have shown strong performance on datasets such as SNLI, they lack the ability to generalize out-of-distribution (OOD). In this work, we formulate a few-shot learning setup and examine the effects of natural language explanations on OOD generalization. We leverage the templates in the HANS dataset and construct templated natural language explanations for each template. Although generated explanations show competitive BLEU scores against ground-truth explanations, they fail to improve prediction performance. We further show that generated explanations often hallucinate information and miss key elements that indicate the label.
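As a concrete reading of the BLEU comparison described above, the following minimal sketch scores generated explanations against ground-truth explanations with sacrebleu; the example sentences are placeholders, not actual HANS templates.

```python
# Score generated explanations against ground-truth explanations with corpus
# BLEU (sacrebleu). The two example strings are placeholders.
import sacrebleu

generated = [
    "The doctor saw the lawyer, so the lawyer was seen by the doctor.",
]
references = [
    "Because the doctor saw the lawyer, the lawyer was seen by the doctor.",
]

# sacrebleu expects a list of hypothesis strings and a list of reference streams.
bleu = sacrebleu.corpus_bleu(generated, [references])
print(f"BLEU: {bleu.score:.2f}")
```

A high score here only measures surface overlap, which is consistent with the finding that competitive BLEU does not translate into better predictions.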
To build robust question answering systems, we need the ability to verify whether answers to questions are truly correct, not just "good enough" in the context of imperfect QA datasets. We explore the use of natural language inference (NLI) as a way to achieve this goal, as NLI inherently requires the premise (document context) to contain all necessary information to support the hypothesis (proposed answer to the question). We leverage large pre-trained models and recent prior datasets to construct powerful question conversion and decontextualization modules, which can reformulate QA instances as premise-hypothesis pairs with very high reliability. Then, by combining standard NLI datasets with NLI examples automatically derived from QA training data, we can train NLI models to evaluate QA models' proposed answers. We show that our approach improves the confidence estimation of a QA model across different domains, evaluated in a selective QA setting. Careful manual analysis over the predictions of our NLI model shows that it can further identify cases where the QA model produces the right answer for the wrong reason, i.e., when the answer sentence cannot address all aspects of the question.
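The sketch below illustrates the verification idea: the document context becomes the NLI premise and the proposed answer, rewritten as a declarative statement, becomes the hypothesis. The hand-written rewrite and the roberta-large-mnli checkpoint are stand-ins for the paper's trained question-conversion and decontextualization modules and its purpose-trained NLI model.

```python
# Sketch of verifying a QA answer with an off-the-shelf NLI model: the context
# is the premise, and the answer rewritten as a statement is the hypothesis.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"  # assumed off-the-shelf NLI checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = ("The Amazon rainforest covers much of the Amazon basin, "
           "an area of roughly 7 million square kilometres.")
# QA pair: "How large is the Amazon basin?" -> "roughly 7 million square kilometres"
hypothesis = "The Amazon basin covers roughly 7 million square kilometres."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1).squeeze()

# roberta-large-mnli label order: 0=contradiction, 1=neutral, 2=entailment
print(f"P(entailment) = {probs[2].item():.3f}")
```

A low entailment probability flags answers that the context does not actually support, which is the selective-QA signal the abstract describes.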
Pre-trained neural language models give high performance on natural language inference (NLI) tasks. But whether they actually understand the meaning of the processed sequences is still unclear. We propose a new diagnostic test suite which makes it possible to assess whether a dataset constitutes a good testbed for evaluating the models' meaning understanding capabilities. We specifically apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI), which involve removing entire word classes and often lead to non-sensical sentence pairs. If model accuracy on the corrupted data remains high, then the dataset is likely to contain statistical biases and artefacts that guide prediction. Conversely, a large decrease in model accuracy indicates that the original dataset provides a proper challenge to the models' reasoning capabilities. Hence, our proposed controls can serve as a crash test for developing high quality data for NLI tasks.
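A minimal sketch of one such corruption follows, assuming spaCy part-of-speech tags are used to delete a whole word class; the paper's full set of corruption transformations over MNLI and ANLI is not reproduced here.

```python
# Minimal sketch of one controlled corruption: drop every token belonging to a
# target word class (here, verbs) from both sentences of an NLI pair.
import spacy

nlp = spacy.load("en_core_web_sm")

def drop_word_class(sentence: str, pos: str = "VERB") -> str:
    """Return the sentence with all tokens of the given POS removed."""
    doc = nlp(sentence)
    return " ".join(tok.text for tok in doc if tok.pos_ != pos)

premise = "A man is playing a guitar on stage."
hypothesis = "A man performs music."
corrupted_pair = (drop_word_class(premise), drop_word_class(hypothesis))
print(corrupted_pair)  # likely non-sensical, as intended by the control
```

If a model's accuracy barely drops on pairs corrupted this way, the benchmark is probably being solved through statistical artefacts rather than meaning.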
Addressing the mismatch between natural language descriptions and the corresponding SQL queries is a key challenge for text-to-SQL translation. To bridge this gap, we propose an SQL intermediate representation (IR) called Natural SQL (NatSQL). Specifically, NatSQL preserves the core functionalities of SQL, while it simplifies the queries as follows: (1) dispensing with operators and keywords such as GROUP BY, HAVING, FROM, and JOIN ON, for which counterparts are usually hard to find in the text descriptions; (2) removing the need for nested subqueries and set operators; and (3) making schema linking easier by reducing the required number of schema items. On Spider, a challenging text-to-SQL benchmark that contains complex and nested SQL queries, we demonstrate that NatSQL outperforms other IRs, and significantly improves the performance of several previous SOTA models. Furthermore, for existing models that do not support executable SQL generation, NatSQL easily enables them to generate executable SQL queries, and achieves the new state-of-the-art execution accuracy.
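The snippet below does not reproduce NatSQL itself; it only flags, for a given SQL query, which of the constructs the abstract says NatSQL dispenses with (GROUP BY, HAVING, JOIN ON, nested subqueries, set operators) are present, as a rough indicator of how much simplification the IR would perform. The example query is illustrative, not taken from Spider.

```python
# Rough check for the SQL constructs that NatSQL dispenses with, according to
# the abstract. This is an illustrative heuristic, not the NatSQL translation.
import re

DISPENSED = {
    "GROUP BY": r"\bGROUP\s+BY\b",
    "HAVING": r"\bHAVING\b",
    "JOIN ON": r"\bJOIN\b.*\bON\b",
    "set operator": r"\b(UNION|INTERSECT|EXCEPT)\b",
    "nested subquery": r"\(\s*SELECT\b",
}

def constructs_to_simplify(sql: str) -> list[str]:
    """List which of the dispensed constructs occur in a SQL query."""
    return [name for name, pattern in DISPENSED.items()
            if re.search(pattern, sql, flags=re.IGNORECASE)]

query = ("SELECT name FROM singer WHERE singer_id NOT IN "
         "(SELECT singer_id FROM concert GROUP BY singer_id HAVING count(*) > 2)")
print(constructs_to_simplify(query))
# ['GROUP BY', 'HAVING', 'nested subquery']
```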
We ask subjects whether they perceive a set of texts as human-produced; some of the texts are actually human-written, while others are automatically generated. We use this data to fine-tune a GPT-2 model to push it to generate more human-like texts, and observe that this fine-tuned model produces texts that are indeed perceived as more human-like than those of the original model. Alongside this, we show that our automatic evaluation strategy correlates well with human judgements. We also run a linguistic analysis to unveil the characteristics of human- vs machine-perceived language.
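A minimal sketch of the fine-tuning step follows, assuming the texts judged human-like are collected into a plain list; the example strings, hyperparameters, and output path are placeholders, and the paper's automatic evaluation strategy is not reproduced.

```python
# Minimal sketch of fine-tuning GPT-2 on texts that annotators perceived as
# human-written. The three example strings stand in for that data.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

human_perceived_texts = [
    "The rain had stopped by the time she reached the station.",
    "He kept the letter folded in his coat pocket for a week.",
    "Nobody in the village remembered when the bridge was built.",
]

model.train()
for epoch in range(3):
    for text in human_perceived_texts:
        batch = tokenizer(text, return_tensors="pt")
        # Causal LM loss: labels are the input ids, shifted internally.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("gpt2-human-perceived")
```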
