Data collection for natural language (NL) understanding tasks has increasingly included human explanations alongside data points, allowing past work to introduce models that both perform a task and generate NL explanations for their outputs. Yet to date, model-generated explanations have been evaluated on the basis of surface-level similarity to human explanations, through both automatic metrics like BLEU and human evaluations. We argue that these evaluations are insufficient, since they fail to indicate whether explanations support actual model behavior (faithfulness) rather than simply matching what a human would say (plausibility). In this work, we address the problem of evaluating explanations from the model simulatability perspective. Our contributions are as follows: (1) We introduce a leakage-adjusted simulatability (LAS) metric for evaluating NL explanations, which measures how well explanations help an observer predict a model's output while controlling for how explanations can directly leak the output. We use a model as a proxy for a human observer and validate this choice with two human subject experiments. (2) Using the CoS-E and e-SNLI datasets, we evaluate two existing generative graphical models and two new approaches; one rationalizing method we introduce achieves roughly human-level LAS scores. (3) Lastly, we frame explanation generation as a multi-agent game and optimize explanations for simulatability while penalizing label leakage, which can improve LAS scores. We provide code for the experiments in this paper at https://github.com/peterbhase/LAS-NL-Explanations.
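A minimal sketch of how a leakage-adjusted score of this kind could be computed is shown below. The grouping of examples into leaking and non-leaking subsets and the averaging of the two group effects follow the description above, but the function name, input format, and array-based setup are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def leakage_adjusted_simulatability(sim_xe, sim_x, sim_e, model_pred):
    """Sketch of an LAS-style score.

    sim_xe: simulator predictions given input + explanation
    sim_x:  simulator predictions given input alone
    sim_e:  simulator predictions given explanation alone
    model_pred: the task model's own predictions (the targets to simulate)
    All arguments are 1-D arrays of label ids over the same examples.
    """
    sim_xe, sim_x, sim_e, model_pred = map(np.asarray, (sim_xe, sim_x, sim_e, model_pred))

    # An explanation "leaks" when it alone lets the simulator recover the model's output.
    leaked = sim_e == model_pred

    # Per-example explanation effect: did adding the explanation help the simulator?
    effect = (sim_xe == model_pred).astype(float) - (sim_x == model_pred).astype(float)

    # Average the effect within the leaking and non-leaking groups, then average the
    # two group means so that leakage alone cannot inflate the score.
    groups = [effect[leaked], effect[~leaked]]
    return float(np.mean([g.mean() for g in groups if len(g) > 0]))
```

Averaging per-group effects rather than pooling all examples is what keeps an explanation from scoring well simply by restating the predicted label.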
Providing personalized explanations for recommendations can help users understand the underlying rationale behind the recommendation results, which benefits the effectiveness, transparency, persuasiveness, and trustworthiness of recommender systems. Current explainable recommendation models mostly generate textual explanations based on pre-defined sentence templates. However, the expressive power of template-based explanations is limited to the pre-defined expressions, and manually defining those expressions requires significant human effort. Motivated by this problem, we propose to generate free-text natural language explanations for personalized recommendation. In particular, we propose a hierarchical sequence-to-sequence model (HSS) for personalized explanation generation. Unlike conventional sentence generation in NLP research, a major challenge of explanation generation in e-commerce recommendation is that not all sentences in user reviews serve an explanatory purpose. To address this, we further propose an auto-denoising mechanism based on topical item feature words for sentence generation. Experiments on various e-commerce product domains show that our approach improves not only recommendation accuracy but also explanation quality in terms of offline measures and feature word coverage. This research is one of the initial steps toward granting intelligent agents the ability to explain themselves with natural language sentences.
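The auto-denoising idea could be illustrated roughly as follows: only review sentences that mention topical item feature words are kept as supervision targets for the explanation generator, and the rest are treated as noise. The feature-word list, tokenization, and function name are placeholders; the hierarchical model itself is not shown.

```python
def select_explanation_sentences(review_sentences, item_feature_words):
    """Keep only review sentences that mention at least one topical feature word,
    treating the remaining sentences as noise rather than explanation supervision."""
    feature_set = {w.lower() for w in item_feature_words}
    selected = []
    for sentence in review_sentences:
        tokens = {tok.lower().strip(".,!?") for tok in sentence.split()}
        if tokens & feature_set:
            selected.append(sentence)
    return selected

# Hypothetical usage: sentences about "battery" and "screen" are kept,
# generic praise is discarded.
review = ["Arrived quickly and well packaged.",
          "The battery easily lasts two days.",
          "Screen brightness is excellent outdoors."]
print(select_explanation_sentences(review, ["battery", "screen", "camera"]))
```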
Is it possible to use natural language to intervene in a model's behavior and alter its prediction in a desired way? We investigate the effectiveness of natural language interventions for reading-comprehension systems, studying this in the context of social stereotypes. Specifically, we propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model's unethical behavior by communicating context-specific principles of ethics and equity to it. To this end, we build upon recent methods for quantifying a system's social stereotypes, augmenting them with different kinds of ethical interventions and the desired model behavior under such interventions. Our zero-shot evaluation finds that even today's powerful neural language models are extremely poor ethical-advice takers; that is, they respond surprisingly little to ethical interventions even though these interventions are stated as simple sentences. Few-shot learning improves model behavior but remains far from the desired outcome, especially when evaluated for various types of generalization. Our new task thus poses a novel language understanding challenge for the community.
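A hedged illustration of what such an intervention might look like for a QA model follows: an ethical principle is stated in plain language and prepended to the context before the (potentially under-specified) question. The intervention wording, example, and helper function are hypothetical, since the abstract does not fix an input format.

```python
def build_intervened_input(context, question, intervention=None):
    """Prepend a natural language ethical intervention to the QA context,
    so the model sees the stated principle before answering the question."""
    parts = []
    if intervention:
        parts.append(intervention)
    parts.append(context)
    parts.append(f"Question: {question}")
    return "\n".join(parts)

# Hypothetical example of an under-specified question plus an intervention.
context = "A doctor and a nurse were waiting at the bus stop."
question = "Who is better at math?"
intervention = ("Do not assume a person's abilities based on their profession "
                "or gender; if the context does not say, answer 'unknown'.")
print(build_intervened_input(context, question, intervention))
```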
One class of explainable NLP models for reasoning tasks supports its decisions by generating free-form or structured explanations, but what happens when these supporting structures contain errors? Our goal is to allow users to interactively correct explanation structures through natural language feedback. We introduce MERCURIE, an interactive system that refines its explanations for a given reasoning task by eliciting human feedback in natural language. Our approach generates graphs with 40% fewer inconsistencies than the off-the-shelf system. Further, simply appending the corrected explanation structures to the output yields a gain of 1.2 accuracy points on defeasible reasoning across all three domains. We release a dataset of over 450k graphs for defeasible reasoning generated by our system at https://tinyurl.com/mercurie.
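The interactive refinement loop could be sketched as below; `generate_graph`, `get_human_feedback`, and `refine_graph` stand in for components the abstract only names at a high level and are assumptions, not the released system.

```python
def interactive_refinement(task_input, generate_graph, get_human_feedback,
                           refine_graph, max_rounds=3):
    """Iteratively repair an explanation graph using natural language feedback.

    generate_graph(task_input)    -> initial explanation structure
    get_human_feedback(graph)     -> free-text critique, or None if acceptable
    refine_graph(graph, feedback) -> revised explanation structure
    """
    graph = generate_graph(task_input)
    for _ in range(max_rounds):
        feedback = get_human_feedback(graph)
        if feedback is None:  # the user is satisfied, stop refining
            break
        graph = refine_graph(graph, feedback)
    return graph
```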
The recently proposed SNLI-VE corpus for recognising visual-textual entailment is a large, real-world dataset for fine-grained multimodal reasoning. However, the automatic way in which SNLI-VE was assembled (by combining parts of two related datasets) gives rise to a large number of errors in the labels of this corpus. In this paper, we first present a data collection effort to correct the class with the highest error rate in SNLI-VE. Secondly, we re-evaluate an existing model on the corrected corpus, which we call SNLI-VE-2.0, and provide a quantitative comparison with its performance on the non-corrected corpus. Thirdly, we introduce e-SNLI-VE, which appends human-written natural language explanations to SNLI-VE-2.0. Finally, we train models that learn from these explanations at training time and output such explanations at testing time.
Neural network models for NLP are typically implemented without the explicit encoding of language rules, and yet they break one performance record after another. This has generated a lot of research interest in interpreting the representations learned by these networks. We propose here a novel interpretation approach that relies on the only processing system we have that does understand language: the human brain. We use brain imaging recordings of subjects reading complex natural text to interpret word and sequence embeddings from four recent NLP models: ELMo, USE, BERT, and Transformer-XL. We study how their representations differ across layer depth, context length, and attention type. Our results reveal differences in the context-related representations across these models. Further, in the transformer models, we find an interaction between layer depth and context length, and between layer depth and attention type. We finally hypothesize that altering BERT to better align with brain recordings would enable it to also better understand language. Probing the altered BERT using syntactic NLP tasks reveals that the model with increased brain alignment outperforms the original model. Cognitive neuroscientists have already begun using NLP networks to study the brain, and this work closes the loop to allow the interaction between NLP and cognitive neuroscience to become a true cross-pollination.
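Alignment between network representations and brain recordings is commonly measured with an encoding model that predicts imaging responses from the embeddings; a minimal ridge-regression sketch of that idea is below. The choice of ridge regression and cross-validated per-voxel correlation as the alignment score is an assumption about the usual methodology, not a claim about this paper's exact pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def brain_alignment_score(embeddings, brain_responses, alpha=1.0, n_splits=5):
    """Cross-validated encoding-model alignment between layer embeddings and
    brain recordings (e.g., fMRI voxels), averaged over voxels.

    embeddings:      (n_samples, n_dims) representations for the presented text
    brain_responses: (n_samples, n_voxels) recorded responses for the same text
    """
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in kf.split(embeddings):
        model = Ridge(alpha=alpha)
        model.fit(embeddings[train_idx], brain_responses[train_idx])
        pred = model.predict(embeddings[test_idx])
        # Pearson correlation per voxel between predicted and observed responses
        for v in range(brain_responses.shape[1]):
            scores.append(np.corrcoef(pred[:, v], brain_responses[test_idx, v])[0, 1])
    return float(np.nanmean(scores))
```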