No Arabic abstract
Neural predictive models have achieved remarkable performance improvements in various natural language processing tasks. However, most neural predictive models suffer from the lack of explainability of predictions, limiting their practical utility. This paper proposes a neural predictive approach to make a prediction and generate its corresponding explanation simultaneously. It leverages the knowledge entailed in explanations as an additional distillation signal for more efficient learning. We conduct a preliminary study on Chinese medical multiple-choice question answering, English natural language inference, and commonsense question answering tasks. The experimental results show that the proposed approach can generate reasonable explanations for its predictions even with a small-scale training corpus. The proposed method also achieves improved prediction accuracy on three datasets, which indicates that making predictions can benefit from generating the explanation in the decision process.
Since the popularization of the Transformer as a general-purpose feature encoder for NLP, many studies have attempted to decode linguistic structure from its novel multi-head attention mechanism. However, much of such work focused almost exclusively on English -- a language with rigid word order and a lack of inflectional morphology. In this study, we present decoding experiments for multilingual BERT across 18 languages in order to test the generalizability of the claim that dependency syntax is reflected in attention patterns. We show that full trees can be decoded above baseline accuracy from single attention heads, and that individual relations are often tracked by the same heads across languages. Furthermore, in an attempt to address recent debates about the status of attention as an explanatory mechanism, we experiment with fine-tuning mBERT on a supervised parsing objective while freezing different series of parameters. Interestingly, in steering the objective to learn explicit linguistic structure, we find much of the same structure represented in the resulting attention patterns, with interesting differences with respect to which parameters are frozen.
Open-domain Question Answering models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared to conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowledge. However, these models lack the accuracy of retrieve-and-read systems, as substantially less knowledge is covered by the available QA-pairs relative to text corpora like Wikipedia. To facilitate improved QA-pair models, we introduce Probably Asked Questions (PAQ), a very large resource of 65M automatically-generated QA-pairs. We introduce a new QA-pair retriever, RePAQ, to complement PAQ. We find that PAQ preempts and caches test questions, enabling RePAQ to match the accuracy of recent retrieve-and-read models, whilst being significantly faster. Using PAQ, we train CBQA models which outperform comparable baselines by 5%, but trail RePAQ by over 15%, indicating the effectiveness of explicit retrieval. RePAQ can be configured for size (under 500MB) or speed (over 1K questions per second) whilst retaining high accuracy. Lastly, we demonstrate RePAQs strength at selective QA, abstaining from answering when it is likely to be incorrect. This enables RePAQ to ``back-off to a more expensive state-of-the-art model, leading to a combined system which is both more accurate and 2x faster than the state-of-the-art model alone.
Direct scattering transform of nonlinear wave fields with solitons may lead to anomalous numerical errors of soliton phase and position parameters. With the focusing one-dimensional nonlinear Schrodinger equation serving as a model, we investigate this fundamental issue theoretically. Using the dressing method we find the landscape of soliton scattering coefficients in the plane of the complex spectral parameter for multi-soliton wave fields truncated within a finite domain, allowing us to capture the nature of particular numerical errors. They depend on the size of the computational domain $L$ leading to a counterintuitive exponential divergence when increasing $L$ in the presence of a small uncertainty in soliton eigenvalues. In contrast to classical textbooks, we reveal how one of the scattering coefficients loses its analytical properties due to the lack of the wave field compact support in case of $L to infty$. Finally, we demonstrate that despite this inherit direct scattering transform feature, the wave fields of arbitrary complexity can be reliably analysed.
Adversarial evaluation stress tests a models understanding of natural language. While past approaches expose superficial patterns, the resulting adversarial examples are limited in complexity and diversity. We propose human-in-the-loop adversarial generation, where human authors are guided to break models. We aid the authors with interpretations of model predictions through an interactive user interface. We apply this generation framework to a question answering task called Quizbowl, where trivia enthusiasts craft adversarial questions. The resulting questions are validated via live human--computer matches: although the questions appear ordinary to humans, they systematically stump neural and information retrieval models. The adversarial questions cover diverse phenomena from multi-hop reasoning to entity type distractors, exposing open challenges in robust question answering.
Intelligent agents can learn to represent the action spaces of other agents simply by observing them act. Such representations help agents quickly learn to predict the effects of their own actions on the environment and to plan complex action sequences. In this work, we address the problem of learning an agents action space purely from visual observation. We use stochastic video prediction to learn a latent variable that captures the scenes dynamics while being minimally sensitive to the scenes static content. We introduce a loss term that encourages the network to capture the composability of visual sequences and show that it leads to representations that disentangle the structure of actions. We call the full model with composable action representations Composable Learned Action Space Predictor (CLASP). We show the applicability of our method to synthetic settings and its potential to capture action spaces in complex, realistic visual settings. When used in a semi-supervised setting, our learned representations perform comparably to existing fully supervised methods on tasks such as action-conditioned video prediction and planning in the learned action space, while requiring orders of magnitude fewer action labels. Project website: https://daniilidis-group.github.io/learned_action_spaces