No Arabic abstract
QDMR is a meaning representation for complex questions, which decomposes questions into a sequence of atomic steps. While state-of-the-art QDMR parsers use the common sequence-to-sequence (seq2seq) approach, a QDMR structure fundamentally describes labeled relations between spans in the input question, and thus dependency-based approaches seem appropriate for this task. In this work, we present a QDMR parser that is based on dependency graphs (DGs), where nodes in the graph are words and edges describe logical relations that correspond to the different computation steps. We propose (a) a non-autoregressive graph parser, where all graph edges are computed simultaneously, and (b) a seq2seq parser that uses gold graph as auxiliary supervision. We find that a graph parser leads to a moderate reduction in performance (0.47 to 0.44), but to a 16x speed-up in inference time due to the non-autoregressive nature of the parser, and to improved sample complexity compared to a seq2seq model. Second, a seq2seq model trained with auxiliary graph supervision has better generalization to new domains compared to a seq2seq model, and also performs better on questions with long sequences of computation steps.
This paper describes the system used in submission from SHANGHAITECH team to the IWPT 2021 Shared Task. Our system is a graph-based parser with the technique of Automated Concatenation of Embeddings (ACE). Because recent work found that better word representations can be obtained by concatenating different types of embeddings, we use ACE to automatically find the better concatenation of embeddings for the task of enhanced universal dependencies. According to official results averaged on 17 languages, our system ranks 2nd over 9 teams.
While few-shot classification has been widely explored with similarity based methods, few-shot sequence labeling poses a unique challenge as it also calls for modeling the label dependencies. To consider both the item similarity and label dependency, we propose to leverage the conditional random fields (CRFs) in few-shot sequence labeling. It calculates emission score with similarity based methods and obtains transition score with a specially designed transfer mechanism. When applying CRF in the few-shot scenarios, the discrepancy of label sets among different domains makes it hard to use the label dependency learned in prior domains. To tackle this, we introduce the dependency transfer mechanism that transfers abstract label transition patterns. In addition, the similarity methods rely on the high quality sample representation, which is challenging for sequence labeling, because sense of a word is different when measuring its similarity to words in different sentences. To remedy this, we take advantage of recent contextual embedding technique, and further propose a pair-wise embedder. It provides additional certainty for word sense by embedding query and support sentence pairwisely. Experimental results on slot tagging and named entity recognition show that our model significantly outperforms the strongest few-shot learning baseline by 11.76 (21.2%) and 12.18 (97.7%) F1 scores respectively in the one-shot setting.
This paper explores the task Natural Language Understanding (NLU) by looking at duplicate question detection in the Quora dataset. We conducted extensive exploration of the dataset and used various machine learning models, including linear and tree-based models. Our final finding was that a simple Continuous Bag of Words neural network model had the best performance, outdoing more complicated recurrent and attention based models. We also conducted error analysis and found some subjectivity in the labeling of the dataset.
Despite recent progress, state-of-the-art question answering models remain vulnerable to a variety of adversarial attacks. While dynamic adversarial data collection, in which a human annotator tries to write examples that fool a model-in-the-loop, can improve model robustness, this process is expensive which limits the scale of the collected data. In this work, we are the first to use synthetic adversarial data generation to make question answering models more robust to human adversaries. We develop a data generation pipeline that selects source passages, identifies candidate answers, generates questions, then finally filters or re-labels them to improve quality. Using this approach, we amplify a smaller human-written adversarial dataset to a much larger set of synthetic question-answer pairs. By incorporating our synthetic data, we improve the state-of-the-art on the AdversarialQA dataset by 3.7F1 and improve model generalisation on nine of the twelve MRQA datasets. We further conduct a novel human-in-the-loop evaluation to show that our models are considerably more robust to new human-written adversarial examples: crowdworkers can fool our model only 8.8% of the time on average, compared to 17.6% for a model trained without synthetic data.
Recent efforts to create challenge benchmarks that test the abilities of natural language understanding models have largely depended on human annotations. In this work, we introduce the Break, Perturb, Build (BPB) framework for automatic reasoning-oriented perturbation of question-answer pairs. BPB represents a question by decomposing it into the reasoning steps that are required to answer it, symbolically perturbs the decomposition, and then generates new question-answer pairs. We demonstrate the effectiveness of BPB by creating evaluation sets for three reading comprehension (RC) benchmarks, generating thousands of high-quality examples without human intervention. We evaluate a range of RC models on our evaluation sets, which reveals large performance gaps on generated examples compared to the original data. Moreover, symbolic perturbations enable fine-grained analysis of the strengths and limitations of models. Last, augmenting the training data with examples generated by BPB helps close performance gaps, without any drop on the original data distribution.