Large-Scale QA-SRL Parsing

321 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Nicholas FitzGerald

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Nicholas FitzGerald - Julian Michael - Luheng He

الحساب واللغة الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We present a new large-scale corpus of Question-Answer driven Semantic Role Labeling (QA-SRL) annotations, and the first high-quality QA-SRL parser. Our corpus, QA-SRL Bank 2.0, consists of over 250,000 question-answer pairs for over 64,000 sentences across 3 domains and was gathered with a new crowd-sourcing scheme that we show has high precision and good recall at modest cost. We also present neural models for two QA-SRL subtasks: detecting argument spans for a predicate and generating questions to label the semantic relationship. The best models achieve question accuracy of 82.6% and span-level accuracy of 77.6% (under human evaluation) on the full pipelined QA-SRL prediction task. They can also, as we show, be used to gather additional annotations at low cost.

قيم البحث

اقرأ أيضاً

WebQA: Multihop and Multimodal QA

98 - Yingshan Chang , Mridu Narang , Hisami Suzuki 2021

Web search is fundamentally multimodal and multihop. Often, even before asking a question we choose to go directly to image search to find our answers. Further, rarely do we find an answer from a single source but aggregate information and reason thr ough implications. Despite the frequency of this everyday occurrence, at present, there is no unified question answering benchmark that requires a single model to answer long-form natural language questions from text and open-ended visual sources -- akin to a humans experience. We propose to bridge this gap between the natural language and computer vision communities with WebQA. We show that A. our multihop text queries are difficult for a large-scale transformer model, and B. existing multi-modal transformers and visual representations do not perform well on open-domain visual queries. Our challenge for the community is to create a unified multimodal reasoning model that seamlessly transitions and reasons regardless of the source modality.

الحساب واللغة الذكاء الاصطناعي الرؤية الحاسوبية وتمييز الأنماط

Can NLI Models Verify QA Systems Predictions?

104 - Jifan Chen , Eunsol Choi , Greg Durrett 2021

To build robust question answering systems, we need the ability to verify whether answers to questions are truly correct, not just good enough in the context of imperfect QA datasets. We explore the use of natural language inference (NLI) as a way to achieve this goal, as NLI inherently requires the premise (document context) to contain all necessary information to support the hypothesis (proposed answer to the question). We leverage large pre-trained models and recent prior datasets to construct powerful question converter and decontextualization modules, which can reformulate QA instances as premise-hypothesis pairs with very high reliability. Then, by combining standard NLI datasets with NLI examples automatically derived from QA training data, we can train NLI models to judge the correctness of QA models proposed answers. We show that our NLI approach can generally improve the confidence estimation of a QA model across different domains, evaluated in a selective QA setting. Careful manual analysis over the predictions of our NLI model shows that it can further identify cases where the QA model produces the right answer for the wrong reason, or where the answer cannot be verified as addressing all aspects of the question.

الحساب واللغة الذكاء الاصطناعي

CraftAssist Instruction Parsing: Semantic Parsing for a Minecraft Assistant

88 - Yacine Jernite , Kavya Srinet , Jonathan Gray 2019

We propose a large scale semantic parsing dataset focused on instruction-driven communication with an agent in Minecraft. We describe the data collection process which yields additional 35K human generated instructions with their semantic annotations . We report the performance of three baseline models and find that while a dataset of this size helps us train a usable instruction parser, it still poses interesting generalization challenges which we hope will help develop better and more robust models.

الحساب واللغة الذكاء الاصطناعي

Deep Human Answer Understanding for Natural Reverse QA

97 - Rujing Yao , Linlin Hou , Lei Yang 2019

This study focuses on a reverse question answering (QA) procedure, in which machines proactively raise questions and humans supply the answers. This procedure exists in many real human-machine interaction applications. However, a crucial problem in h uman-machine interaction is answer understanding. The existing solutions have relied on mandatory option term selection to avoid automatic answer understanding. However, these solutions have led to unnatural human-computer interaction and negatively affected user experience. To this end, the current study proposes a novel deep answer understanding network, called AntNet, for reverse QA. The network consists of three new modules, namely, skeleton attention for questions, relevance-aware representation of answers, and multi-hop based fusion. As answer understanding for reverse QA has not been explored, a new data corpus is compiled in this study. Experimental results indicate that our proposed network is significantly better than existing methods and those modified from classical natural language processing deep models. The effectiveness of the three new modules is also verified.

الحساب واللغة الذكاء الاصطناعي التعلم الآلي

UnifiedQA: Crossing Format Boundaries With a Single QA System

216 - Daniel Khashabi , Sewon Min , Tushar Khot 2020

Question answering (QA) tasks have been posed using a variety of formats, such as extractive span selection, multiple choice, etc. This has led to format-specialized models, and even to an implicit division in the QA community. We argue that such bou ndaries are artificial and perhaps unnecessary, given the reasoning abilities we seek to teach are not governed by the format. As evidence, we use the latest advances in language modeling to build a single pre-trained QA model, UnifiedQA, that performs surprisingly well across 17 QA datasets spanning 4 diverse formats. UnifiedQA performs on par with 9 different models that were trained on individual datasets themselves. Even when faced with 12 unseen datasets of observed formats, UnifiedQA performs surprisingly well, showing strong generalization from its out-of-format training data. Finally, simply fine-tuning this pre-trained QA model into specialized models results in a new state of the art on 6 datasets, establishing UnifiedQA as a strong starting point for building QA systems.

الحساب واللغة الذكاء الاصطناعي