CODAH: An Adversarially Authored Question-Answer Dataset for Common Sense

239 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jared Fernandez

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Michael Chen - Mike DArcy - Alisa Liu

الحساب واللغة التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Commonsense reasoning is a critical AI capability, but it is difficult to construct challenging datasets that test common sense. Recent neural question answering systems, based on large pre-trained models of language, have already achieved near-human-level performance on commonsense knowledge benchmarks. These systems do not possess human-level common sense, but are able to exploit limitations of the datasets to achieve human-level scores. We introduce the CODAH dataset, an adversarially-constructed evaluation dataset for testing common sense. CODAH forms a challenging extension to the recently-proposed SWAG dataset, which tests commonsense knowledge using sentence-completion questions that describe situations observed in video. To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems. Workers are rewarded for submissions that models fail to answer correctly both before and after fine-tuning (in cross-validation). We create 2.8k questions via this procedure and evaluate the performance of multiple state-of-the-art question answering systems on our dataset. We observe a significant gap between human performance, which is 95.3%, and the performance of the best baseline accuracy of 67.5% by the BERT-Large model.

قيم البحث

134 - Svitlana Vakulenko , Shayne Longpre , Zhucheng Tu 2020

The dependency between an adequate question formulation and correct answer selection is a very intriguing but still underexplored area. In this paper, we show that question rewriting (QR) of the conversational context allows to shed more light on thi s phenomenon and also use it to evaluate robustness of different answer selection approaches. We introduce a simple framework that enables an automated analysis of the conversational question answering (QA) performance using question rewrites, and present the results of this analysis on the TREC CAsT and QuAC (CANARD) datasets. Our experiments uncover sensitivity to question formulation of the popular state-of-the-art models for reading comprehension and passage ranking. Our results demonstrate that the reading comprehension model is insensitive to question formulation, while the passage ranking changes dramatically with a little variation in the input question. The benefit of QR is that it allows us to pinpoint and group such cases automatically. We show how to use this methodology to verify whether QA models are really learning the task or just finding shortcuts in the dataset, and better understand the frequent types of error they make.

الحساب واللغة

Generating Answer Candidates for Quizzes and Answer-Aware Question Generators

140 - Kristiyan Vachev , Momchil Hardalov , Georgi Karadzhov 2021

In education, open-ended quiz questions have become an important tool for assessing the knowledge of students. Yet, manually preparing such questions is a tedious task, and thus automatic question generation has been proposed as a possible alternativ e. So far, the vast majority of research has focused on generating the question text, relying on question answering datasets with readily picked answers, and the problem of how to come up with answer candidates in the first place has been largely ignored. Here, we aim to bridge this gap. In particular, we propose a model that can generate a specified number of answer candidates for a given passage of text, which can then be used by instructors to write questions manually or can be passed as an input to automatic answer-aware question generators. Our experiments show that our proposed answer candidate generation model outperforms several baselines.

الحساب واللغة الذكاء الاصطناعي أجهزة الكمبيوتر والمجتمع

It is AIs Turn to Ask Human a Question: Question and Answer Pair Generation for Children Storybooks in FairytaleQA Dataset

120 - Bingsheng Yao , Dakuo Wang , Tongshuang Wu 2021

Existing question answering (QA) datasets are created mainly for the application of having AI to be able to answer questions asked by humans. But in educational applications, teachers and parents sometimes may not know what questions they should ask a child that can maximize their language learning results. With a newly released book QA dataset (FairytaleQA), which educational experts labeled on 46 fairytale storybooks for early childhood readers, we developed an automated QA generation model architecture for this novel application. Our model (1) extracts candidate answers from a given storybook passage through carefully designed heuristics based on a pedagogical framework; (2) generates appropriate questions corresponding to each extracted answer using a language model; and, (3) uses another QA model to rank top QA-pairs. Automatic and human evaluations show that our model outperforms baselines. We also demonstrate that our method can help with the scarcity issue of the childrens book QA dataset via data augmentation on 200 unlabeled storybooks.

الحساب واللغة الذكاء الاصطناعي

Blow the Dog Whistle: A Chinese Dataset for Cant Understanding with Common Sense and World Knowledge

81 - Canwen Xu , Wangchunshu Zhou , Tao Ge 2021

Cant is important for understanding advertising, comedies and dog-whistle politics. However, computational research on cant is hindered by a lack of available datasets. In this paper, we propose a large and diverse Chinese dataset for creating and un derstanding cant from a computational linguistics perspective. We formulate a task for cant understanding and provide both quantitative and qualitative analysis for tested word embedding similarity and pretrained language models. Experiments suggest that such a task requires deep language understanding, common sense, and world knowledge and thus can be a good testbed for pretrained language models and help models perform better on other tasks. The code is available at https://github.com/JetRunner/dogwhistle. The data and leaderboard are available at https://competitions.codalab.org/competitions/30451.

الحساب واللغة الذكاء الاصطناعي

Answer Sequence Learning with Neural Networks for Answer Selection in Community Question Answering

645 - Xiaoqiang Zhou , Baotian Hu , Qingcai Chen 2015

In this paper, the answer selection problem in community question answering (CQA) is regarded as an answer sequence labeling task, and a novel approach is proposed based on the recurrent architecture for this problem. Our approach applies convolution neural networks (CNNs) to learning the joint representation of question-answer pair firstly, and then uses the joint representation as input of the long short-term memory (LSTM) to learn the answer sequence of a question for labeling the matching quality of each answer. Experiments conducted on the SemEval 2015 CQA dataset shows the effectiveness of our approach.

الحساب واللغة استرجاع المعلومات التعلم الآلي