ﻻ يوجد ملخص باللغة العربية
We study calibration in question answering, estimating whether model correctly predicts answer for each question. Unlike prior work which mainly rely on the models confidence score, our calibrator incorporates information about the input example (e.g., question and the evidence context). Together with data augmentation via back translation, our simple approach achieves 5-10% gains in calibration accuracy on reading comprehension benchmarks. Furthermore, we present the first calibration study in the open retrieval setting, comparing the calibration accuracy of retrieval-based span prediction models and answer generation models. Here again, our approach shows consistent gains over calibrators relying on the model confidence. Our simple and efficient calibrator can be easily adapted to many tasks and model architectures, showing robust gains in all settings.
Is it possible to develop an AI Pathologist to pass the board-certified examination of the American Board of Pathology? To achieve this goal, the first step is to create a visual question answering (VQA) dataset where the AI agent is presented with a
A commonly observed problem with the state-of-the art abstractive summarization models is that the generated summaries can be factually inconsistent with the input documents. The fact that automatic summarization may produce plausible-sounding yet in
A question answering (QA) system is a type of conversational AI that generates natural language answers to questions posed by human users. QA systems often form the backbone of interactive dialogue systems, and have been studied extensively for a wid
While recent models have achieved human-level scores on many NLP datasets, we observe that they are considerably sensitive to small changes in input. As an alternative to the standard approach of addressing this issue by constructing training sets of
We focus on multiple-choice question answering (QA) tasks in subject areas such as science, where we require both broad background knowledge and the facts from the given subject-area reference corpus. In this work, we explore simple yet effective met