ترغب بنشر مسار تعليمي؟ اضغط هنا

Challenges in Generalization in Open Domain Question Answering

138   0   0.0 ( 0 )
 نشر من قبل Linqing Liu
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Recent work on Open Domain Question Answering has shown that there is a large discrepancy in model performance between novel test questions and those that largely overlap with training questions. However, it is as of yet unclear which aspects of novel questions that make them challenging. Drawing upon studies on systematic generalization, we introduce and annotate questions according to three categories that measure different levels and kinds of generalization: training set overlap, compositional generalization (comp-gen), and novel entity generalization (novel-entity). When evaluating six popular parametric and non-parametric models, we find that for the established Natural Questions and TriviaQA datasets, even the strongest model performance for comp-gen/novel-entity is 13.1/5.4% and 9.6/1.5% lower compared to that for the full test set -- indicating the challenge posed by these types of questions. Furthermore, we show that whilst non-parametric models can handle questions containing novel entities, they struggle with those requiring compositional generalization. Through thorough analysis we find that key question difficulty factors are: cascading errors from the retrieval component, frequency of question pattern, and frequency of the entity.

قيم البحث

اقرأ أيضاً

Open-domain Question Answering (ODQA) has achieved significant results in terms of supervised learning manner. However, data annotation cannot also be irresistible for its huge demand in an open domain. Though unsupervised QA or unsupervised Machine Reading Comprehension (MRC) has been tried more or less, unsupervised ODQA has not been touched according to our best knowledge. This paper thus pioneers the work of unsupervised ODQA by formally introducing the task and proposing a series of key data construction methods. Our exploration in this work inspiringly shows unsupervised ODQA can reach up to 86% performance of supervised ones.
To date, most of recent work under the retrieval-reader framework for open-domain QA focuses on either extractive or generative reader exclusively. In this paper, we study a hybrid approach for leveraging the strengths of both models. We apply novel techniques to enhance both extractive and generative readers built upon recent pretrained neural language models, and find that proper training methods can provide large improvement over previous state-of-the-art models. We demonstrate that a simple hybrid approach by combining answers from both readers can efficiently take advantages of extractive and generative answer inference strategies and outperforms single models as well as homogeneous ensembles. Our approach outperforms previous state-of-the-art models by 3.3 and 2.7 points in exact match on NaturalQuestions and TriviaQA respectively.
This paper proposes a new problem of complementary evidence identification for open-domain question answering (QA). The problem aims to efficiently find a small set of passages that covers full evidence from multiple aspects as to answer a complex qu estion. To this end, we proposes a method that learns vector representations of passages and models the sufficiency and diversity within the selected set, in addition to the relevance between the question and passages. Our experiments demonstrate that our method considers the dependence within the supporting evidence and significantly improves the accuracy of complementary evidence selection in QA domain.
Open-domain question answering (QA) aims to find the answer to a question from a large collection of documents.Though many models for single-document machine comprehension have achieved strong performance, there is still much room for improving open- domain QA systems since document retrieval and answer reranking are still unsatisfactory. Golden documents that contain the correct answers may not be correctly scored by the retrieval component, and the correct answers that have been extracted may be wrongly ranked after other candidate answers by the reranking component. One of the reasons is derived from the independent principle in which each candidate document (or answer) is scored independently without considering its relationship to other documents (or answers). In this work, we propose a knowledge-aided open-domain QA (KAQA) method which targets at improving relevant document retrieval and candidate answer reranking by considering the relationship between a question and the documents (termed as question-document graph), and the relationship between candidate documents (termed as document-document graph). The graphs are built using knowledge triples from external knowledge resources. During document retrieval, a candidate document is scored by considering its relationship to the question and other documents. During answer reranking, a candidate answer is reranked using not only its own context but also the clues from other documents. The experimental results show that our proposed method improves document retrieval and answer reranking, and thereby enhances the overall performance of open-domain question answering.
Current open-domain question answering systems often follow a Retriever-Reader architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer. In this paper, we propose a simple and effective passage reranking method, named Reader-guIDEd Reranker (RIDER), which does not involve training and reranks the retrieved passages solely based on the top predictions of the reader before reranking. We show that RIDER, despite its simplicity, achieves 10 to 20 absolute gains in top-1 retrieval accuracy and 1 to 4 Exact Match (EM) gains without refining the retriever or reader. In addition, RIDER, without any training, outperforms state-of-the-art transformer-based supervised rerankers. Remarkably, RIDER achieves 48.3 EM on the Natural Questions dataset and 66.4 EM on the TriviaQA dataset when only 1,024 tokens (7.8 passages on average) are used as the reader input after passage reranking.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا