Do you want to publish a course? Click here

Asking questions about a situation is an inherent step towards understanding it. To this end, we introduce the task of role question generation, which, given a predicate mention and a passage, requires producing a set of questions asking about all po ssible semantic roles of the predicate. We develop a two-stage model for this task, which first produces a context-independent question prototype for each role and then revises it to be contextually appropriate for the passage. Unlike most existing approaches to question generation, our approach does not require conditioning on existing answers in the text. Instead, we condition on the type of information to inquire about, regardless of whether the answer appears explicitly in the text, could be inferred from it, or should be sought elsewhere. Our evaluation demonstrates that we generate diverse and well-formed questions for a large, broad-coverage ontology of predicates and roles.
Multi-hop QA requires the machine to answer complex questions through finding multiple clues and reasoning, and provide explanatory evidence to demonstrate the machine's reasoning process. We propose Relation Extractor-Reader and Comparator (RERC), a three-stage framework based on complex question decomposition. The Relation Extractor decomposes the complex question, and then the Reader answers the sub-questions in turn, and finally the Comparator performs numerical comparison and summarizes all to get the final answer, where the entire process itself constitutes a complete reasoning evidence path. In the 2WikiMultiHopQA dataset, our RERC model has achieved the state-of-the-art performance, with a winning joint F1 score of 53.58 on the leaderboard. All indicators of our RERC are close to human performance, with only 1.95 behind the human level in F1 score of support fact. At the same time, the evidence path provided by our RERC framework has excellent readability and faithfulness.
We develop a unified system to answer directly from text open-domain questions that may require a varying number of retrieval steps. We employ a single multi-task transformer model to perform all the necessary subtasks---retrieving supporting facts, reranking them, and predicting the answer from all retrieved documents---in an iterative fashion. We avoid crucial assumptions of previous work that do not transfer well to real-world settings, including exploiting knowledge of the fixed number of retrieval steps required to answer each question or using structured metadata like knowledge bases or web links that have limited availability. Instead, we design a system that can answer open-domain questions on any text collection without prior knowledge of reasoning complexity. To emulate this setting, we construct a new benchmark, called BeerQA, by combining existing one- and two-step datasets with a new collection of 530 questions that require three Wikipedia pages to answer, unifying Wikipedia corpora versions in the process. We show that our model demonstrates competitive performance on both existing benchmarks and this new benchmark. We make the new benchmark available at https://beerqa.github.io/.
The key challenge of question answering over knowledge bases (KBQA) is the inconsistency between the natural language questions and the reasoning paths in the knowledge base (KB). Recent graph-based KBQA methods are good at grasping the topological s tructure of the graph but often ignore the textual information carried by the nodes and edges. Meanwhile, pre-trained language models learn massive open-world knowledge from the large corpus, but it is in the natural language form and not structured. To bridge the gap between the natural language and the structured KB, we propose three relation learning tasks for BERT-based KBQA, including relation extraction, relation matching, and relation reasoning. By relation-augmented training, the model learns to align the natural language expressions to the relations in the KB as well as reason over the missing connections in the KB. Experiments on WebQSP show that our method consistently outperforms other baselines, especially when the KB is incomplete.
Scenario-based question answering (SQA) requires retrieving and reading paragraphs from a large corpus to answer a question which is contextualized by a long scenario description. Since a scenario contains both keyphrases for retrieval and much noise , retrieval for SQA is extremely difficult. Moreover, it can hardly be supervised due to the lack of relevance labels of paragraphs for SQA. To meet the challenge, in this paper we propose a joint retriever-reader model called JEEVES where the retriever is implicitly supervised only using QA labels via a novel word weighting mechanism. JEEVES significantly outperforms a variety of strong baselines on multiple-choice questions in three SQA datasets.
In this paper, we show that automatically-generated questions and answers can be used to evaluate the quality of Machine Translation (MT) systems. Building on recent work on the evaluation of abstractive text summarization, we propose a new metric for system-level MT evaluation, compare it with other state-of-the-art solutions, and show its robustness by conducting experiments for various MT directions.
Understanding when a text snippet does not provide a sought after information is an essential part of natural language utnderstanding. Recent work (SQuAD 2.0; Rajpurkar et al., 2018) has attempted to make some progress in this direction by enriching the SQuAD dataset for the Extractive QA task with unanswerable questions. However, as we show, the performance of a top system trained on SQuAD 2.0 drops considerably in out-of-domain scenarios, limiting its use in practical situations. In order to study this we build an out-of-domain corpus, focusing on simple event-based questions and distinguish between two types of IDK questions: competitive questions, where the context includes an entity of the same type as the expected answer, and simpler, non-competitive questions where there is no entity of the same type in the context. We find that SQuAD 2.0-based models fail even in the case of the simpler questions. We then analyze the similarities and differences between the IDK phenomenon in Extractive QA and the Recognizing Textual Entailments task (RTE; Dagan et al., 2013) and investigate the extent to which the latter can be used to improve the performance.
Open-domain question answering has exploded in popularity recently due to the success of dense retrieval models, which have surpassed sparse models using only a few supervised training examples. However, in this paper, we demonstrate current dense mo dels are not yet the holy grail of retrieval. We first construct EntityQuestions, a set of simple, entity-rich questions based on facts from Wikidata (e.g., Where was Arve Furset born?''), and observe that dense retrievers drastically under-perform sparse methods. We investigate this issue and uncover that dense retrievers can only generalize to common entities unless the question pattern is explicitly observed during training. We discuss two simple solutions towards addressing this critical problem. First, we demonstrate that data augmentation is unable to fix the generalization problem. Second, we argue a more robust passage encoder helps facilitate better question adaptation using specialized question encoders. We hope our work can shed light on the challenges in creating a robust, universal dense retriever that works well across different input distributions.
Enabling open-domain dialogue systems to ask clarifying questions when appropriate is an important direction for improving the quality of the system response. Namely, for cases when a user request is not specific enough for a conversation system to p rovide an answer right away, it is desirable to ask a clarifying question to increase the chances of retrieving a satisfying answer. To address the problem of asking clarifying questions in open-domain dialogues': (1) we collect and release a new dataset focused on open-domain single- and multi-turn conversations, (2) we benchmark several state-of-the-art neural baselines, and (3) we propose a pipeline consisting of offline and online steps for evaluating the quality of clarifying questions in various dialogues. These contributions are suitable as a foundation for further research.
In this paper, we propose a knowledge infusion mechanism to incorporate domain knowledge into language transformers. Weakly supervised data is regarded as the main source for knowledge acquisition. We pre-train the language models to capture masked k nowledge of focuses and aspects and then fine-tune them to obtain better performance on the downstream tasks. Due to the lack of publicly available datasets for multi-label classification of Chinese medical questions, we crawled questions from medical question/answer forums and manually annotated them using eight predefined classes: persons and organizations, symptom, cause, examination, disease, information, ingredient, and treatment. Finally, a total of 1,814 questions with 2,340 labels. Each question contains an average of 1.29 labels. We used Baidu Medical Encyclopedia as the knowledge resource. Two transformers BERT and RoBERTa were implemented to compare performance on our constructed datasets. Experimental results showed that our proposed model with knowledge infusion mechanism can achieve better performance, no matter which evaluation metric including Macro F1, Micro F1, Weighted F1 or Subset Accuracy were considered.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا