No Arabic abstract
Automatic question generation aims to generate questions from a text passage where the generated questions can be answered by certain sub-spans of the given passage. Traditional methods mainly use rigid heuristic rules to transform a sentence into related questions. In this work, we propose to apply the neural encoder-decoder model to generate meaningful and diverse questions from natural language sentences. The encoder reads the input text and the answer position, to produce an answer-aware input representation, which is fed to the decoder to generate an answer focused question. We conduct a preliminary study on neural question generation from text with the SQuAD dataset, and the experiment results show that our method can produce fluent and diverse questions.
Inquisitive probing questions come naturally to humans in a variety of settings, but is a challenging task for automatic systems. One natural type of question to ask tries to fill a gap in knowledge during text comprehension, like reading a news article: we might ask about background information, deeper reasons behind things occurring, or more. Despite recent progress with data-driven approaches, generating such questions is beyond the range of models trained on existing datasets. We introduce INQUISITIVE, a dataset of ~19K questions that are elicited while a person is reading through a document. Compared to existing datasets, INQUISITIVE questions target more towards high-level (semantic and discourse) comprehension of text. We show that readers engage in a series of pragmatic strategies to seek information. Finally, we evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions although the task is challenging, and highlight the importance of context to generate INQUISITIVE questions.
People ask questions that are far richer, more informative, and more creative than current AI systems. We propose a neuro-symbolic framework for modeling human question asking, which represents questions as formal programs and generates programs with an encoder-decoder based deep neural network. From extensive experiments using an information-search game, we show that our method can predict which questions humans are likely to ask in unconstrained settings. We also propose a novel grammar-based question generation framework trained with reinforcement learning, which is able to generate creative questions without supervised human data.
Submodularity is desirable for a variety of objectives in content selection where the current neural encoder-decoder framework is inadequate. However, it has so far not been explored in the neural encoder-decoder system for text generation. In this work, we define diminishing attentions with submodular functions and in turn, prove the submodularity of the effective neural coverage. The greedy algorithm approximating the solution to the submodular maximization problem is not suited to attention score optimization in auto-regressive generation. Therefore instead of following how submodular function has been widely used, we propose a simplified yet principled solution. The resulting attention module offers an architecturally simple and empirically effective method to improve the coverage of neural text generation. We run experiments on three directed text generation tasks with different levels of recovering rate, across two modalities, three different neural model architectures and two training strategy variations. The results and analyses demonstrate that our method generalizes well across these settings, produces texts of good quality and outperforms state-of-the-art baselines.
In this paper, we study automatic question generation, the task of creating questions from corresponding text passages where some certain spans of the text can serve as the answers. We propose an Extended Answer-aware Network (EAN) which is trained with Word-based Coverage Mechanism (WCM) and decodes with Uncertainty-aware Beam Search (UBS). The EAN represents the target answer by its surrounding sentence with an encoder, and incorporates the information of the extended answer into paragraph representation with gated paragraph-to-answer attention to tackle the problem of the inadequate representation of the target answer. To reduce undesirable repetition, the WCM penalizes repeatedly attending to the same words at different time-steps in the training stage. The UBS aims to seek a better balance between the model confidence in copying words from an input text paragraph and the confidence in generating words from a vocabulary. We conduct experiments on the SQuAD dataset, and the results show our approach achieves significant performance improvement.
Prior work on automated question generation has almost exclusively focused on generating simple questions whose answers can be extracted from a single document. However, there is an increasing interest in developing systems that are capable of more complex multi-hop question generation, where answering the questions requires reasoning over multiple documents. In this work, we introduce a series of strong transformer models for multi-hop question generation, including a graph-augmented transformer that leverages relations between entities in the text. While prior work has emphasized the importance of graph-based models, we show that we can substantially outperform the state-of-the-art by 5 BLEU points using a standard transformer architecture. We further demonstrate that graph-based augmentations can provide complimentary improvements on top of this foundation. Interestingly, we find that several important factors--such as the inclusion of an auxiliary contrastive objective and data filtering could have larger impacts on performance. We hope that our stronger baselines and analysis provide a constructive foundation for future work in this area.