Knowledge Injection into Dialogue Generation via Language Models


Added by Yi-Lin Tuan

Publication date 2020

Field: Informatics Engineering

Research language: English

Authors Yi-Lin Tuan - Wei Wei - William Yang Wang

Computation and Language


Abstract

Dialogue generation has been successfully learned from scratch by neural networks, but such models tend to produce the same generic response, e.g., "what are you talking about?", in many conversations. To reduce this homogeneity, external knowledge such as the speaker's profile and domain knowledge is applied as an additional condition to diversify a model's output. The knowledge required to develop an effective conversation, however, is not always available, which contradicts prior work's assumption that a model has always acquired sufficient knowledge before chatting. This problem can be detrimental when such a dialogue model is deployed to chat online with unconstrained people and topics, because the model does not have the needed knowledge. To address this problem, we propose InjK, a two-stage approach to inject knowledge into a dialogue generation model. First, we train a large-scale language model and query it as textual knowledge. Second, we frame a dialogue generation model to sequentially generate textual knowledge and a corresponding response. Empirically, when a dialogue generation model can only access limited knowledge, our method outperforms prior work by producing more coherent and informative responses.
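The two stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the separator tokens `<knowledge>` and `<response>`, the prompt wording, and the `toy_lm` stand-in for a large-scale language model are all assumptions made for illustration only.

```python
from typing import Callable, Tuple

# Hypothetical separator tokens; the abstract does not specify any.
KNOWLEDGE_TAG = "<knowledge>"
RESPONSE_TAG = "<response>"


def query_knowledge(lm: Callable[[str], str], context: str) -> str:
    """Stage 1 (sketch): query a language model for textual knowledge."""
    # The prompt suffix is an assumption, not taken from the paper.
    return lm(f"{context} Relevant fact:").strip()


def build_target(knowledge: str, response: str) -> str:
    """Stage 2 (sketch): one target sequence, knowledge first, then response."""
    return f"{KNOWLEDGE_TAG} {knowledge} {RESPONSE_TAG} {response}"


def split_generation(generated: str) -> Tuple[str, str]:
    """Recover (knowledge, response) from a sequentially generated string."""
    body = generated.split(KNOWLEDGE_TAG, 1)[1]
    knowledge, response = body.split(RESPONSE_TAG, 1)
    return knowledge.strip(), response.strip()


def toy_lm(prompt: str) -> str:
    """Stand-in for a trained language model (purely illustrative)."""
    if "Paris" in prompt:
        return " Paris is the capital of France. "
    return " (no knowledge found) "


context = "Have you ever been to Paris?"
reply = "Yes, I loved visiting the capital of France!"
knowledge = query_knowledge(toy_lm, context)
target = build_target(knowledge, reply)
```

In this sketch, the dialogue model would be trained to emit `target` given `context`, so that at inference time it first writes out the knowledge it needs and then conditions its response on that text.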


Improving Knowledge-aware Dialogue Generation via Knowledge Base Question Answering

Jian Wang, Junhao Liu, Wei Bi (2019)

Neural network models usually struggle to incorporate commonsense knowledge into open-domain dialogue systems. In this paper, we propose a novel knowledge-aware dialogue generation model (called TransDG), which transfers question representation and knowledge matching abilities from the knowledge base question answering (KBQA) task to facilitate utterance understanding and factual knowledge selection for dialogue generation. In addition, we propose a response guiding attention and a multi-step decoding strategy to steer our model to focus on relevant features for response generation. Experiments on two benchmark datasets demonstrate that our model robustly outperforms the compared methods in generating informative and fluent dialogues. Our code is available at https://github.com/siat-nlp/TransDG.

Computation and Language

Integrating Graph Contextualized Knowledge into Pre-trained Language Models

Bin He, Di Zhou, Jinghui Xiao (2019)

Complex node interactions are common in knowledge graphs, and these interactions also contain rich knowledge information. However, traditional methods usually treat a triple as a training unit during the knowledge representation learning (KRL) procedure, neglecting contextualized information of the nodes in knowledge graphs (KGs). We generalize the modeling object to a very general form, which theoretically supports any subgraph extracted from the knowledge graph, and these subgraphs are fed into a novel transformer-based model to learn the knowledge embeddings. To broaden usage scenarios of knowledge, pre-trained language models are utilized to build a model that incorporates the learned knowledge representations. Experimental results demonstrate that our model achieves state-of-the-art performance on several medical NLP tasks, and the improvement over TransE indicates that our KRL method captures the graph contextualized information effectively.

Computation and Language Artificial Intelligence

Few-shot Knowledge Graph-to-Text Generation with Pretrained Language Models

Junyi Li, Tianyi Tang, Wayne Xin Zhao (2021)

This paper studies how to automatically generate natural language text that describes the facts in a knowledge graph (KG). Considering the few-shot setting, we leverage the excellent capacities of pretrained language models (PLMs) in language understanding and generation. We make three major technical contributions, namely representation alignment for bridging the semantic gap between KG encodings and PLMs, relation-biased KG linearization for deriving better input representations, and multi-task learning for learning the correspondence between KG and text. Extensive experiments on three benchmark datasets have demonstrated the effectiveness of our model on the KG-to-text generation task. In particular, our model outperforms all comparison methods in both fully-supervised and few-shot settings. Our code and datasets are available at https://github.com/RUCAIBox/Few-Shot-KG2Text.
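To make the idea of KG linearization concrete, here is a minimal Python sketch of a plain, order-preserving linearization of triples into a token sequence that a pretrained language model could consume. It is not the relation-biased strategy proposed in the paper; the `<H>`, `<R>`, and `<T>` marker tokens and the example triples are assumptions chosen for illustration.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def linearize(triples: Iterable[Triple]) -> str:
    """Flatten KG triples into one string, keeping the input order.

    The <H>/<R>/<T> markers are hypothetical; any consistent scheme works.
    """
    return " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in triples)


example = [
    ("Alan Turing", "born in", "London"),
    ("Alan Turing", "field", "computer science"),
]
linearized = linearize(example)
```

A relation-aware scheme would differ mainly in how it orders the triples before flattening them; the flattening step itself would look much the same.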

Computation and Language

An Enhanced Knowledge Injection Model for Commonsense Generation

Zhihao Fan, Yeyun Gong, Zhongyu Wei (2020)

Commonsense generation aims at generating a plausible description of an everyday scenario based on a set of provided concepts. Digging out the relationships between concepts from scratch is non-trivial; therefore, we retrieve prototypes from external knowledge to assist the understanding of the scenario for better description generation. We integrate two additional modules, namely a position indicator and a scaling module, into the pretrained encoder-decoder model for prototype modeling to enhance the knowledge injection procedure. We conduct experiments on the CommonGen benchmark, and the experimental results show that our method significantly improves the performance on all the metrics.

Computation and Language

Task-Oriented Dialogue System as Natural Language Generation

Weizhi Wang, Zhirui Zhang, Junliang Guo (2021)

In this paper, we propose to formulate the task-oriented dialogue system as a pure natural language generation task, so as to fully leverage large-scale pre-trained models like GPT-2 and simplify complicated delexicalization preprocessing. However, directly applying this method suffers heavily from the dialogue entity inconsistency caused by the removal of delexicalized tokens, as well as the catastrophic forgetting problem of the pre-trained model during fine-tuning, leading to unsatisfactory performance. To alleviate these problems, we design a novel GPT-Adapter-CopyNet network, which incorporates lightweight adapter and CopyNet modules into GPT-2 to achieve better performance on transfer learning and dialogue entity generation. Experimental results on the DSTC8 Track 1 benchmark and the MultiWOZ dataset demonstrate that our proposed approach significantly outperforms baseline models, with remarkable performance on automatic and human evaluations.

Computation and Language Artificial Intelligence