No Arabic abstract
Modern task-oriented dialog systems need to reliably understand users intents. Intent detection is most challenging when moving to new domains or new languages, since there is little annotated data. To address this challenge, we present a suite of pretrained intent detection models. Our models are able to predict a broad range of intended goals from many actions because they are trained on wikiHow, a comprehensive instructional website. Our models achieve state-of-the-art results on the Snips dataset, the Schema-Guided Dialogue dataset, and all 3 languages of the Facebook multilingual dialog datasets. Our models also demonstrate strong zero- and few-shot performance, reaching over 75% accuracy using only 100 training examples in all datasets.
Unknown intent detection aims to identify the out-of-distribution (OOD) utterance whose intent has never appeared in the training set. In this paper, we propose using energy scores for this task as the energy score is theoretically aligned with the density of the input and can be derived from any classifier. However, high-quality OOD utterances are required during the training stage in order to shape the energy gap between OOD and in-distribution (IND), and these utterances are difficult to collect in practice. To tackle this problem, we propose a data manipulation framework to Generate high-quality OOD utterances with importance weighTs (GOT). Experimental results show that the energy-based detector fine-tuned by GOT can achieve state-of-the-art results on two benchmark datasets.
Zero-shot intent detection (ZSID) aims to deal with the continuously emerging intents without annotated training data. However, existing ZSID systems suffer from two limitations: 1) They are not good at modeling the relationship between seen and unseen intents. 2) They cannot effectively recognize unseen intents under the generalized intent detection (GZSID) setting. A critical problem behind these limitations is that the representations of unseen intents cannot be learned in the training stage. To address this problem, we propose a novel framework that utilizes unseen class labels to learn Class-Transductive Intent Representations (CTIR). Specifically, we allow the model to predict unseen intents during training, with the corresponding label names serving as input utterances. On this basis, we introduce a multi-task learning objective, which encourages the model to learn the distinctions among intents, and a similarity scorer, which estimates the connections among intents more accurately. CTIR is easy to implement and can be integrated with existing methods. Experiments on two real-world datasets show that CTIR brings considerable improvement to the baseline systems.
We propose a suite of reasoning tasks on two types of relations between procedural events: goal-step relations (learn poses is a step in the larger goal of doing yoga) and step-step temporal relations (buy a yoga mat typically precedes learn poses). We introduce a dataset targeting these two relations based on wikiHow, a website of instructional how-to articles. Our human-validated test set serves as a reliable benchmark for commonsense inference, with a gap of about 10% to 20% between the performance of state-of-the-art transformer models and human performance. Our automatically-generated training set allows models to effectively transfer to out-of-domain tasks requiring knowledge of procedural events, with greatly improved performances on SWAG, Snips, and the Story Cloze Test in zero- and few-shot settings.
Conventional Intent Detection (ID) models are usually trained offline, which relies on a fixed dataset and a predefined set of intent classes. However, in real-world applications, online systems usually involve continually emerging new user intents, which pose a great challenge to the offline training paradigm. Recently, lifelong learning has received increasing attention and is considered to be the most promising solution to this challenge. In this paper, we propose Lifelong Intent Detection (LID), which continually trains an ID model on new data to learn newly emerging intents while avoiding catastrophically forgetting old data. Nevertheless, we find that existing lifelong learning methods usually suffer from a serious imbalance between old and new data in the LID task. Therefore, we propose a novel lifelong learning method, Multi-Strategy Rebalancing (MSR), which consists of cosine normalization, hierarchical knowledge distillation, and inter-class margin loss to alleviate the multiple negative effects of the imbalance problem. Experimental results demonstrate the effectiveness of our method, which significantly outperforms previous state-of-the-art lifelong learning methods on the ATIS, SNIPS, HWU64, and CLINC150 benchmarks.
Intent detection and slot filling are two fundamental tasks for building a spoken language understanding (SLU) system. Multiple deep learning-based joint models have demonstrated excellent results on the two tasks. In this paper, we propose a new joint model with a wheel-graph attention network (Wheel-GAT) which is able to model interrelated connections directly for intent detection and slot filling. To construct a graph structure for utterances, we create intent nodes, slot nodes, and directed edges. Intent nodes can provide utterance-level semantic information for slot filling, while slot nodes can also provide local keyword information for intent. Experiments show that our model outperforms multiple baselines on two public datasets. Besides, we also demonstrate that using Bidirectional Encoder Representation from Transformer (BERT) model further boosts the performance in the SLU task.