Turn-level user satisfaction is one of the most important performance metrics for conversational agents. It can be used to monitor the agent's performance and provide insights into defective user experiences. Moreover, a powerful satisfaction model can be used as an objective function that a conversational agent continuously optimizes for. While end-to-end deep learning has shown promising results, obtaining the large number of reliably annotated samples these methods require remains challenging. In a large-scale conversational system, the number of newly developed skills keeps growing, making the traditional data collection, annotation, and modeling process impractical due to the annotation costs and turnaround times involved. In this paper, we suggest a self-supervised contrastive learning approach that leverages the pool of unlabeled data to learn user-agent interactions. We show that models pre-trained with this self-supervised objective transfer to user satisfaction prediction. In addition, we propose a novel few-shot transfer learning approach that ensures better transferability for very small sample sizes. The suggested few-shot method does not require any inner-loop optimization and scales to very large datasets and complex models. In experiments on real-world data from a large-scale commercial system, the suggested approach significantly reduces the required number of annotations while improving generalization to unseen, out-of-domain skills.
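The abstract does not spell out the pre-training objective, so the following is only a minimal illustrative sketch, assuming an InfoNCE-style contrastive loss over two views (e.g., two augmented encodings) of the same user-agent interaction. The function name, tensor shapes, and temperature value are hypothetical and not taken from the paper.

    # Illustrative sketch only: assumes an InfoNCE-style contrastive objective over
    # paired views of user-agent interaction embeddings; not the paper's exact loss.
    import torch
    import torch.nn.functional as F

    def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
        """Pull matched views of the same interaction together and push apart
        views belonging to different interactions within the batch."""
        z1 = F.normalize(z1, dim=-1)
        z2 = F.normalize(z2, dim=-1)
        logits = z1 @ z2.t() / temperature                 # (batch, batch) similarity matrix
        targets = torch.arange(z1.size(0), device=z1.device)
        # Symmetric cross-entropy: each view's positive is its paired view.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    # Usage: z1 and z2 would be encoder outputs for two views of the same
    # unlabeled user-agent turn (e.g., produced by different augmentations).
    if __name__ == "__main__":
        batch, dim = 32, 128
        z1, z2 = torch.randn(batch, dim), torch.randn(batch, dim)
        print(info_nce_loss(z1, z2).item())

Under this assumption, the encoder pre-trained on unlabeled interactions would later be reused for satisfaction prediction with only a small number of labeled examples.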