بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

A Hierarchical Transformer with Speaker Modeling for Emotion Recognition in Conversation

211 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jiangnan Li

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jiangnan Li - Zheng Lin - Peng Fu

الحساب واللغة

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Emotion Recognition in Conversation (ERC) is a more challenging task than conventional text emotion recognition. It can be regarded as a personalized and interactive emotion recognition task, which is supposed to consider not only the semantic information of text but also the influences from speakers. The current method models speakers interactions by building a relation between every two speakers. However, this fine-grained but complicated modeling is computationally expensive, hard to extend, and can only consider local context. To address this problem, we simplify the complicated modeling to a binary version: Intra-Speaker and Inter-Speaker dependencies, without identifying every unique speaker for the targeted speaker. To better achieve the simplified interaction modeling of speakers in Transformer, which shows excellent ability to settle long-distance dependency, we design three types of masks and respectively utilize them in three independent Transformer blocks. The designed masks respectively model the conventional context modeling, Intra-Speaker dependency, and Inter-Speaker dependency. Furthermore, different speaker-aware information extracted by Transformer blocks diversely contributes to the prediction, and therefore we utilize the attention mechanism to automatically weight them. Experiments on two ERC datasets indicate that our model is efficacious to achieve better performance.

قيم البحث

84 - Henghui Zhu , Feng Nan , Zhiguo Wang 2019

Conversation structure is useful for both understanding the nature of conversation dynamics and for providing features for many downstream applications such as summarization of conversations. In this work, we define the problem of conversation struct ure modeling as identifying the parent utterance(s) to which each utterance in the conversation responds to. Previous work usually took a pair of utterances to decide whether one utterance is the parent of the other. We believe the entire ancestral history is a very important information source to make accurate prediction. Therefore, we design a novel masking mechanism to guide the ancestor flow, and leverage the transformer model to aggregate all ancestors to predict parent utterances. Our experiments are performed on the Reddit dataset (Zhang, Culbertson, and Paritosh 2017) and the Ubuntu IRC dataset (Kummerfeld et al. 2019). In addition, we also report experiments on a new larger corpus from the Reddit platform and release this dataset. We show that the proposed model, that takes into account the ancestral history of the conversation, significantly outperforms several strong baselines including the BERT model on all datasets

الحساب واللغة

SEOVER: Sentence-level Emotion Orientation Vector based Conversation Emotion Recognition Model

141 - Zaijing Li , Fengxiao Tang , Tieyu Sun 2021

For the task of conversation emotion recognition, recent works focus on speaker relationship modeling but ignore the role of utterances emotional tendency.In this paper, we propose a new expression paradigm of sentence-level emotion orientation vecto r to model the potential correlation of emotions between sentence vectors. Based on it, we design an emotion recognition model, which extracts the sentence-level emotion orientation vectors from the language model and jointly learns from the dialogue sentiment analysis model and extracted sentence-level emotion orientation vectors to identify the speakers emotional orientation during the conversation. We conduct experiments on two benchmark datasets and compare them with the five baseline models.The experimental results show that our model has better performance on all data sets.

الحساب واللغة الذكاء الاصطناعي

DialogXL: All-in-One XLNet for Multi-Party Conversation Emotion Recognition

108 - Weizhou Shen , Junqing Chen , Xiaojun Quan 2020

This paper presents our pioneering effort for emotion recognition in conversation (ERC) with pre-trained language models. Unlike regular documents, conversational utterances appear alternately from different parties and are usually organized as hiera rchical structures in previous work. Such structures are not conducive to the application of pre-trained language models such as XLNet. To address this issue, we propose an all-in-one XLNet model, namely DialogXL, with enhanced memory to store longer historical context and dialog-aware self-attention to deal with the multi-party structures. Specifically, we first modify the recurrence mechanism of XLNet from segment-level to utterance-level in order to better model the conversational data. Second, we introduce dialog-aware self-attention in replacement of the vanilla self-attention in XLNet to capture useful intra- and inter-speaker dependencies. Extensive experiments are conducted on four ERC benchmarks with mainstream models presented for comparison. The experimental results show that the proposed model outperforms the baselines on all the datasets. Several other experiments such as ablation study and error analysis are also conducted and the results confirm the role of the critical modules of DialogXL.

الحساب واللغة الذكاء الاصطناعي استرجاع المعلومات

Multi-Task Learning with Auxiliary Speaker Identification for Conversational Emotion Recognition

92 - Jingye Li , Meishan Zhang , Donghong Ji 2020

Conversational emotion recognition (CER) has attracted increasing interests in the natural language processing (NLP) community. Different from the vanilla emotion recognition, effective speaker-sensitive utterance representation is one major challeng e for CER. In this paper, we exploit speaker identification (SI) as an auxiliary task to enhance the utterance representation in conversations. By this method, we can learn better speaker-aware contextual representations from the additional SI corpus. Experiments on two benchmark datasets demonstrate that the proposed architecture is highly effective for CER, obtaining new state-of-the-art results on two datasets.

الحساب واللغة أنظمة الصوت في الحاسوب معالجة الصوت والكلام

MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation

143 - Jingwen Hu , Yuchen Liu , Jinming Zhao 2021

Emotion recognition in conversation (ERC) is a crucial component in affective dialogue systems, which helps the system understand users emotions and generate empathetic responses. However, most works focus on modeling speaker and contextual informati on primarily on the textual modality or simply leveraging multimodal information through feature concatenation. In order to explore a more effective way of utilizing both multimodal and long-distance contextual information, we propose a new model based on multimodal fused graph convolutional network, MMGCN, in this work. MMGCN can not only make use of multimodal dependencies effectively, but also leverage speaker information to model inter-speaker and intra-speaker dependency. We evaluate our proposed model on two public benchmark datasets, IEMOCAP and MELD, and the results prove the effectiveness of MMGCN, which outperforms other SOTA methods by a significant margin under the multimodal conversation setting.

الحساب واللغة أنظمة الصوت في الحاسوب معالجة الصوت والكلام

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الجامعة الافتراضية السورية

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

A Hierarchical Transformer with Speaker Modeling for Emotion Recognition in Conversation

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً