Directed Acyclic Graph Network for Conversational Emotion Recognition

Added by Weizhou Shen

Publication date 2021

Field Informatics Engineering

Language English

Authors Weizhou Shen - Siyue Wu - Yunyi Yang

Computation and Language

The modeling of conversational context plays a vital role in emotion recognition from conversation (ERC). In this paper, we put forward a novel idea of encoding the utterances with a directed acyclic graph (DAG) to better model the intrinsic structure within a conversation, and design a directed acyclic neural network, namely DAG-ERC, to implement this idea. In an attempt to combine the strengths of conventional graph-based neural models and recurrence-based neural models, DAG-ERC provides a more intuitive way to model the information flow between long-distance conversation background and nearby context. Extensive experiments are conducted on four ERC benchmarks with state-of-the-art models employed as baselines for comparison. The empirical results demonstrate the superiority of this new model and confirm the motivation of the directed acyclic graph architecture for ERC.
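The abstract does not spell out how the graph over utterances is built, so the following is only a minimal sketch under a stated assumption: each utterance receives edges from the utterances before it, scanning backwards until `window` earlier utterances by the same speaker have been included. The function names, the `window` parameter, and this edge rule are illustrative and are not taken from the abstract. Because every edge points from an earlier utterance to a later one, the graph is acyclic by construction, and index order is a valid topological order for recurrence-style processing.

```python
from typing import List, Set, Tuple


def build_conversation_dag(speakers: List[str], window: int = 1) -> List[Tuple[int, int]]:
    """Return edges (src, dst) with src < dst over utterance indices.

    Assumed rule (illustrative only): utterance i is linked from each earlier
    utterance j, scanning backwards, until `window` earlier utterances by the
    same speaker have been linked.
    """
    edges: List[Tuple[int, int]] = []
    for i, speaker in enumerate(speakers):
        same_speaker_seen = 0
        for j in range(i - 1, -1, -1):
            edges.append((j, i))
            if speakers[j] == speaker:
                same_speaker_seen += 1
                if same_speaker_seen == window:
                    break
    return edges


def reachable_context(num_nodes: int, edges: List[Tuple[int, int]]) -> List[Set[int]]:
    """For each utterance, collect every earlier utterance that can reach it."""
    predecessors: List[List[int]] = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        predecessors[dst].append(src)

    ancestors: List[Set[int]] = []
    for node in range(num_nodes):  # index order is a topological order here
        reach: Set[int] = set()
        for pred in predecessors[node]:
            reach.add(pred)
            reach |= ancestors[pred]  # inherit the predecessor's own context
        ancestors.append(reach)
    return ancestors


speakers = ["A", "B", "A", "B", "B"]
edges = build_conversation_dag(speakers, window=1)
context = reachable_context(len(speakers), edges)
```

In this toy conversation the last utterance has a single direct predecessor, yet its reachable context covers all four earlier utterances, which illustrates how distant background can flow through nearby context along the directed edges.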

Beyond Isolated Utterances: Conversational Emotion Recognition

155 - Raghavendra Pappagari, Piotr Żelasko, Jesus Villalba 2021

Speech emotion recognition is the task of recognizing a speaker's emotional state given a recording of their utterance. While most current approaches focus on inferring emotion from isolated utterances, we argue that this is not sufficient to achieve conversational emotion recognition (CER), which deals with recognizing emotions in conversations. In this work, we propose several approaches for CER by treating it as a sequence labeling task. We investigated the transformer architecture for CER and compared it with ResNet-34 and BiLSTM architectures in both contextual and context-less scenarios using the IEMOCAP corpus. Based on the inner workings of the self-attention mechanism, we proposed DiverseCatAugment (DCA), an augmentation scheme that improved the transformer model's performance by an absolute 3.3% micro-F1 on conversations and 3.6% on isolated utterances. We further enhanced the performance by introducing an interlocutor-aware transformer model in which we learn a dictionary of interlocutor index embeddings to exploit diarized conversations.

Computation and Language; Sound; Audio and Speech Processing

Multi-Task Learning with Auxiliary Speaker Identification for Conversational Emotion Recognition

92 - Jingye Li, Meishan Zhang, Donghong Ji 2020

Conversational emotion recognition (CER) has attracted increasing interest in the natural language processing (NLP) community. Unlike vanilla emotion recognition, CER faces the major challenge of learning effective speaker-sensitive utterance representations. In this paper, we exploit speaker identification (SI) as an auxiliary task to enhance the utterance representation in conversations. With this method, we can learn better speaker-aware contextual representations from the additional SI corpus. Experiments on two benchmark datasets demonstrate that the proposed architecture is highly effective for CER, obtaining new state-of-the-art results on both.

Computation and Language; Sound; Audio and Speech Processing

MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation

143 - Jingwen Hu, Yuchen Liu, Jinming Zhao 2021

Emotion recognition in conversation (ERC) is a crucial component of affective dialogue systems, helping the system understand users' emotions and generate empathetic responses. However, most existing work either models speaker and contextual information primarily in the textual modality or simply leverages multimodal information through feature concatenation. To explore a more effective way of utilizing both multimodal and long-distance contextual information, we propose MMGCN, a new model based on a multimodal fused graph convolutional network. MMGCN not only makes effective use of multimodal dependencies but also leverages speaker information to model inter-speaker and intra-speaker dependency. We evaluate the proposed model on two public benchmark datasets, IEMOCAP and MELD, and the results demonstrate the effectiveness of MMGCN, which outperforms other SOTA methods by a significant margin under the multimodal conversation setting.

Computation and Language; Sound; Audio and Speech Processing

A Self-Attentive Emotion Recognition Network

191 - Harris Partaourides, Kostantinos Papadamou, Nicolas Kourtellis 2019

Modern deep learning approaches have achieved groundbreaking performance in modeling and classifying sequential data. Specifically, attention networks constitute the state-of-the-art paradigm for capturing long temporal dynamics. This paper examines the efficacy of this paradigm in the challenging task of emotion recognition in dyadic conversations. In contrast to existing approaches, our work introduces a novel attention mechanism capable of inferring the magnitude of the effect of each past utterance on the current speaker's emotional state. The proposed attention mechanism performs this inference without the need for a decoder network; this is achieved by means of innovative self-attention arguments. Our self-attention networks capture the correlation patterns among consecutive encoder network states, allowing them to robustly and effectively model temporal dynamics over arbitrarily long temporal horizons. We thus enable the capture of strong affective patterns over the course of long discussions. We demonstrate the effectiveness of our approach on the challenging IEMOCAP benchmark. As we show, our methodology outperforms state-of-the-art alternatives and commonly used approaches, giving rise to promising new research directions in the context of Online Social Network (OSN) analysis tasks.

Computation and Language; Machine Learning

Physical qubit calibration on a directed acyclic graph

51 - Julian Kelly, Peter O'Malley, Matthew Neeley 2018

High-fidelity control of qubits requires precisely tuned control parameters. Typically, these parameters are found through a series of bootstrapped calibration experiments which successively acquire more accurate information about a physical qubit. However, optimal parameters are typically different between devices and can also drift in time, which begets the need for an efficient calibration strategy. Here, we introduce a framework to understand the relationship between calibrations as a directed graph. With this approach, calibration is reduced to a graph traversal problem that is automatable and extensible.

Quantum Physics
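The idea in the qubit calibration abstract above, treating calibrations as nodes of a directed graph and reducing calibration to graph traversal, can be illustrated with a dependency-ordered traversal. This is only a sketch: the calibration names, the `DEPENDENCIES` mapping, and the `run_all` helper are hypothetical, and the abstract does not describe the framework's actual traversal logic. The example uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical calibration names; each maps to the calibrations it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "resonator_spectroscopy": set(),
    "qubit_spectroscopy": {"resonator_spectroscopy"},
    "rabi_amplitude": {"qubit_spectroscopy"},
    "readout_optimization": {"rabi_amplitude"},
    "ramsey_frequency": {"rabi_amplitude"},
}


def calibration_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every calibration follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def run_all(dependencies: Dict[str, Set[str]], run: Callable[[str], None]) -> None:
    """Run each calibration once, dependencies first."""
    for name in calibration_order(dependencies):
        run(name)


order = calibration_order(DEPENDENCIES)
executed: List[str] = []
run_all(DEPENDENCIES, executed.append)
```

Adding a new calibration under this scheme only requires declaring what it depends on; the traversal order follows from the graph, which is one way to read the abstract's claim that the approach is automatable and extensible.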
