Dialogue Act (DA) classification is the task of classifying utterances with respect to the function they serve in a dialogue. Existing approaches to DA classification model utterances without incorporating the turn changes among speakers throughout the dialogue, therefore treating dialogue no differently from non-interactive written text. In this paper, we propose to integrate the turn changes among speakers in conversations when modeling DAs. Specifically, we learn conversation-invariant speaker turn embeddings to represent the speaker turns in a conversation; the learned speaker turn embeddings are then merged with the utterance embeddings for the downstream task of DA classification. With this simple yet effective mechanism, our model is able to capture the semantics of the dialogue content while accounting for different speaker turns in a conversation. Validation on three public benchmark datasets demonstrates the superior performance of our model.
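The merging mechanism described in this abstract can be pictured with a minimal sketch, assuming a PyTorch setup. The class name SpeakerTurnDA, the GRU utterance encoder, the two turn roles, and the concatenation-based merge are all illustrative assumptions, not the paper's released architecture.

```python
# A minimal sketch of merging conversation-invariant speaker-turn
# embeddings with utterance embeddings for DA classification.
# All names and dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

class SpeakerTurnDA(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, turn_dim=32,
                 hidden_dim=128, num_turn_roles=2, num_da_classes=10):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        # One embedding vector per turn role (e.g., speaker A / speaker B),
        # shared across all conversations ("conversation-invariant").
        self.turn_embed = nn.Embedding(num_turn_roles, turn_dim)
        self.utt_encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim + turn_dim, num_da_classes)

    def forward(self, token_ids, turn_ids):
        # token_ids: (batch, seq_len) word indices of one utterance
        # turn_ids:  (batch,) turn-role index of the utterance's speaker
        _, h = self.utt_encoder(self.word_embed(token_ids))
        utt_vec = h[-1]  # final hidden state as the utterance embedding
        # Merge utterance and speaker-turn embeddings by concatenation.
        merged = torch.cat([utt_vec, self.turn_embed(turn_ids)], dim=-1)
        return self.classifier(merged)  # DA class logits
```

Concatenation is only one plausible merge; element-wise addition or gating would slot into the same place if the turn and utterance dimensions were matched.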
Topic drift is a common phenomenon in multi-turn dialogue. Therefore, an ideal dialogue generation model should be able to capture the topic information of each context, detect the relevant context, and produce appropriate responses accordingly. …
In this paper, we propose a novel auxiliary loss function for target-speaker automatic speech recognition (ASR). Our method automatically extracts and transcribes target speakers' utterances from a monaural mixture of multiple speakers' speech, given a …
Multi-party multi-turn dialogue comprehension brings unprecedented challenges in handling the complicated scenarios arising from multiple speakers and the criss-crossed discourse relationships among speaker-aware utterances. Most existing methods deal with dialogue …
A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles. Thus, utterance- and speaker-aware clues are supposed to be well captured in models. However, in existing retrieval-based multi-turn dialogue modeling …
Conversational emotion recognition (CER) has attracted increasing interest in the natural language processing (NLP) community. Unlike vanilla emotion recognition, effective speaker-sensitive utterance representation is one major challenge …