ﻻ يوجد ملخص باللغة العربية
The research on human emotion under multimedia stimulation based on physiological signals is an emerging field, and important progress has been achieved for emotion recognition based on multi-modal signals. However, it is challenging to make full use of the complementarity among spatial-spectral-temporal domain features for emotion recognition, as well as model the heterogeneity and correlation among multi-modal signals. In this paper, we propose a novel two-stream heterogeneous graph recurrent neural network, named HetEmotionNet, fusing multi-modal physiological signals for emotion recognition. Specifically, HetEmotionNet consists of the spatial-temporal stream and the spatial-spectral stream, which can fuse spatial-spectral-temporal domain features in a unified framework. Each stream is composed of the graph transformer network for modeling the heterogeneity, the graph convolutional network for modeling the correlation, and the gated recurrent unit for capturing the temporal domain or spectral domain dependency. Extensive experiments on two real-world datasets demonstrate that our proposed model achieves better performance than state-of-the-art baselines.
Graph neural networks for heterogeneous graph embedding is to project nodes into a low-dimensional space by exploring the heterogeneity and semantics of the heterogeneous graph. However, on the one hand, most of existing heterogeneous graph embedding
Graph neural networks are emerging as continuation of deep learning success w.r.t. graph data. Tens of different graph neural network variants have been proposed, most following a neighborhood aggregation scheme, where the node features are updated v
Emotion represents an essential aspect of human speech that is manifested in speech prosody. Speech, visual, and textual cues are complementary in human communication. In this paper, we study a hybrid fusion method, referred to as multi-modal attenti
The data scarcity problem in Electroencephalography (EEG) based affective computing results into difficulty in building an effective model with high accuracy and stability using machine learning algorithms especially deep learning models. Data augmen
Automated emotion detection in speech is a challenging task due to the complex interdependence between words and the manner in which they are spoken. It is made more difficult by the available datasets; their small size and incompatible labeling idio