ﻻ يوجد ملخص باللغة العربية
Feature subspace selection is an important part in speech emotion recognition. Most of the studies are devoted to finding a feature subspace for representing all emotions. However, some studies have indicated that the features associated with different emotions are not exactly the same. Hence, traditional methods may fail to distinguish some of the emotions with just one global feature subspace. In this work, we propose a new divide and conquer idea to solve the problem. First, the feature subspaces are constructed for all the combinations of every two different emotions (emotion-pair). Bi-classifiers are then trained on these feature subspaces respectively. The final emotion recognition result is derived by the voting and competition method. Experimental results demonstrate that the proposed method can get better results than the traditional multi-classification method.
We examine the use of linear and non-linear dimensionality reduction algorithms for extracting low-rank feature representations for speech emotion recognition. Two feature sets are used, one based on low-level descriptors and their aggregations (IS10
Recently, increasing attention has been directed to the study of the speech emotion recognition, in which global acoustic features of an utterance are mostly used to eliminate the content differences. However, the expression of speech emotion is a dy
Speech emotion recognition is a vital contributor to the next generation of human-computer interaction (HCI). However, current existing small-scale databases have limited the development of related research. In this paper, we present LSSED, a challen
We investigate the performance of features that can capture nonlinear recurrence dynamics embedded in the speech signal for the task of Speech Emotion Recognition (SER). Reconstruction of the phase space of each speech frame and the computation of it
This paper introduces scattering transform for speech emotion recognition (SER). Scattering transform generates feature representations which remain stable to deformations and shifting in time and frequency without much loss of information. In speech