No Arabic abstract
The data scarcity problem in Electroencephalography (EEG) based affective computing results into difficulty in building an effective model with high accuracy and stability using machine learning algorithms especially deep learning models. Data augmentation has recently achieved considerable performance improvement for deep learning models: increased accuracy, stability, and reduced over-fitting. In this paper, we propose a novel data augmentation framework, namely Generative Adversarial Network-based Self-supervised Data Augmentation (GANSER). As the first to combine adversarial training with self-supervised learning for EEG-based emotion recognition, the proposed framework can generate high-quality and high-diversity simulated EEG samples. In particular, we utilize adversarial training to learn an EEG generator and force the generated EEG signals to approximate the distribution of real samples, ensuring the quality of augmented samples. A transformation function is employed to mask parts of EEG signals and force the generator to synthesize potential EEG signals based on the remaining parts, to produce a wide variety of samples. The masking possibility during transformation is introduced as prior knowledge to guide to extract distinguishable features for simulated EEG signals and generalize the classifier to the augmented sample space. Finally, extensive experiments demonstrate our proposed method can help emotion recognition for performance gain and achieve state-of-the-art results.
Machine learning methods, such as deep learning, show promising results in the medical domain. However, the lack of interpretability of these algorithms may hinder their applicability to medical decision support systems. This paper studies an interpretable deep learning technique, called SincNet. SincNet is a convolutional neural network that efficiently learns customized band-pass filters through trainable sinc-functions. In this study, we use SincNet to analyze the neural activity of individuals with Autism Spectrum Disorder (ASD), who experience characteristic differences in neural oscillatory activity. In particular, we propose a novel SincNet-based neural network for detecting emotions in ASD patients using EEG signals. The learned filters can be easily inspected to detect which part of the EEG spectrum is used for predicting emotions. We found that our system automatically learns the high-$alpha$ (9-13 Hz) and $beta$ (13-30 Hz) band suppression often present in individuals with ASD. This result is consistent with recent neuroscience studies on emotion recognition, which found an association between these band suppressions and the behavioral deficits observed in individuals with ASD. The improved interpretability of SincNet is achieved without sacrificing performance in emotion recognition.
EEG signals are usually simple to obtain but expensive to label. Although supervised learning has been widely used in the field of EEG signal analysis, its generalization performance is limited by the amount of annotated data. Self-supervised learning (SSL), as a popular learning paradigm in computer vision (CV) and natural language processing (NLP), can employ unlabeled data to make up for the data shortage of supervised learning. In this paper, we propose a self-supervised contrastive learning method of EEG signals for sleep stage classification. During the training process, we set up a pretext task for the network in order to match the right transformation pairs generated from EEG signals. In this way, the network improves the representation ability by learning the general features of EEG signals. The robustness of the network also gets improved in dealing with diverse data, that is, extracting constant features from changing data. In detail, the networks performance depends on the choice of transformations and the amount of unlabeled data used in the training process of self-supervised learning. Empirical evaluations on the Sleep-edf dataset demonstrate the competitive performance of our method on sleep staging (88.16% accuracy and 81.96% F1 score) and verify the effectiveness of SSL strategy for EEG signal analysis in limited labeled data regimes. All codes are provided publicly online.
The cross-subject application of EEG-based brain-computer interface (BCI) has always been limited by large individual difference and complex characteristics that are difficult to perceive. Therefore, it takes a long time to collect the training data of each user for calibration. Even transfer learning method pre-training with amounts of subject-independent data cannot decode different EEG signal categories without enough subject-specific data. Hence, we proposed a cross-subject EEG classification framework with a generative adversarial networks (GANs) based method named common spatial GAN (CS-GAN), which used adversarial training between a generator and a discriminator to obtain high-quality data for augmentation. A particular module in the discriminator was employed to maintain the spatial features of the EEG signals and increase the difference between different categories, with two losses for further enhancement. Through adaptive training with sufficient augmentation data, our cross-subject classification accuracy yielded a significant improvement of 15.85% than leave-one subject-out (LOO) test and 8.57% than just adapting 100 original samples on the dataset 2a of BCI competition IV. Moreover, We designed a convolutional neural networks (CNNs) based classification method as a benchmark with a similar spatial enhancement idea, which achieved remarkable results to classify motor imagery EEG data. In summary, our framework provides a promising way to deal with the cross-subject problem and promote the practical application of BCI.
The research on human emotion under multimedia stimulation based on physiological signals is an emerging field, and important progress has been achieved for emotion recognition based on multi-modal signals. However, it is challenging to make full use of the complementarity among spatial-spectral-temporal domain features for emotion recognition, as well as model the heterogeneity and correlation among multi-modal signals. In this paper, we propose a novel two-stream heterogeneous graph recurrent neural network, named HetEmotionNet, fusing multi-modal physiological signals for emotion recognition. Specifically, HetEmotionNet consists of the spatial-temporal stream and the spatial-spectral stream, which can fuse spatial-spectral-temporal domain features in a unified framework. Each stream is composed of the graph transformer network for modeling the heterogeneity, the graph convolutional network for modeling the correlation, and the gated recurrent unit for capturing the temporal domain or spectral domain dependency. Extensive experiments on two real-world datasets demonstrate that our proposed model achieves better performance than state-of-the-art baselines.
Emotion recognition based on EEG has become an active research area. As one of the machine learning models, CNN has been utilized to solve diverse problems including issues in this domain. In this work, a study of CNN and its spatiotemporal feature extraction has been conducted in order to explore capabilities of the model in varied window sizes and electrode orders. Our investigation was conducted in subject-independent fashion. Results have shown that temporal information in distinct window sizes significantly affects recognition performance in both 10-fold and leave-one-subject-out cross validation. Spatial information from varying electrode order has modicum effect on classification. SVM classifier depending on spatiotemporal knowledge on the same dataset was previously employed and compared to these empirical results. Even though CNN and SVM have a homologous trend in window size effect, CNN outperformed SVM using leave-one-subject-out cross validation. This could be caused by different extracted features in the elicitation process.