No Arabic abstract
Machine learning methods, such as deep learning, show promising results in the medical domain. However, the lack of interpretability of these algorithms may hinder their applicability to medical decision support systems. This paper studies an interpretable deep learning technique, called SincNet. SincNet is a convolutional neural network that efficiently learns customized band-pass filters through trainable sinc-functions. In this study, we use SincNet to analyze the neural activity of individuals with Autism Spectrum Disorder (ASD), who experience characteristic differences in neural oscillatory activity. In particular, we propose a novel SincNet-based neural network for detecting emotions in ASD patients using EEG signals. The learned filters can be easily inspected to detect which part of the EEG spectrum is used for predicting emotions. We found that our system automatically learns the high-$alpha$ (9-13 Hz) and $beta$ (13-30 Hz) band suppression often present in individuals with ASD. This result is consistent with recent neuroscience studies on emotion recognition, which found an association between these band suppressions and the behavioral deficits observed in individuals with ASD. The improved interpretability of SincNet is achieved without sacrificing performance in emotion recognition.
The data scarcity problem in Electroencephalography (EEG) based affective computing results into difficulty in building an effective model with high accuracy and stability using machine learning algorithms especially deep learning models. Data augmentation has recently achieved considerable performance improvement for deep learning models: increased accuracy, stability, and reduced over-fitting. In this paper, we propose a novel data augmentation framework, namely Generative Adversarial Network-based Self-supervised Data Augmentation (GANSER). As the first to combine adversarial training with self-supervised learning for EEG-based emotion recognition, the proposed framework can generate high-quality and high-diversity simulated EEG samples. In particular, we utilize adversarial training to learn an EEG generator and force the generated EEG signals to approximate the distribution of real samples, ensuring the quality of augmented samples. A transformation function is employed to mask parts of EEG signals and force the generator to synthesize potential EEG signals based on the remaining parts, to produce a wide variety of samples. The masking possibility during transformation is introduced as prior knowledge to guide to extract distinguishable features for simulated EEG signals and generalize the classifier to the augmented sample space. Finally, extensive experiments demonstrate our proposed method can help emotion recognition for performance gain and achieve state-of-the-art results.
In the context of electroencephalogram (EEG)-based driver drowsiness recognition, it is still a challenging task to design a calibration-free system, since there exists a significant variability of EEG signals among different subjects and recording sessions. As deep learning has received much research attention in recent years, many efforts have been made to use deep learning methods for EEG signal recognition. However, existing works mostly treat deep learning models as blackbox classifiers, while what have been learned by the models and to which extent they are affected by the noise from EEG data are still underexplored. In this paper, we develop a novel convolutional neural network that can explain its decision by highlighting the local areas of the input sample that contain important information for the classification. The network has a compact structure for ease of interpretation and takes advantage of separable convolutions to process the EEG signals in a spatial-temporal sequence. Results show that the model achieves an average accuracy of 78.35% on 11 subjects for leave-one-out cross-subject drowsiness recognition, which is higher than the conventional baseline methods of 53.4%-72.68% and state-of-art deep learning methods of 63.90%-65.61%. Visualization results show that the model has learned to recognize biologically explainable features from EEG signals, e.g., Alpha spindles, as strong indicators of drowsiness across different subjects. In addition, we also explore reasons behind some wrongly classified samples and how the model is affected by artifacts and noise in the data. Our work illustrates a promising direction on using interpretable deep learning models to discover meaning patterns related to different mental states from complex EEG signals.
At present, people usually use some methods based on convolutional neural networks (CNNs) for Electroencephalograph (EEG) decoding. However, CNNs have limitations in perceiving global dependencies, which is not adequate for common EEG paradigms with a strong overall relationship. Regarding this issue, we propose a novel EEG decoding method that mainly relies on the attention mechanism. The EEG data is firstly preprocessed and spatially filtered. And then, we apply attention transforming on the feature-channel dimension so that the model can enhance more relevant spatial features. The most crucial step is to slice the data in the time dimension for attention transforming, and finally obtain a highly distinguishable representation. At this time, global averaging pooling and a simple fully-connected layer are used to classify different categories of EEG data. Experiments on two public datasets indicate that the strategy of attention transforming effectively utilizes spatial and temporal features. And we have reached the level of the state-of-the-art in multi-classification of EEG, with fewer parameters. As far as we know, it is the first time that a detailed and complete method based on the transformer idea has been proposed in this field. It has good potential to promote the practicality of brain-computer interface (BCI). The source code can be found at: textit{https://github.com/anranknight/EEG-Transformer}.
Humans are emotional creatures. Multiple modalities are often involved when we express emotions, whether we do so explicitly (e.g., facial expression, speech) or implicitly (e.g., text, image). Enabling machines to have emotional intelligence, i.e., recognizing, interpreting, processing, and simulating emotions, is becoming increasingly important. In this tutorial, we discuss several key aspects of multi-modal emotion recognition (MER). We begin with a brief introduction on widely used emotion representation models and affective modalities. We then summarize existing emotion annotation strategies and corresponding computational tasks, followed by the description of main challenges in MER. Furthermore, we present some representative approaches on representation learning of each affective modality, feature fusion of different affective modalities, classifier optimization for MER, and domain adaptation for MER. Finally, we outline several real-world applications and discuss some future directions.
Emotion recognition based on EEG has become an active research area. As one of the machine learning models, CNN has been utilized to solve diverse problems including issues in this domain. In this work, a study of CNN and its spatiotemporal feature extraction has been conducted in order to explore capabilities of the model in varied window sizes and electrode orders. Our investigation was conducted in subject-independent fashion. Results have shown that temporal information in distinct window sizes significantly affects recognition performance in both 10-fold and leave-one-subject-out cross validation. Spatial information from varying electrode order has modicum effect on classification. SVM classifier depending on spatiotemporal knowledge on the same dataset was previously employed and compared to these empirical results. Even though CNN and SVM have a homologous trend in window size effect, CNN outperformed SVM using leave-one-subject-out cross validation. This could be caused by different extracted features in the elicitation process.