The auditory attention decoding (AAD) approach was proposed to determine the identity of the attended talker in a multi-talker scenario by analyzing electroencephalography (EEG) data. Although linear model-based methods have been widely used in AAD, the linear assumption is considered oversimplified, and decoding accuracy remains low for short decoding windows. Recently, nonlinear models based on deep neural networks (DNNs) have been proposed to address this problem. However, these models did not fully exploit both the spatial and temporal features of EEG, and the interpretability of DNN models has rarely been investigated. In this paper, we propose novel convolutional recurrent neural network (CRNN)-based regression and classification models, and compare them with both the linear model and state-of-the-art DNN models. Results show that our proposed CRNN-based classification model outperforms the others for short decoding windows (around 90% accuracy for 2 s and 5 s windows). Although worse than the classification models, the decoding accuracy of the proposed CRNN-based regression model is about 5% higher than that of other regression models. The interpretability of the DNN models is also investigated by visualizing layer weights.
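The abstract does not detail the architecture; as a purely illustrative sketch (all layer sizes, the 64-channel montage, and the 2 s / 128 Hz window are assumptions, not the paper's actual model), a CRNN-based AAD classifier in PyTorch might look like:

```python
# Minimal sketch of a CRNN-based AAD classifier in PyTorch.
# All hyperparameters (filter counts, hidden size, window length)
# are illustrative assumptions, not the architecture from the paper.
import torch
import torch.nn as nn

class CRNNClassifier(nn.Module):
    def __init__(self, n_channels=64, n_classes=2, hidden=64):
        super().__init__()
        # Convolution across EEG channels and time extracts spatio-temporal features.
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=9, padding=4),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # GRU models the temporal dynamics of the convolutional features.
        self.rnn = nn.GRU(32, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):          # x: (batch, n_channels, n_samples)
        h = self.conv(x)           # (batch, 32, n_samples // 2)
        h = h.transpose(1, 2)      # (batch, time, features) for the GRU
        _, h_n = self.rnn(h)
        return self.fc(h_n[-1])    # logits over attended-speaker classes

# Example: classify a 2 s window of 64-channel EEG sampled at 128 Hz.
model = CRNNClassifier()
logits = model(torch.randn(8, 64, 256))
print(logits.shape)  # torch.Size([8, 2])
```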
Recent behavioral and electroencephalography (EEG) studies have defined ways that auditory spatial attention can be allocated over large regions of space. As in most experimental studies, behavioral and EEG data were averaged over tens of minutes, because identifying abstract spatial feature codes from raw EEG data is extremely challenging. The goal of this study is to design a deep learning model that can learn from raw EEG data and predict auditory spatial information on a trial-by-trial basis. We designed a convolutional neural network (CNN) model to predict the attended location or other stimulus locations relative to the attended location. A multi-task model was also used to predict the attended and stimulus locations at the same time. Based on visualizations of our models, we investigated the features of the individual classification tasks and the joint features of the multi-task model. Our model achieved an average accuracy of 72.4% for relative location prediction and 90.0% for attended location prediction when trained individually. The multi-task model improved the performance of attended location prediction by 3%. Our results suggest a strong correlation between attended location and relative location.
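For illustration, a shared-trunk multi-task CNN of the kind described, with one head per task trained jointly, could be sketched as follows (the class counts and layer sizes are assumptions, not the paper's model):

```python
# Sketch of a multi-task CNN with a shared trunk and two heads, one for the
# attended location and one for the relative stimulus location. The class
# counts (4 each) and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MultiTaskCNN(nn.Module):
    def __init__(self, n_channels=64, n_attended=4, n_relative=4):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.attended_head = nn.Linear(64, n_attended)
        self.relative_head = nn.Linear(64, n_relative)

    def forward(self, x):                  # x: (batch, channels, samples)
        z = self.trunk(x)                  # shared features for both tasks
        return self.attended_head(z), self.relative_head(z)

model = MultiTaskCNN()
att_logits, rel_logits = model(torch.randn(8, 64, 512))
# Joint training sums the two cross-entropy losses, so the shared trunk
# must learn features useful for both tasks.
loss = nn.functional.cross_entropy(att_logits, torch.randint(0, 4, (8,))) \
     + nn.functional.cross_entropy(rel_logits, torch.randint(0, 4, (8,)))
```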
People suffering from hearing impairment often have difficulty participating in conversations in so-called 'cocktail party' scenarios with multiple people talking simultaneously. Although advanced algorithms exist to suppress background noise in these situations, a hearing device also needs information about which of these speakers the user actually aims to attend to. The correct (attended) speaker can then be enhanced using this information, and all other speakers can be treated as background noise. Recent neuroscientific advances have shown that it is possible to determine the focus of auditory attention from non-invasive neurorecording techniques such as electroencephalography (EEG). Based on these new insights, a multitude of auditory attention decoding (AAD) algorithms have been proposed, which, combined with appropriate speaker separation algorithms and miniaturized EEG sensor devices, could lead to so-called neuro-steered hearing devices. In this paper, we provide a broad review and a statistically grounded comparative study of EEG-based AAD algorithms and address the main signal processing challenges in this field.
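As background to the class of algorithms reviewed here, the classical stimulus-reconstruction AAD pipeline trains a linear backward model that maps time-lagged EEG to the attended speech envelope and selects the speaker whose envelope correlates best with the reconstruction. A minimal numpy sketch, where the lag range and ridge parameter are assumptions:

```python
# Minimal numpy sketch of linear stimulus-reconstruction AAD: a ridge-
# regularized backward model maps time-lagged EEG to the speech envelope;
# at test time the speaker whose envelope correlates best with the
# reconstruction is declared attended. n_lags and lam are assumptions.
import numpy as np

def lag_matrix(eeg, n_lags):
    """Stack time-lagged copies of the EEG (samples x channels)."""
    n, c = eeg.shape
    X = np.zeros((n, c * n_lags))
    for k in range(n_lags):
        X[k:, k * c:(k + 1) * c] = eeg[:n - k]
    return X

def train_decoder(eeg, attended_env, n_lags=16, lam=1e3):
    X = lag_matrix(eeg, n_lags)
    # Ridge regression: w = (X'X + lam*I)^-1 X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]),
                           X.T @ attended_env)

def decode(eeg, env1, env2, w, n_lags=16):
    rec = lag_matrix(eeg, n_lags) @ w
    corr = lambda a, b: np.corrcoef(a, b)[0, 1]
    return 1 if corr(rec, env1) > corr(rec, env2) else 2
```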
Auditory attention decoding (AAD) algorithms decode auditory attention from electroencephalography (EEG) signals that capture the listener's neural activity. Such AAD methods are believed to be an important ingredient of so-called neuro-steered assistive hearing devices. For example, traditional AAD decoders detect which of multiple speakers a listener is attending to by reconstructing the amplitude envelope of the attended speech signal from the EEG signals. Recently, an alternative paradigm to this stimulus reconstruction approach was proposed, in which the directional focus of auditory attention is determined instead, based solely on the EEG, using common spatial pattern (CSP) filters. Here, we propose Riemannian geometry-based classification (RGC) as an alternative to this CSP approach, in which the covariance matrix of a new EEG segment is classified directly while taking its Riemannian structure into account. While the proposed RGC method performs similarly to the CSP method for short decision window lengths (i.e., the number of EEG samples used to make a decision), we show that it significantly outperforms CSP for longer decision window lengths.
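A from-scratch sketch of such Riemannian classification is given below, using minimum distance to the class-mean covariance under the affine-invariant metric; the log-Euclidean mean is a common simplification of the full Riemannian mean, and the shrinkage value is an assumption (a practical implementation would typically use a library such as pyriemann):

```python
# Sketch of Riemannian minimum-distance-to-mean (MDM) classification of
# EEG covariance matrices. The log-Euclidean class mean is a common
# simplification; the shrinkage constant is an illustrative assumption.
import numpy as np
from scipy.linalg import logm, expm, sqrtm, inv

def riemannian_distance(A, B):
    """Affine-invariant distance: ||log(A^-1/2 B A^-1/2)||_F."""
    A_inv_sqrt = inv(sqrtm(A))
    return np.linalg.norm(logm(A_inv_sqrt @ B @ A_inv_sqrt), 'fro')

def log_euclidean_mean(covs):
    return expm(np.mean([logm(C) for C in covs], axis=0))

class MDM:
    def fit(self, covs, labels):
        self.classes_ = np.unique(labels)
        self.means_ = {c: log_euclidean_mean(
                           [C for C, y in zip(covs, labels) if y == c])
                       for c in self.classes_}
        return self

    def predict(self, covs):
        # Assign each segment to the class with the nearest mean covariance.
        return np.array([min(self.classes_,
                             key=lambda c: riemannian_distance(C, self.means_[c]))
                         for C in covs])

def covariance(segment, shrink=1e-3):
    """Covariance of one EEG segment (channels x samples), shrunk toward
    the identity to guarantee positive definiteness."""
    C = np.cov(segment)
    return C + shrink * np.trace(C) / len(C) * np.eye(len(C))
```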
In this work, we propose an overlapped speech detection system trained as a three-class classifier. Unlike conventional systems that perform binary classification as to whether or not a frame contains overlapped speech, the proposed approach classifies each frame into three classes: non-speech, single-speaker speech, and overlapped speech. By training the network with this more detailed label definition, the model can better learn to decide the number of speakers included in a given frame. A convolutional recurrent neural network architecture is explored to benefit from both the convolutional layers' capability to model local patterns and the recurrent layers' ability to model sequential information. The proposed overlapped speech detection model establishes state-of-the-art performance, with a precision of 0.6648 and a recall of 0.3222 on the DIHARD II evaluation set, showing a 20% increase in recall along with higher precision. In addition, we introduce a simple approach that uses the proposed overlapped speech detection model for speaker diarization, which ranked third in Track 1 of the DIHARD III challenge.
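A rough sketch of a frame-level three-class CRNN of this kind follows; the log-mel input representation and all layer sizes are assumptions, not the system described above:

```python
# Sketch of a frame-level three-class CRNN for overlapped speech detection
# (non-speech / single-speaker / overlapped). The 64-bin log-mel input and
# all layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class OverlapCRNN(nn.Module):
    def __init__(self, n_mels=64, n_classes=3):
        super().__init__()
        # 2-D convolutions model local time-frequency patterns.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),          # pool over frequency only
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        # A bidirectional LSTM models sequential context across frames.
        self.rnn = nn.LSTM(32 * (n_mels // 4), 64, batch_first=True,
                           bidirectional=True)
        self.fc = nn.Linear(128, n_classes)

    def forward(self, x):                  # x: (batch, 1, n_mels, frames)
        h = self.conv(x)                   # (batch, 32, n_mels // 4, frames)
        b, c, f, t = h.shape
        h = h.permute(0, 3, 1, 2).reshape(b, t, c * f)
        h, _ = self.rnn(h)
        return self.fc(h)                  # per-frame class logits

model = OverlapCRNN()
logits = model(torch.randn(4, 1, 64, 200))
print(logits.shape)  # torch.Size([4, 200, 3])
```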
A brain-computer interface (BCI) is used not only to control external devices for healthy people but also to rehabilitate motor functions in motor-disabled patients. Decoding movement intention is one of the most significant aspects of performing arm movement tasks using brain signals. Decoding movement execution (ME) from electroencephalogram (EEG) signals has shown high performance in previous works; however, movement imagination (MI) paradigm-based intention decoding has so far failed to achieve sufficient accuracy. In this study, we focused on a robust MI decoding method with transfer learning between the ME and MI paradigms. We acquired EEG data related to arm reaching in 3D directions and proposed a BCI transfer learning method based on a Relation network (BTRN) architecture. Its decoding performance was the highest in comparison with conventional works. We confirmed the potential of the BTRN architecture to contribute to continuous decoding of MI using ME datasets.
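The BTRN internals are not given in the abstract; a minimal Relation-network-style sketch of the underlying idea, where query trials are classified by a learned similarity to class prototypes, is shown below (all layer sizes and the prototype scheme are assumptions):

```python
# Minimal sketch of a Relation-network-style classifier for EEG transfer
# learning: an embedding module encodes EEG segments, and a relation module
# scores how well a query embedding matches each class embedding. All sizes
# are illustrative; this is not the original BTRN architecture.
import torch
import torch.nn as nn

class Embedding(nn.Module):
    def __init__(self, n_channels=64, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, dim),
        )

    def forward(self, x):              # x: (batch, channels, samples)
        return self.net(x)

class RelationModule(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),   # relation score in [0, 1]
        )

    def forward(self, query, support):
        # Concatenate query and class embeddings, output a match score.
        return self.net(torch.cat([query, support], dim=-1))

# Transfer idea: embeddings are learned on abundant ME data, then MI
# queries are scored against ME class prototypes (mean embeddings per
# reaching direction) and assigned to the highest-scoring class.
embed, relate = Embedding(), RelationModule()
query = embed(torch.randn(1, 64, 256))              # one MI trial
prototype = embed(torch.randn(10, 64, 256)).mean(0, keepdim=True)
score = relate(query, prototype)                    # shape (1, 1)
```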