
Auditory Attention Decoding from EEG using Convolutional Recurrent Neural Network

Posted by Zhen Fu
Publication date: 2021
Research language: English





The auditory attention decoding (AAD) approach was proposed to determine the identity of the attended talker in a multi-talker scenario by analyzing electroencephalography (EEG) data. Although linear model-based methods have been widely used in AAD, the linear assumption is considered oversimplified, and decoding accuracy remains low for short decoding windows. Recently, nonlinear models based on deep neural networks (DNN) have been proposed to solve this problem. However, these models did not fully utilize both the spatial and temporal features of EEG, and the interpretability of DNN models has rarely been investigated. In this paper, we propose novel convolutional recurrent neural network (CRNN)-based regression and classification models and compare them with both the linear model and state-of-the-art DNN models. Results showed that our proposed CRNN-based classification model outperformed the others for short decoding windows (around 90% accuracy at 2 s and 5 s). Although lower than that of the classification models, the decoding accuracy of the proposed CRNN-based regression model was about 5% higher than that of the other regression models. The interpretability of the DNN models was also investigated by visualizing the layer weights.
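To make the architecture concrete, here is a minimal CRNN classifier sketch in PyTorch. It illustrates the general idea rather than reproducing the authors' exact model: the class name, layer sizes, and the assumed input of 64-channel EEG segments sampled at 128 Hz are all assumptions made for the example.

import torch
import torch.nn as nn

class CRNNClassifier(nn.Module):  # hypothetical name, illustrative only
    def __init__(self, n_channels=64, n_classes=2, hidden=64):
        super().__init__()
        # A convolution spanning all electrodes captures spatial patterns.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(n_channels, 1)),  # -> (B, 16, 1, T)
            nn.BatchNorm2d(16),
            nn.ELU(),
        )
        # A GRU over the remaining time axis captures temporal dynamics.
        self.gru = nn.GRU(input_size=16, hidden_size=hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x):                          # x: (B, 1, channels, time)
        feats = self.spatial_conv(x)               # (B, 16, 1, T)
        feats = feats.squeeze(2).permute(0, 2, 1)  # (B, T, 16)
        _, h = self.gru(feats)                     # h: (1, B, hidden)
        return self.classifier(h[-1])              # logits over attended talkers

model = CRNNClassifier()
logits = model(torch.randn(8, 1, 64, 256))         # eight 2 s windows at 128 Hz

Coupling a spatial convolution with a recurrent layer is what lets such a model exploit both the spatial and temporal structure of EEG, which is exactly the gap the abstract identifies in earlier DNN decoders.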




Read also

Recent behavioral and electroencephalography (EEG) studies have defined ways that auditory spatial attention can be allocated over large regions of space. As in most experimental studies, behavior and EEG were averaged over tens of minutes, because identifying abstract spatial feature codes from raw EEG data is extremely challenging. The goal of this study is to design a deep learning model that can learn from raw EEG data and predict auditory spatial information on a trial-by-trial basis. We designed a convolutional neural network (CNN) model to predict the attended location or other stimulus locations relative to the attended location. A multi-task model was also used to predict the attended and stimulus locations at the same time. Based on visualizations of our models, we investigated the features of the individual classification tasks and the joint features of the multi-task model. Our models achieved an average accuracy of 72.4% in relative location prediction and 90.0% in attended location prediction individually. The multi-task model improved the performance of attended location prediction by 3%. Our results suggest a strong correlation between attended location and relative location.
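A multi-task setup of the kind this abstract describes can be sketched as a shared convolutional trunk with two classification heads, one per task. The sketch below is a hedged illustration in PyTorch; the class counts, pooling choice, and layer sizes are assumptions, not the paper's configuration.

import torch
import torch.nn as nn

class MultiTaskCNN(nn.Module):  # hypothetical name
    def __init__(self, n_channels=64, n_attended=4, n_relative=4):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(n_channels, 1)),  # spatial filter
            nn.ELU(),
            nn.AdaptiveAvgPool2d((1, 32)),  # pool the time axis to a fixed length
            nn.Flatten(),                   # -> (B, 16 * 32)
        )
        # Two heads share the trunk; joint training sums their losses, so the
        # shared features must serve both prediction tasks.
        self.attended_head = nn.Linear(16 * 32, n_attended)
        self.relative_head = nn.Linear(16 * 32, n_relative)

    def forward(self, x):                   # x: (B, 1, channels, time)
        z = self.trunk(x)
        return self.attended_head(z), self.relative_head(z)

Sharing the trunk is one plausible mechanism behind the reported 3% gain: features learned for relative location prediction can regularize the attended-location head.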
People suffering from hearing impairment often have difficulties participating in conversations in so-called 'cocktail party' scenarios, with multiple people talking simultaneously. Although advanced algorithms exist to suppress background noise in these situations, a hearing device also needs information about which of these speakers the user actually aims to attend to. The correct (attended) speaker can then be enhanced using this information, and all other speakers can be treated as background noise. Recent neuroscientific advances have shown that it is possible to determine the focus of auditory attention from non-invasive neurorecording techniques, such as electroencephalography (EEG). Based on these new insights, a multitude of auditory attention decoding (AAD) algorithms have been proposed, which, combined with appropriate speaker separation algorithms and miniaturized EEG sensor devices, could lead to so-called neuro-steered hearing devices. In this paper, we provide a broad review and a statistically grounded comparative study of EEG-based AAD algorithms and address the main signal processing challenges in this field.
Auditory attention decoding (AAD) algorithms decode auditory attention from electroencephalography (EEG) signals that capture the listener's neural activity. Such AAD methods are believed to be an important ingredient towards so-called neuro-steered assistive hearing devices. For example, traditional AAD decoders allow detecting which of multiple speakers a listener is attending to by reconstructing the amplitude envelope of the attended speech signal from the EEG signals. Recently, an alternative paradigm to this stimulus reconstruction approach was proposed, in which the directional focus of auditory attention is determined instead, solely based on the EEG, using common spatial pattern (CSP) filters. Here, we propose Riemannian geometry-based classification (RGC) as an alternative to this CSP approach, in which the covariance matrix of a new EEG segment is directly classified while taking its Riemannian structure into account. While the proposed RGC method performs similarly to the CSP method for short decision window lengths (i.e., the number of EEG samples used to make a decision), we show that it significantly outperforms it for longer decision window lengths.
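The covariance-based classification idea can be illustrated with a simplified minimum-distance-to-mean classifier. The sketch below uses the log-Euclidean metric as a stand-in for the full affine-invariant Riemannian machinery; the function names and two-class setup are assumptions for the example.

import numpy as np
from scipy.linalg import logm

def log_map(cov):
    # The matrix logarithm maps an SPD covariance matrix into a flat
    # (Euclidean) space where averaging and distances are well behaved.
    return logm(cov)

def fit_class_means(covs, labels):
    # Log-Euclidean mean per class: average the matrix logarithms.
    return {c: np.mean([log_map(S) for S, y in zip(covs, labels) if y == c],
                       axis=0)
            for c in set(labels)}

def classify(cov, class_means):
    # Assign the class whose mean is nearest in log-Euclidean distance.
    L = log_map(cov)
    return min(class_means,
               key=lambda c: np.linalg.norm(L - class_means[c], "fro"))

Here each entry of covs is the (channels x channels) covariance matrix of one EEG segment and each label the attended direction (e.g. 0 = left, 1 = right). Longer decision windows yield better-conditioned covariance estimates, which is consistent with the gains the abstract reports at longer window lengths.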
In this work, we propose an overlapped speech detection system trained as a three-class classifier. Unlike conventional systems that perform binary classification as to whether or not a frame contains overlapped speech, the proposed approach classifies frames into three classes: non-speech, single-speaker speech, and overlapped speech. By training a network with this more detailed label definition, the model can learn a better notion of the number of speakers present in a given frame. A convolutional recurrent neural network architecture is explored to benefit from both the convolutional layers' capability to model local patterns and the recurrent layers' ability to model sequential information. The proposed overlapped speech detection model establishes state-of-the-art performance, with a precision of 0.6648 and a recall of 0.3222 on the DIHARD II evaluation set, showing a 20% increase in recall along with higher precision. In addition, we introduce a simple approach to utilizing the proposed overlapped speech detection model for speaker diarization, which ranked third in Track 1 of the DIHARD III challenge.
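As an illustration of the three-class CRNN idea, the sketch below assigns every spectrogram frame one of three labels. It is a hedged sketch in PyTorch; the mel-band count, hidden size, and layer choices are assumptions rather than the paper's configuration.

import torch
import torch.nn as nn

class OverlapCRNN(nn.Module):  # hypothetical name
    def __init__(self, n_mels=64, hidden=128, n_classes=3):
        super().__init__()
        # Convolution models local spectro-temporal patterns.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool2d((2, 1)),           # pool frequency, keep time resolution
        )
        # A bidirectional GRU models sequential context across frames.
        self.gru = nn.GRU(32 * (n_mels // 2), hidden,
                          batch_first=True, bidirectional=True)
        self.frame_head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                   # x: (B, 1, n_mels, frames)
        z = self.conv(x)                    # (B, 32, n_mels // 2, T)
        b, c, f, t = z.shape
        z = z.permute(0, 3, 1, 2).reshape(b, t, c * f)
        z, _ = self.gru(z)                  # (B, T, 2 * hidden)
        # Per-frame logits: 0 = non-speech, 1 = single speaker, 2 = overlapped.
        return self.frame_head(z)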
A brain-computer interface (BCI) is used not only to control external devices for healthy people but also to rehabilitate motor functions for motor-disabled patients. Decoding movement intention is one of the most significant aspects of performing arm movement tasks using brain signals. Decoding movement execution (ME) from electroencephalogram (EEG) signals has shown high performance in previous works; however, movement imagination (MI) paradigm-based intention decoding has so far failed to achieve sufficient accuracy. In this study, we focused on a robust MI decoding method with transfer learning between the ME and MI paradigms. We acquired EEG data related to arm reaching in 3D directions. We propose a BCI transfer learning method based on a Relation network (BTRN) architecture. Its decoding performance was the highest compared to conventional works. We confirmed the potential of the BTRN architecture to contribute to continuous decoding of MI using ME datasets.
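The relation-network component can be sketched as an embedding module plus a learned similarity module, in the spirit of relation networks for few-shot learning; the abstract does not give the BTRN details, so everything below (names, sizes, prototype scheme) is an assumption.

import torch
import torch.nn as nn

class RelationNet(nn.Module):  # hypothetical, illustrative only
    def __init__(self, in_dim=256, emb=64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(in_dim, emb), nn.ReLU())
        self.relation = nn.Sequential(
            nn.Linear(2 * emb, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),  # relation score in [0, 1]
        )

    def forward(self, query, prototypes):    # query: (D,), prototypes: (K, D)
        q = self.embed(query).expand(prototypes.size(0), -1)
        p = self.embed(prototypes)
        return self.relation(torch.cat([q, p], dim=1)).squeeze(1)  # (K,) scores

One way to read the transfer setup: learn the embedding on plentiful ME trials, build class prototypes from them, and score scarce MI trials against those prototypes, so the MI decoder borrows structure from ME data.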
