Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Speech Dereverberation with Context-aware Recurrent Neural Networks

112 0 0.0 ( 0 )

Download Cite

Added by Jo\\~ao Felipe Santos

Publication date 2017

fields Informatics Engineering Electronic Engineering

and research's language is English

Authors Joao Felipe Santos - Tiago H. Falk

Sound Audio and Speech Processing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In this paper, we propose a model to perform speech dereverberation by estimating its spectral magnitude from the reverberant counterpart. Our models are capable of extracting features that take into account both short and long-term dependencies in the signal through a convolutional encoder (which extracts features from a short, bounded context of frames) and a recurrent neural network for extracting long-term information. Our model outperforms a recently proposed model that uses different context information depending on the reverberation time, without requiring any sort of additional input, yielding improvements of up to 0.4 on PESQ, 0.3 on STOI, and 1.0 on POLQA relative to reverberant speech. We also show our model is able to generalize to real room impulse responses even when only trained with simulated room impulse responses, different speakers, and high reverberation times. Lastly, listening tests show the proposed method outperforming benchmark models in reduction of perceived reverberation.

rate research

Speech Dereverberation Based on Integrated Deep and Ensemble Learning Algorithm

121 - Wei-Jen Lee , Syu-Siang Wang , Fei Chen 2018

Reverberation, which is generally caused by sound reflections from walls, ceilings, and floors, can result in severe performance degradation of acoustic applications. Due to a complicated combination of attenuation and time-delay effects, the reverberation property is difficult to characterize, and it remains a challenging task to effectively retrieve the anechoic speech signals from reverberation ones. In the present study, we proposed a novel integrated deep and ensemble learning algorithm (IDEA) for speech dereverberation. The IDEA consists of offline and online phases. In the offline phase, we train multiple dereverberation models, each aiming to precisely dereverb speech signals in a particular acoustic environment; then a unified fusion function is estimated that aims to integrate the information of multiple dereverberation models. In the online phase, an input utterance is first processed by each of the dereverberation models. The outputs of all models are integrated accordingly to generate the final anechoic signal. We evaluated the IDEA on designed acoustic environments, including both matched and mismatched conditions of the training and testing data. Experimental results confirm that the proposed IDEA outperforms single deep-neural-network-based dereverberation model with the same model architecture and training data.

Sound Audio and Speech Processing

LaFurca: Iterative Refined Speech Separation Based on Context-Aware Dual-Path Parallel Bi-LSTM

68 - Ziqiang Shi , Rujie Liu , Jiqing Han 2020

Deep neural network with dual-path bi-directional long short-term memory (BiLSTM) block has been proved to be very effective in sequence modeling, especially in speech separation, e.g. DPRNN-TasNet cite{luo2019dual}. In this paper, we propose several improvements of dual-path BiLSTM based network for end-to-end approach to monaural speech separation. Firstly a dual-path network with intra-parallel BiLSTM and inter-parallel BiLSTM components is introduced to reduce performance sub-variances among different branches. Secondly, we propose to use global context aware inter-intra cross-parallel BiLSTM to further perceive the global contextual information. Finally, a spiral multi-stage dual-path BiLSTM is proposed to iteratively refine the separation results of the previous stages. All these networks take the mixed utterance of two speakers and map it to two separate utterances, where each utterance contains only one speakers voice. For the objective, we propose to train the network by directly optimizing the utterance level scale-invariant signal-to-distortion ratio (SI-SDR) in a permutation invariant training (PIT) style. Our experiments on the public WSJ0-2mix data corpus results in 20.55dB SDR improvement, 20.35dB SI-SDR improvement, 3.69 of PESQ, and 94.86% of ESTOI, which shows our proposed networks can lead to performance improvement on the speaker separation task. We have open-sourced our re-implementation of the DPRNN-TasNet in https://github.com/ShiZiqiang/dual-path-RNNs-DPRNNs-based-speech-separation, and our LaFurca is realized based on this implementation of DPRNN-TasNet, it is believed that the results in this paper can be reproduced with ease.

Sound Audio and Speech Processing

Communication-Cost Aware Microphone Selection For Neural Speech Enhancement with Ad-hoc Microphone Arrays

90 - Jonah Casebeer , Jamshed Kaikaus , Paris Smaragdis 2020

In this paper, we present a method for jointly-learning a microphone selection mechanism and a speech enhancement network for multi-channel speech enhancement with an ad-hoc microphone array. The attention-based microphone selection mechanism is trained to reduce communication-costs through a penalty term which represents a task-performance/ communication-cost trade-off. While working within the trade-off, our method can intelligently stream from more microphones in lower SNR scenes and fewer microphones in higher SNR scenes. We evaluate the model in complex echoic acoustic scenes with moving sources and show that it matches the performance of models that stream from a fixed number of microphones while reducing communication costs.

Sound Audio and Speech Processing

Multi-Channel Speech Enhancement using Graph Neural Networks

88 - Panagiotis Tzirakis , Anurag Kumar , Jacob Donley 2021

Multi-channel speech enhancement aims to extract clean speech from a noisy mixture using signals captured from multiple microphones. Recently proposed methods tackle this problem by incorporating deep neural network models with spatial filtering techniques such as the minimum variance distortionless response (MVDR) beamformer. In this paper, we introduce a different research direction by viewing each audio channel as a node lying in a non-Euclidean space and, specifically, a graph. This formulation allows us to apply graph neural networks (GNN) to find spatial correlations among the different channels (nodes). We utilize graph convolution networks (GCN) by incorporating them in the embedding space of a U-Net architecture. We use LibriSpeech dataset and simulate room acoustics data to extensively experiment with our approach using different array types, and number of microphones. Results indicate the superiority of our approach when compared to prior state-of-the-art method.

Sound Audio and Speech Processing

DESNet: A Multi-channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation

331 - Yihui Fu , Jian Wu , Yanxin Hu 2020

In this paper, we propose a multi-channel network for simultaneous speech dereverberation, enhancement and separation (DESNet). To enable gradient propagation and joint optimization, we adopt the attentional selection mechanism of the multi-channel features, which is originally proposed in end-to-end unmixing, fixed-beamforming and extraction (E2E-UFE) structure. Furthermore, the novel deep complex convolutional recurrent network (DCCRN) is used as the structure of the speech unmixing and the neural network based weighted prediction error (WPE) is cascaded beforehand for speech dereverberation. We also introduce the staged SNR strategy and symphonic loss for the training of the network to further improve the final performance. Experiments show that in non-dereverberated case, the proposed DESNet outperforms DCCRN and most state-of-the-art structures in speech enhancement and separation, while in dereverberated scenario, DESNet also shows improvements over the cascaded WPE-DCCRN networks.

Sound Audio and Speech Processing

comments

Fetching comments

Sham Private University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Speech Dereverberation with Context-aware Recurrent Neural Networks

Ask ChatGPT about the research

No Arabic abstract

Read More