ﻻ يوجد ملخص باللغة العربية
Supervised learning for single-channel speech enhancement requires carefully labeled training examples where the noisy mixture is input into the network and the network is trained to produce an output close to the ideal target. To relax the conditions on the training data, we consider the task of training speech enhancement networks in a self-supervised manner. We first use a limited training set of clean speech sounds and learn a latent representation by autoencoding on their magnitude spectrograms. We then autoencode on speech mixtures recorded in noisy environments and train the resulting autoencoder to share a latent representation with the clean examples. We show that using this training schema, we can now map noisy speech to its clean version using a network that is autonomously trainable without requiring labeled training examples or human intervention.
Predicting the altered acoustic frames is an effective way of self-supervised learning for speech representation. However, it is challenging to prevent the pretrained model from overfitting. In this paper, we proposed to introduce two dropout regular
Conventional deep neural network (DNN)-based speech enhancement (SE) approaches aim to minimize the mean square error (MSE) between enhanced speech and clean reference. The MSE-optimized model may not directly improve the performance of an automatic
Recently, self-supervised pre-training has gained success in automatic speech recognition (ASR). However, considering the difference between speech accents in real scenarios, how to identify accents and use accent features to improve ASR is still cha
Multi-stage learning is an effective technique to invoke multiple deep-learning modules sequentially. This paper applies multi-stage learning to speech enhancement by using a multi-stage structure, where each stage comprises a self-attention (SA) blo
Statistical signal processing based speech enhancement methods adopt expert knowledge to design the statistical models and linear filters, which is complementary to the deep neural network (DNN) based methods which are data-driven. In this paper, by