ﻻ يوجد ملخص باللغة العربية
In this paper we address the instability issue of generative adversarial network (GAN) by proposing a new similarity metric in unitary space of Schur decomposition for 2D representations of audio and speech signals. We show that encoding departure from normality computed in this vector space into the generator optimization formulation helps to craft more comprehensive spectrograms. We demonstrate the effectiveness of binding this metric for enhancing stability in training with less mode collapse compared to baseline GANs. Experimental results on subsets of UrbanSound8k and Mozilla common voice datasets have shown considerable improvements on the quality of the generated samples measured by the Frechet inception distance. Moreover, reconstructed signals from these samples, have achieved higher signal to noise ratio compared to regular LS-GANs.
Generative adversarial networks (GAN) have recently been shown to be efficient for speech enhancement. However, most, if not all, existing speech enhancement GANs (SEGAN) make use of a single generator to perform one-stage enhancement mapping. In thi
One of the frontier issues that severely hamper the development of automatic snore sound classification (ASSC) associates to the lack of sufficient supervised training data. To cope with this problem, we propose a novel data augmentation approach bas
Automatic speech emotion recognition provides computers with critical context to enable user understanding. While methods trained and tested within the same dataset have been shown successful, they often fail when applied to unseen datasets. To addre
Human speech processing is inherently multimodal, where visual cues (lip movements) help to better understand the speech in noise. Lip-reading driven speech enhancement significantly outperforms benchmark audio-only approaches at low signal-to-noise
We demonstrate the existence of universal adversarial perturbations, which can fool a family of audio classification architectures, for both targeted and untargeted attack scenarios. We propose two methods for finding such perturbations. The first me