Recent advances in neural-network based generative modeling of speech have shown great potential for speech coding. However, the performance of such models drops when the input is not clean speech, e.g., in the presence of background noise, preventing their use in practical applications. In this paper we examine the reason for this degradation and discuss methods to overcome it. Placing a denoising preprocessing stage before feature extraction and using clean speech as the training target is shown to be the best-performing strategy.
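The strategy described in that abstract can be illustrated with a minimal sketch: denoise the noisy input before extracting features, but keep the clean reference waveform as the training target. This is not the paper's implementation; `denoise`, `extract_features`, and `NeuralVocoder` are hypothetical placeholders for a speech enhancer, an acoustic feature extractor, and a neural generative vocoder.

```python
import numpy as np

def denoise(noisy_wave: np.ndarray) -> np.ndarray:
    """Hypothetical enhancement front end; identity here as a stand-in."""
    return noisy_wave

def extract_features(wave: np.ndarray, frame: int = 256) -> np.ndarray:
    """Toy frame-level log-energy features, standing in for real acoustic features."""
    n = len(wave) // frame
    frames = wave[: n * frame].reshape(n, frame)
    return np.log(np.mean(frames ** 2, axis=1) + 1e-8)[:, None]

class NeuralVocoder:
    """Placeholder generative model mapping features to a waveform estimate."""
    def train_step(self, features: np.ndarray, target_wave: np.ndarray) -> float:
        # A real model (e.g. a WaveNet-like vocoder) would update its
        # parameters here; we return a dummy loss to keep the sketch runnable.
        return float(np.mean(target_wave ** 2))

# Training setup: features come from the *denoised* noisy input,
# while the target remains the *clean* reference waveform.
rng = np.random.default_rng(0)
clean = rng.standard_normal(4096).astype(np.float32)
noisy = clean + 0.3 * rng.standard_normal(4096).astype(np.float32)

model = NeuralVocoder()
features = extract_features(denoise(noisy))
loss = model.train_step(features, target_wave=clean)
print(f"dummy training loss: {loss:.3f}")
```

The key design choice in this sketch is that the enhancement stage only affects the conditioning features, so at synthesis time the generative model is still asked to produce clean speech even when its input was noisy.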
Recent neural waveform synthesizers such as WaveNet, WaveGlow, and the neural-source-filter (NSF) model have shown good performance in speech synthesis despite their different methods of waveform generation. The similarity between speech and music au
Neural network architectures are at the core of powerful automatic speech recognition (ASR) systems. However, while recent research focuses on novel model architectures, the acoustic input features remain almost unchanged. Traditional ASR systems rel
Statistical signal processing based speech enhancement methods adopt expert knowledge to design the statistical models and linear filters, an approach complementary to data-driven deep neural network (DNN) based methods. In this paper, by
This paper presents a speech BERT model that extracts prosody information embedded in speech segments to improve the prosody of synthesized speech in neural text-to-speech (TTS). As a pre-trained model, it can learn prosody attributes from a large am
Estimating the perceived quality of an audio signal is critical for many multimedia and audio processing systems. Providers strive to offer optimal and reliable services in order to increase users' quality of experience (QoE). In this work, we pres