ﻻ يوجد ملخص باللغة العربية
In this paper, we construct a new Japanese speech corpus for speech-based summarization and simplification, JSSS (pronounced j-triple-s). Given the success of reading-style speech synthesis from short-form sentences, we aim to design more difficult tasks for delivering information to humans. Our corpus contains voices recorded for two tasks that have a role in providing information under constraints: duration-constrained text-to-speech summarization and speaking-style simplification. It also contains utterances of long-form sentences as an optional task. This paper describes how we designed the corpus, which is available on our project page.
Thanks to improvements in machine learning techniques, including deep learning, speech synthesis is becoming a machine learning task. To accelerate speech synthesis research, we are developing Japanese voice corpora reasonably accessible from not onl
Conventional deep neural network (DNN)-based speech enhancement (SE) approaches aim to minimize the mean square error (MSE) between enhanced speech and clean reference. The MSE-optimized model may not directly improve the performance of an automatic
We describe Parrotron, an end-to-end-trained speech-to-speech conversion model that maps an input spectrogram directly to another spectrogram, without utilizing any intermediate discrete representation. The network is composed of an encoder, spectrog
A large and growing amount of speech content in real-life scenarios is being recorded on common consumer devices in uncontrolled environments, resulting in degraded speech quality. Transforming such low-quality device-degraded speech into high-qualit
We present a freely available speech corpus for the Uzbek language and report preliminary automatic speech recognition (ASR) results using both the deep neural network hidden Markov model (DNN-HMM) and end-to-end (E2E) architectures. The Uzbek speech