ﻻ يوجد ملخص باللغة العربية
This paper describes the Academia Sinica systems for the two tasks of Voice Conversion Challenge 2020, namely voice conversion within the same language (Task 1) and cross-lingual voice conversion (Task 2). For both tasks, we followed the cascaded ASR+TTS structure, using phonetic tokens as the TTS input instead of the text or characters. For Task 1, we used the international phonetic alphabet (IPA) as the input of the TTS model. For Task 2, we used unsupervised phonetic symbols extracted by the vector-quantized variational autoencoder (VQVAE). In the evaluation, the listening test showed that our systems performed well in the VCC2020 challenge.
The voice conversion challenge is a bi-annual scientific event held to compare and understand different voice conversion (VC) systems built on a common dataset. In 2020, we organized the third edition of the challenge and constructed and distributed
Traditional voice conversion(VC) has been focused on speaker identity conversion for speech with a neutral expression. We note that emotional expression plays an essential role in daily communication, and the emotional style of speech can be speaker-
The Voice Conversion Challenge 2020 is the third edition under its flagship that promotes intra-lingual semiparallel and cross-lingual voice conversion (VC). While the primary evaluation of the challenge submissions was done through crowd-sourced lis
Singing voice conversion is converting the timbre in the source singing to the target speakers voice while keeping singing content the same. However, singing data for target speaker is much more difficult to collect compared with normal speech data.I
In this paper, we present an open-source software for developing a nonparallel voice conversion (VC) system named crank. Although we have released an open-source VC software based on the Gaussian mixture model named sprocket in the last VC Challenge,