Cepstral Vocal Tract Modelling for Text-To-Speech Synthesis


Abstract in English

In this paper we describe a cepstral model of the vocal tract that models both formants and antiformants. This model is more precise than the linear prediction model, which models only the formants of the vocal tract. The inverse transformation uses the exponential function, which is difficult to implement on a digital signal processor. To solve this problem, we approximate the exponential function with a continued fraction expansion. The transfer function that approximates the exponential function is realized as an Infinite Impulse Response (IIR) digital filter whose branches contain Finite Impulse Response (FIR) digital filters. The coefficients of the FIR digital filters are simply the coefficients of the real speech cepstrum. State-space difference equations are proposed and implemented on a Motorola DSP56300 fixed-point digital signal processor. Finally, the results of the digital signal processor implementation are evaluated for selected vowels and consonants.
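The idea of replacing the exponential function by a truncated continued fraction can be illustrated for a scalar argument. The sketch below is only an illustration, not the filter structure implemented in the paper: it assumes the classical expansion e^x = 1 + 2x / (2 - x + x^2/(6 + x^2/(10 + x^2/(14 + ...)))), and the function name `exp_cf` and the parameter `depth` are hypothetical. In a vocal tract model the scalar x would presumably be replaced by an FIR filter built from the cepstral coefficients, which is what turns the rational approximation into an IIR structure with FIR branches.

```python
import math


def exp_cf(x, depth=1):
    """Approximate exp(x) by a truncated continued fraction.

    Assumed expansion (illustrative choice, not taken from the paper):
        e^x = 1 + 2x / (2 - x + x^2/(6 + x^2/(10 + x^2/(14 + ...))))
    `depth` is the number of partial denominators 6, 10, 14, ... kept.
    depth=0 gives (2 + x)/(2 - x); depth=1 gives the rational function
    (1 + x/2 + x^2/12) / (1 - x/2 + x^2/12).
    """
    tail = 0.0
    # Evaluate the fraction from the innermost kept term outwards.
    for k in range(depth, 0, -1):
        tail = x * x / (4 * k + 2 + tail)
    return 1.0 + 2.0 * x / (2.0 - x + tail)


if __name__ == "__main__":
    for depth in (0, 1, 2, 3):
        err = abs(exp_cf(0.5, depth) - math.exp(0.5))
        print(f"depth={depth}  error={err:.2e}")
```

The approximation is a ratio of two polynomials of low order, so it needs only multiplications, additions, and one division, which is why a form of this kind suits a fixed-point processor better than evaluating the exponential directly. The accuracy degrades as |x| grows, so the truncation depth has to be matched to the expected range of the argument.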

References used

VÍCH, R., SMÉKAL, Z. All-Pole and Zero-Pole Speech Modelling (Invited Paper). In Proceedings of the International Conference "BIOSIGNÁL '98", June 23-25, 1998, Brno, Czech Republic, pp. 196-199. ISBN 80-214-1169-4.
VÍCH, R., PŘIBIL, J., SMÉKAL, Z. New Cepstral Zero-Pole Vocal Tract Models for TTS. In Proceedings of the International Conference EUROCON '2001, July 7-9, 2001, Bratislava, Slovakia, pp. 459-462.
KHOWANSKYI, A.N. Application of Continued Fractions and Their Generalizations in Numerical Analysis. State Publishing House for Engineering and Theoretical Literature, Moscow, 1956. (In Russian.)
