In this paper, we discuss the issues in the automatic recognition of vowels in the Persian language. The present work focuses on a new statistical method for recognizing vowels as the basic units of syllables. We first describe a vowel detection system and then briefly discuss how the detected vowels can be fed to the recognition unit. In pattern recognition, Support Vector Machines (SVM), a discriminative classifier, and the Gaussian mixture model (GMM), a generative classifier, are two of the most popular techniques. Current state-of-the-art systems try to combine the two to gain classification power and improve the performance of recognition systems. The main idea of this study is to combine probabilistic SVM and traditional GMM pattern classification with speech characteristics such as band-pass energy to achieve a better classification rate. This idea has been analytically formulated and tested on a FarsDat-based vowel recognition system. The results show substantial increases in recognition accuracy. The tests were carried out with the various proposed vowel recognition algorithms, and their results were compared.
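The abstract does not give the fusion rule. The following is therefore only a minimal Python sketch, under stated assumptions, of what score-level combination of a probabilistic SVM with a GMM can look like.

- It assumes equal class priors and turns per-vowel GMM log-likelihoods into posteriors.
- It mixes those posteriors linearly with SVM class probabilities, using a hypothetical `weight` parameter.
- The `band_pass_energy` helper is a naive DFT over one frame. It illustrates one possible band-pass energy feature; how the paper actually feeds band-pass energy into the classifiers is not stated.

All function names, vowel labels, and scores are hypothetical.

```python
import cmath
import math


def band_pass_energy(frame, sample_rate, low_hz, high_hz):
    """Energy of one frame between low_hz and high_hz (naive one-sided DFT).

    Hypothetical feature: the paper does not specify its band-pass filter.
    """
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        freq = k * sample_rate / n
        if low_hz <= freq <= high_hz:
            coeff = sum(
                frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
            )
            energy += abs(coeff) ** 2
    return energy / n


def gmm_posteriors(log_likelihoods):
    """Convert per-class GMM log-likelihoods to posteriors (equal priors assumed)."""
    peak = max(log_likelihoods.values())  # subtract max for numerical stability
    exp_scores = {c: math.exp(s - peak) for c, s in log_likelihoods.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}


def fuse(gmm_loglik, svm_prob, weight=0.5):
    """Hypothetical linear fusion of GMM posteriors and SVM class probabilities.

    weight=1.0 trusts only the GMM, weight=0.0 only the SVM.
    Returns (best class label, fused score per class).
    """
    gmm_post = gmm_posteriors(gmm_loglik)
    fused = {
        c: weight * gmm_post[c] + (1.0 - weight) * svm_prob[c] for c in gmm_post
    }
    return max(fused, key=fused.get), fused


# Illustrative scores for three hypothetical vowel classes.
gmm_scores = {"a": -10.0, "e": -12.0, "o": -11.0}
svm_scores = {"a": 0.2, "e": 0.7, "o": 0.1}
label, scores = fuse(gmm_scores, svm_scores, weight=0.5)
print(label)  # a
```

Linear fusion is used here only because it is the simplest rule to read. A product (log-linear) rule or a trained fusion stage would be equally plausible choices, and shifting `weight` toward the SVM (for example `weight=0.2`) flips this toy decision to "e".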
The performance of a speaker recognition system is highly dependent on the amount of speech used in enrollment and testing. This work presents a detailed experimental review and analysis of the GMM-SVM based speaker recognition system in the presence of durat
In this paper, we propose a novel auxiliary loss function for target-speaker automatic speech recognition (ASR). Our method automatically extracts and transcribes target speakers' utterances from a monaural mixture of multiple speakers' speech given a
Imprecise vowel articulation can be observed in people with Parkinson's disease (PD). Acoustic features measuring vowel articulation have been demonstrated to be effective indicators of PD in its assessment. Standard clinical vowel articulation featur
In recent years, speech emotion recognition technology has become highly significant for industrial applications such as call centers, social robots, and health care. The combination of speech recognition and speech emotion recognition can improve the feedb
Cued Speech (CS) is a visual communication system for deaf or hearing-impaired people. It combines lip movements with hand cues to obtain a complete phonetic repertoire. Current deep-learning-based methods for automatic CS recognition suffer from