Do you want to publish a course? Click here

In this paper we question the impact of gender representation in training data on the performance of an end-to-end ASR system. We create an experiment based on the Librispeech corpus and build 3 different training corpora varying only the proportion of data produced by each gender category. We observe that if our system is overall robust to the gender balance or imbalance in training data, it is nonetheless dependant of the adequacy between the individuals present in the training and testing sets.
This paper offers a comparative evaluation of four commercial ASR systems which are evaluated according to the post-editing effort required to reach publishable'' quality and according to the number of errors they produce. For the error annotation ta sk, an original error typology for transcription errors is proposed. This study also seeks to examine whether there is a difference in the performance of these systems between native and non-native English speakers. The experimental results suggest that among the four systems, Trint obtains the best scores. It is also observed that most systems perform noticeably better with native speakers and that all systems are most prone to fluency errors.
In general, the aim of an automatic speech recognition system is to write down what is said. State of the art continuous speech recognition systems consist of four basic modules: the signal processing, the acoustic modeling, the language modeling and the search engine. While isolated word recognition systems do not contain language modeling, which is responsible for connecting words together to form understandable sentences.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا