ﻻ يوجد ملخص باللغة العربية
A stream attention framework has been applied to the posterior probabilities of the deep neural network (DNN) to improve the far-field automatic speech recognition (ASR) performance in the multi-microphone configuration. The stream attention scheme has been realized through an attention vector, which is derived by predicting the ASR performance from the phoneme posterior distribution of individual microphone stream, focusing the recognizers attention to more reliable microphones. Investigation on the various ASR performance measures has been carried out using the real recorded dataset. Experiments results show that the proposed framework has yielded substantial improvements in word error rate (WER).
Performance degradation of an Automatic Speech Recognition (ASR) system is commonly observed when the test acoustic condition is different from training. Hence, it is essential to make ASR systems robust against various environmental distortions, suc
We propose a straightforward and cost-effective method to perform diffuse soundfield measurements for calibrating the magnitude response of a microphone array. Typically, such calibration is performed in a diffuse soundfield created in reverberation
Recently, ad-hoc microphone array has been widely studied. Unlike traditional microphone array settings, the spatial arrangement and number of microphones of ad-hoc microphone arrays are not known in advance, which hinders the adaptation of tradition
In order to evaluate the performance of the attention based neural ASR under noisy conditions, the current trend is to present hours of various noisy speech data to the model and measure the overall word/phoneme error rate (W/PER). In general, it is
This paper proposes a serialized multi-layer multi-head attention for neural speaker embedding in text-independent speaker verification. In prior works, frame-level features from one layer are aggregated to form an utterance-level representation. Ins