ﻻ يوجد ملخص باللغة العربية
Silent speech interfaces (SSI) has been an exciting area of recent interest. In this paper, we present a non-invasive silent speech interface that uses inaudible acoustic signals to capture peoples lip movements when they speak. We exploit the speaker and microphone of the smartphone to emit signals and listen to their reflections, respectively. The extracted phase features of these reflections are fed into the deep learning networks to recognize speech. And we also propose an end-to-end recognition framework, which combines the CNN and attention-based encoder-decoder network. Evaluation results on a limited vocabulary (54 sentences) yield word error rates of 8.4% in speaker-independent and environment-independent settings, and 8.1% for unseen sentence testing.
End-to-end (E2E) systems have played a more and more important role in automatic speech recognition (ASR) and achieved great performance. However, E2E systems recognize output word sequences directly with the input acoustic feature, which can only be
Despite successful applications of end-to-end approaches in multi-channel speech recognition, the performance still degrades severely when the speech is corrupted by reverberation. In this paper, we integrate the dereverberation module into the end-t
End-to-end acoustic speech recognition has quickly gained widespread popularity and shows promising results in many studies. Specifically the joint transformer/CTC model provides very good performance in many tasks. However, under noisy and distorted
Recently neural architecture search(NAS) has been successfully used in image classification, natural language processing, and automatic speech recognition(ASR) tasks for finding the state-of-the-art(SOTA) architectures than those human-designed archi
Confidence measure is a performance index of particular importance for automatic speech recognition (ASR) systems deployed in real-world scenarios. In the present study, utterance-level neural confidence measure (NCM) in end-to-end automatic speech r