نظرا للتقدم المؤخرا لمعالجة اللغات الطبيعية، قامت عدة أعمال بتطبيق نموذج اللغة الملثم المدرب مسبقا (MLM) من Bert إلى ما بعد تصحيح التعرف على الكلام.ومع ذلك، فإن النماذج القائمة المدربة مسبقا فقط تنظر فقط في التصحيح الدلالي أثناء إهمال السمات الصوتية للكلمات.سوف يؤدي الإصلاح الدلالي الوحيد فقط إلى تقليل الأداء لأن الأخطاء هوموفونية شائعة إلى حد ما في الصيني العسكري.في هذه الورقة، اقترحنا نهجا جديدا لاستغلال التمثيل السياقي بشكل جماعي والمعلومات الصوتية بين الخطأ واستبدال المرشحين لتخفيف معدل الخطأ الصيني العسكري.أظهرت نتائج تجربتنا على مجموعات بيانات التعرف على الكلام العالمي الحقيقي أن طريقةنا المقترحة لها من الواضح أن خفضت من النموذج الأساسي، مما استخدم برت مزاملا مدربا مسبقا كصاصر.
Due to the recent advances of natural language processing, several works have applied the pre-trained masked language model (MLM) of BERT to the post-correction of speech recognition. However, existing pre-trained models only consider the semantic correction while the phonetic features of words is neglected. The semantic-only post-correction will consequently decrease the performance since homophonic errors are fairly common in Chinese ASR. In this paper, we proposed a novel approach to collectively exploit the contextualized representation and the phonetic information between the error and its replacing candidates to alleviate the error rate of Chinese ASR. Our experiment results on real world speech recognition datasets showed that our proposed method has evidently lower CER than the baseline model, which utilized a pre-trained BERT MLM as the corrector.
References used
https://aclanthology.org/
Due to the popularity of intelligent dialogue assistant services, speech emotion recognition has become more and more important. In the communication between humans and machines, emotion recognition and emotion analysis can enhance the interaction be
In this paper, we focus on improving the quality of the summary generated by neural abstractive dialogue summarization systems. Even though pre-trained language models generate well-constructed and promising results, it is still challenging to summar
The speech recognition is one of the most modern technologies, which entered force
in various fields of life, whether medical or security or industrial techniques. Accordingly,
many related systems were developed, which differ from each otherin fea
In general, the aim of an automatic speech recognition system is to write down what is said. State of the art continuous speech recognition systems consist of four basic modules: the signal processing, the acoustic modeling, the language modeling and
While named entity recognition (NER) from speech has been around as long as NER from written text has, the accuracy of NER from speech has generally been much lower than that of NER from text. The rise in popularity of spoken dialog systems such as S