In this paper, we propose a spectral modification method that sharpens formants and reduces spectral tilt so that children's speech can be recognized by automatic speech recognition (ASR) systems developed using adult speech. In this mismatched condition, ASR performance degrades because of the acoustic and linguistic mismatch between child and adult speakers. The proposed method improves speech intelligibility, which in turn enhances recognition of children's speech with an acoustic model trained on adult speech. In the experiments, WSJCAM0 and PFSTAR are used as the adult and children's speech databases, respectively. The proposed technique gives a significant improvement in a DNN-HMM-based ASR system. Furthermore, we validate the robustness of the technique by showing that it also performs well in mismatched noise conditions.
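As a rough illustration of what "reducing the spectral tilt" can mean in practice, the sketch below applies a standard first-order pre-emphasis filter, y[n] = x[n] - alpha * x[n-1], which attenuates low frequencies relative to high ones. This is not the paper's method: the abstract does not specify the filter used, and the function names, the coefficient `alpha = 0.97`, and the 16 kHz sample rate are all assumptions chosen for illustration. Formant sharpening is not covered here.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1] (assumed, generic tilt reduction)."""
    if not signal:
        return []
    out = [signal[0]]  # first sample has no predecessor, so it passes through
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def tone(freq_hz, sample_rate=16000, n_samples=1600):
    """Generate a unit-amplitude sine tone (hypothetical test signal)."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def gain_db(freq_hz, alpha=0.97, sample_rate=16000):
    """Measure the filter gain in dB at one frequency using a sine tone."""
    x = tone(freq_hz, sample_rate)
    y = pre_emphasis(x, alpha)
    return 20 * math.log10(rms(y) / rms(x))


if __name__ == "__main__":
    # Low frequencies are attenuated and high frequencies slightly boosted,
    # which flattens the downward tilt typical of voiced speech spectra.
    for f in (100, 1000, 4000):
        print(f, "Hz:", round(gain_db(f), 1), "dB")
```

A larger `alpha` removes more low-frequency energy; in an ASR front end such a filter would typically be applied to the waveform before feature extraction.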
References used
https://aclanthology.org/