
Improvement of Speech Recognition by Merging Two Feature Extraction Algorithms


Publication date: 2017
Research language: Arabic





Speech recognition is one of the most prominent modern technologies, having entered many areas of life, whether medical, security, or industrial. Accordingly, many related systems have been developed, differing from one another in their feature extraction and classification methods. In this research, three speech recognition systems were built that differ only in the method used during the feature extraction stage: the first system used the MFCC algorithm, the second used the LPCC algorithm, and the third used the PLP algorithm, while all three used an HMM as the classifier. First, the performance of the speech recognition process was studied and evaluated for each proposed system separately. After that, the combination algorithm was applied to each pair of the studied systems in order to study its effect on improving the speech recognition process. Two kinds of errors (simultaneous errors and dependent errors) were used to evaluate the complementarity of each pair of systems and to study the effectiveness of the combination in improving recognition performance. The comparison results show that the best improvement in speech recognition was obtained by combining the MFCC and PLP algorithms, with a recognition rate of 93.4%.
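To make the pipeline concrete, here is a minimal sketch of one of the three systems: MFCC features feeding one Gaussian HMM per vocabulary word, with recognition by maximum log-likelihood. The library choices (librosa, hmmlearn) and all parameter values (16 kHz sample rate, 13 coefficients, 5 states) are illustrative assumptions; the paper's actual configuration is not given on this page.

```python
import numpy as np
import librosa                  # assumed available for MFCC extraction
from hmmlearn import hmm        # assumed available for Gaussian HMMs

def mfcc_features(wav_path, n_mfcc=13):
    """Load a waveform and return its MFCC frames, shape (n_frames, n_mfcc)."""
    y, sr = librosa.load(wav_path, sr=16000)          # hypothetical sample rate
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_word_models(train_data, n_states=5):
    """Fit one HMM per vocabulary word.
    train_data maps word -> list of MFCC matrices from training utterances."""
    models = {}
    for word, utterances in train_data.items():
        X = np.vstack(utterances)                     # stack frames of all examples
        lengths = [u.shape[0] for u in utterances]    # per-utterance frame counts
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=25)
        m.fit(X, lengths)
        models[word] = m
    return models

def recognize(models, features):
    """Return the word whose HMM scores the utterance highest."""
    return max(models, key=lambda w: models[w].score(features))
```

Swapping `mfcc_features` for an LPCC or PLP extractor, while keeping the HMM back end fixed, yields the other two systems described above.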


Artificial intelligence review:
Research summary
Speech recognition technologies are among the most important modern technologies and have entered strongly into various areas of life, whether medical, security, or industrial. In this research, three speech recognition systems were built that differ in their feature extraction methods: the first system used the MFCC algorithm, the second used the LPCC algorithm, and the third used the PLP algorithm. All of these systems used the HMM algorithm as a classifier. The performance of each system was evaluated separately, then the combination algorithm was applied to each pair of systems to study the effect of combination on improving speech recognition. The results showed that the best speech recognition rate was obtained when combining the MFCC and PLP algorithms, achieving a recognition rate of 93.4%.
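The page does not spell out the combination algorithm itself, so the following is only one plausible reading: score-level fusion, in which the per-word log-likelihoods of two systems (say, the MFCC-based and PLP-based ones) are mixed with a hypothetical weight `alpha` before the winner is picked.

```python
def combine_scores(scores_mfcc, scores_plp, alpha=0.5):
    """Fuse two systems' per-word log-likelihood dictionaries and return
    the best-scoring word; alpha is an assumed tuning weight."""
    fused = {w: alpha * scores_mfcc[w] + (1.0 - alpha) * scores_plp[w]
             for w in scores_mfcc}
    return max(fused, key=fused.get)
```

Other readings, such as voting over the systems' hypotheses or concatenating the feature streams before classification, are equally compatible with the description above.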
Critical review
This research represents an important step toward improving speech recognition systems by combining different feature extraction algorithms. However, there are some points that could be improved. First, how the data samples used for training and testing were chosen is not explained sufficiently, which may affect the generalizability of the results. Second, a wider set of algorithms could have been tested to obtain more comprehensive results. Finally, the effect of environmental noise on system performance, an important factor in practical applications, is not discussed.
Questions related to the research
  1. What are the three feature extraction algorithms used in this research?

    The three algorithms used are MFCC, LPCC, and PLP.

  2. What classifier was used in all three systems?

    The classifier used is the Hidden Markov Model (HMM) algorithm.

  3. What is the best speech recognition rate obtained in this research?

    The best recognition rate obtained is 93.4%, achieved by combining the MFCC and PLP algorithms.

  4. What kinds of errors were adopted to evaluate the systems?

    Two kinds of errors were adopted: simultaneous errors and dependent errors (a small counting sketch follows after this list).

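This page does not define the two error types precisely, so the sketch below assumes one common reading: simultaneous errors are test utterances that both systems misrecognize (errors a combination cannot repair), and error dependence is measured as the share of one system's errors that the other system repeats.

```python
def error_overlap(correct_a, correct_b):
    """correct_a / correct_b are parallel boolean lists, one entry per test
    utterance, True when that system recognized the utterance correctly."""
    both_wrong = sum(1 for a, b in zip(correct_a, correct_b) if not (a or b))
    a_errors = correct_a.count(False)
    dependence = both_wrong / a_errors if a_errors else 0.0
    return both_wrong, dependence   # simultaneous errors, dependence ratio
```

The fewer simultaneous errors two systems share, the more complementary they are, which is consistent with the MFCC and PLP pair yielding the best combined recognition rate.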

