
Improvement of Speech Recognition by Merging Two Features Extraction Algorithms


Publication date: 2017
Original language: Arabic





Speech recognition is one of the most important modern technologies and has entered many fields of life, whether medical, security, or industrial. Accordingly, many related systems have been developed that differ from each other in their feature extraction and classification methods. In this research, three speech recognition systems were built that differ only in the method used during the feature extraction stage: the first system used the MFCC algorithm, the second used the LPCC algorithm, and the third used the PLP algorithm. All three systems used an HMM as the classifier. First, the recognition performance of each proposed system was studied and evaluated separately. Then, a combination algorithm was applied to each pair of the studied systems in order to study the effect of combination on improving the speech recognition process. Two kinds of errors (simultaneous errors and dependent errors) were used to evaluate the complementarity of each pair of systems and to study the effectiveness of the combination in improving recognition performance. The comparison results show that the best improvement was obtained by combining the MFCC and PLP algorithms, with a recognition rate of 93.4%.
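To make the pipeline concrete, below is a minimal Python sketch of an isolated-word recognizer in the spirit of this setup: one Gaussian HMM per vocabulary word, trained on MFCC features, plus a score-level fusion of two recognizers. The libraries (librosa, hmmlearn), the feature and model settings (13 MFCCs, 5 HMM states), and the log-likelihood-summing fusion rule are illustrative assumptions, not details taken from the paper.

import numpy as np
import librosa
from hmmlearn import hmm

def mfcc_features(path, n_mfcc=13):
    # Load the audio at its native sampling rate and return MFCC frames,
    # shaped (n_frames, n_mfcc), as the HMM's observation sequence.
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_word_models(train_data, n_states=5):
    # train_data maps each vocabulary word to a list of wav paths;
    # one Gaussian HMM is fit per word.
    models = {}
    for word, paths in train_data.items():
        feats = [mfcc_features(p) for p in paths]
        X = np.vstack(feats)                    # stack all utterances
        lengths = [f.shape[0] for f in feats]   # per-utterance frame counts
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=20)
        m.fit(X, lengths)
        models[word] = m
    return models

def scores(models, feats):
    # Log-likelihood of one utterance under every word model.
    return {word: m.score(feats) for word, m in models.items()}

def recognize_fused(models_a, feats_a, models_b, feats_b):
    # Score-level fusion of two recognizers (e.g. one MFCC-based and one
    # PLP-based): sum the per-word log-likelihoods and pick the best word.
    # This summing rule is one plausible combination scheme, not
    # necessarily the one used in the paper.
    sa, sb = scores(models_a, feats_a), scores(models_b, feats_b)
    return max(sa, key=lambda w: sa[w] + sb[w])

Fusing at the score level keeps the two feature streams independent; an alternative design would concatenate the MFCC and PLP frames into a single feature vector before training one HMM per word.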


Artificial intelligence review:
Research summary
Speech recognition is among the most important modern technologies and has entered many fields of life, whether medical, security, or industrial. In this research, three speech recognition systems were built that differ in their feature extraction methods: the first system used the MFCC algorithm, the second used the LPCC algorithm, and the third used the PLP algorithm. All of these systems used HMM as the classifier. The performance of each system was evaluated separately, then a combination algorithm was applied to each pair of systems to study the effect of combination on improving speech recognition. The results showed that the best recognition rate was obtained by combining MFCC and PLP, reaching 93.4%.
Critical review
This research is an important step toward improving speech recognition systems by combining different feature extraction algorithms. However, some points could be improved. First, how the data samples used for training and testing were selected is not explained in sufficient detail, which may affect how well the results generalize. Second, a wider range of algorithms could have been tested to obtain more comprehensive results. Finally, the effect of environmental noise on system performance is not discussed, even though it is an important factor in practical applications.
Questions related to the research
  1. What are the three feature extraction algorithms used in this research?

    The three algorithms are MFCC, LPCC, and PLP.

  2. What classifier is used in all three systems?

    The classifier is the Hidden Markov Model (HMM) algorithm.

  3. What is the best speech recognition rate obtained in this research?

    The best recognition rate is 93.4%, obtained by combining the MFCC and PLP algorithms.

  4. What kinds of errors were used to evaluate the systems?

    Two kinds of errors were used: simultaneous errors and dependent errors, which measure how strongly the two systems' mistakes overlap (a sketch of this analysis follows below).
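The abstract does not give formal definitions of the two error types; a plausible reading, sketched below, is that simultaneous errors are test samples both systems misclassify at once, so a low simultaneous-error rate signals complementary systems whose combination can help. The function name and the exact breakdown are hypothetical illustrations, not the paper's formulas.

import numpy as np

def error_overlap(y_true, pred_a, pred_b):
    # Breakdown of the errors of two recognizers on the same test set.
    # Samples where both systems err simultaneously limit what any
    # combination can fix; samples where only one system errs are where
    # a complementary combination can recover the right answer.
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    err_a = pred_a != y_true
    err_b = pred_b != y_true
    return {
        "simultaneous": float(np.mean(err_a & err_b)),  # both systems wrong
        "only_a": float(np.mean(err_a & ~err_b)),       # A wrong, B right
        "only_b": float(np.mean(err_b & ~err_a)),       # B wrong, A right
    }

A low "simultaneous" rate for the MFCC/PLP pair would be consistent with that pair yielding the best combined recognition rate (93.4%) reported above.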


