Automatic Speech Recognition Algorithms

خوارزميات تعرّف على الكلام آلياً

Publication date: 2017
Research language: Arabic
Created by Shamra Editor





In general, the aim of an automatic speech recognition system is to write down what is said. State-of-the-art continuous speech recognition systems consist of four basic modules: signal processing, acoustic modeling, language modeling, and the search engine. Isolated-word recognition systems, by contrast, do not contain language modeling, which is responsible for connecting words together to form understandable sentences.
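
The way these four modules fit together is often summarized by the Bayes decision rule for speech recognition. The formulation below is the standard textbook one rather than something quoted from the thesis, and the symbols $X$ (the sequence of feature vectors produced by signal processing) and $W$ (a candidate word sequence) are notation assumed here for illustration.

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid X)
        \;=\; \arg\max_{W} \; p(X \mid W)\, P(W)
```

In this reading, $p(X \mid W)$ is supplied by the acoustic model, $P(W)$ by the language model, and the $\arg\max$ over word sequences is the job of the search engine. For an isolated-word recognizer, $W$ is a single word, so the word-sequence prior $P(W)$ is not needed to connect words into sentences.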


Artificial intelligence review:
Research summary
The thesis studies automatic speech recognition systems, whose aim is to convert spoken speech into written text. Continuous speech recognition systems consist of four basic components: signal processing, acoustic modeling, language modeling, and the search engine, whereas isolated-word recognition systems do not include language modeling. In the signal processing part, two feature extraction algorithms were studied: Mel-frequency cepstral coefficients (MFCC) and Gammatone wavelet cepstral coefficients (GWCC), and their performance was tested on the TIDIGITS database. A hidden Markov model (HMM) was used to build the classifier because of its flexibility and ease of modification. A new algorithm, constant-Q cepstral coefficients (CQCC), was proposed and its performance was compared with that of the two previous algorithms. The algorithms were also tested in different noise environments (train, station, restaurant, ...).
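
To make the feature extraction step more concrete, here is a minimal NumPy sketch of a conventional MFCC front end (pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, log, DCT). It is not the thesis's implementation: the function names and every parameter value (8 kHz sampling rate, 25 ms frames, 10 ms step, 26 filters, 13 coefficients, 0.97 pre-emphasis) are assumptions chosen for illustration, and the input is assumed to be at least one frame long.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling edge
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=8000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13):
    """Return an array of shape (n_frames, n_ceps). All defaults are assumed."""
    signal = np.asarray(signal, dtype=float)
    frame_size = int(round(frame_len * sample_rate))
    step = int(round(frame_step * sample_rate))
    # Pre-emphasis boosts high frequencies before spectral analysis.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames (trailing samples that do not fill a frame are dropped).
    n_frames = 1 + (len(emphasized) - frame_size) // step
    idx = np.arange(frame_size)[None, :] + step * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_size)
    # Power spectrum, mel filterbank energies, then log compression.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_energies = np.log(power @ fbank.T + 1e-10)
    # Unnormalized DCT-II decorrelates the log energies into cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T

# Usage: one second of a 440 Hz tone at the assumed 8 kHz sampling rate.
t = np.arange(8000) / 8000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)  # (98, 13)
```

GWCC and CQCC follow the same overall pattern but replace the mel filterbank stage with a Gammatone-based or constant-Q analysis; those variants are not sketched here because the summary does not give their details.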
Critical review
This study is comprehensive and detailed in the field of automatic speech recognition: it examines several algorithms and tests their performance in different environments. Some constructive criticism can nevertheless be offered. First, including more databases for testing the algorithms would strengthen the reliability of the results. Second, the study could be improved by a deeper analysis of why some algorithms outperform others in particular noise environments. Finally, the study would be more complete if it covered practical applications of speech recognition systems in everyday life, such as their use in smart devices or cars.
Questions related to the research
  1. What are the basic components of continuous automatic speech recognition systems?

    Continuous automatic speech recognition systems consist of four basic components: signal processing, acoustic modeling, language modeling, and the search engine.

  2. Which feature extraction algorithms were studied in this thesis?

    Two feature extraction algorithms were studied: Mel-frequency cepstral coefficients (MFCC) and Gammatone wavelet cepstral coefficients (GWCC).

  3. What new algorithm was proposed in this study?

    A new algorithm was proposed: constant-Q cepstral coefficients (CQCC).

  4. How was the performance of the algorithms tested in different noise environments?

    The algorithms were tested by adding different types of noise (train, station, restaurant, ...) to the tests.
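
The noise tests described in the last answer can be illustrated with a short sketch that mixes a noise recording into clean speech at a chosen signal-to-noise ratio. The summary does not say which SNR levels or mixing procedure the thesis used, so the function name `add_noise_at_snr`, the power-based scaling, and the 10 dB value in the usage lines are all assumptions.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so the mixture has the requested SNR (in dB)."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose the scale so that speech_power / (scale**2 * noise_power) == 10**(snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise

# Usage with synthetic stand-ins for a clean utterance and a noise recording.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 200.0 * np.arange(8000) / 8000.0)
babble = rng.standard_normal(3000)
noisy = add_noise_at_snr(clean, babble, snr_db=10.0)
```

Running the same recognizer on `noisy` at several SNR values and for each noise type would give the kind of per-environment comparison the summary mentions.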


References used
N. Trivedi, V. Kumar, S. Singh, S. Ahuja, and R. Chadha, "Speech Recognition by Wavelet Analysis," International Journal of Computer Applications, vol. 15, no. 8, February 2011.

Read More

The main purpose of the present research is to support Arabic text-to-speech synthesizers with natural prosody, based on linguistic analysis of the texts to be synthesized and on automatic prosody generation using rules deduced from the analysis of recorded signals of different types of Arabic sentences. All the types of Arabic sentences (declarative and constructive) were enumerated with the help of an expert in Arabic linguistics. A textual corpus of about 2500 sentences covering most of these types was built and recorded both with natural prosody and without prosody. These sentences were then analyzed to extract the effect of prosody on the signal parameters and to build prosody generation rules. In this paper, we present the results for negation sentences, applied to synthesized speech using the open-source tool MBROLA. The results can be used with any parametric Arabic synthesizer. Future work will apply the rules to a new Arabic synthesizer based on semi-syllable units, which is under development at the Higher Institute for Applied Sciences and Technology.
Speech recognition is one of the most modern technologies and has entered various fields of life, whether medical, security, or industrial. Accordingly, many related systems have been developed, which differ from each other in their feature extraction and classification methods. In this research, three speech recognition systems were created. They differ from each other in the method used during the feature extraction stage: the first system used the MFCC algorithm, the second the LPCC algorithm, and the third the PLP algorithm. All three systems used an HMM as the classifier. First, the speech recognition performance of each proposed system was studied and evaluated separately. After that, the combination algorithm was applied to each pair of the studied systems in order to study its effect on improving the speech recognition process. Two kinds of errors (simultaneous errors and dependent errors) were used to evaluate the complementarity of each pair of systems and to study the effectiveness of the combination in improving recognition performance. The comparison shows that the best improvement was obtained by combining the MFCC and PLP algorithms, with a recognition rate of 93.4%.
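
As a rough illustration of how the complementarity of two recognizers might be measured, the sketch below counts two kinds of joint errors on per-utterance outputs. The abstract does not define its terms, so the definitions used here are assumptions: a "simultaneous" error is an utterance both systems get wrong, and a "dependent" error is one where both systems give the same wrong answer. The function name `error_overlap` and the toy digit data are hypothetical.

```python
def error_overlap(reference, hyp_a, hyp_b):
    """Count joint errors of two systems over aligned per-utterance outputs."""
    simultaneous = 0  # assumed meaning: both systems are wrong
    dependent = 0     # assumed meaning: both are wrong with the same output
    for ref, a, b in zip(reference, hyp_a, hyp_b):
        if a != ref and b != ref:
            simultaneous += 1
            if a == b:
                dependent += 1
    total = len(reference)
    return {
        "simultaneous": simultaneous,
        "dependent": dependent,
        # Upper bound for an ideal combiner that picks the right system whenever one is right.
        "oracle_accuracy": (total - simultaneous) / total,
    }

# Toy example: labels for six isolated-word utterances.
reference = ["one", "two", "three", "four", "five", "six"]
system_a = ["one", "two", "tree", "for", "five", "sex"]
system_b = ["one", "too", "tree", "four", "fine", "sick"]
stats = error_overlap(reference, system_a, system_b)
print(stats["simultaneous"], stats["dependent"])  # 2 1
```

Under these assumed definitions, a pair of systems with few simultaneous errors leaves more room for a combination scheme to improve on either system alone.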
Medical simulators provide a controlled environment for training and assessing clinical skills. However, as an assessment platform, a simulator requires the presence of an experienced examiner to provide performance feedback, commonly performed using a task-specific checklist. This makes the assessment process inefficient and expensive. Furthermore, this evaluation method does not give medical practitioners the opportunity for independent training. Ideally, the process of filling in the checklist should be done by a fully aware, objective system capable of recognizing and monitoring clinical performance. To this end, we have developed an autonomous, fully automatic speech-based checklist system capable of objectively identifying and validating anesthesia residents' actions in a simulation environment. Based on the analyzed results, our system is capable of recognizing most of the tasks in the checklist: an F1 score of 0.77 for all of the tasks and an F1 score of 0.79 for the verbal tasks. Developing an audio-based system will improve the experience of a wide range of simulation platforms. Furthermore, in the future, this approach may be implemented in the operating room and emergency room. This could facilitate the development of automatic assistive technologies for these domains.
Due to the popularity of intelligent dialogue assistant services, speech emotion recognition has become more and more important. In communication between humans and machines, emotion recognition and emotion analysis can enhance the interaction between machines and humans. This study uses a CNN+LSTM model to implement speech emotion recognition (SER) processing and prediction. The experimental results show that the CNN+LSTM model achieves better performance than the traditional NN model.
While Automatic Speech Recognition has been shown to be vulnerable to adversarial attacks, defenses against these attacks are still lagging. Existing, naive defenses can be partially broken with an adaptive attack. In classification tasks, the Randomized Smoothing paradigm has been shown to be effective at defending models. However, it is difficult to apply this paradigm to ASR tasks, due to their complexity and the sequential nature of their outputs. Our paper overcomes some of these challenges by leveraging speech-specific tools like enhancement and ROVER voting to design an ASR model that is robust to perturbations. We apply adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack, to our model, and show that our strongest defense is robust to all attacks that use inaudible noise, and can only be broken with very high distortion.
