Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Improving the extraction of audio features In audio-visual Arabic systems

تحسين استخراج السمات السمعية في الأنظمة السمعية البصرية للمتحدثين باللغة العربية

1792 2 49 0 ( 0 )

Download Cite

Added by Aِl-Baath University ورقة بحثية

Publication date 2017

fields Computer and Automatic Control Engineering

and research's language is العربية

Authors جعفر محسن الخير( باحث ) - مريم محمد ساعي( باحث ) - نور سميع غضبان( باحث )

Created by Shamra Editor

Features extraction MFCC نماذج ماركوف المخفية التعرف على الكلام استخراج السمات خوارزمية معاملات تردد الميل المشتقات التفاضلية مكون الطاقة Speech recognition Markov Hidden models Temporal derivatives energy component

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The audio-visual speech recognition systems that rely on speech and movement of the lips of the speaker of the most important speech recognition systems. Many different techniques have developed in terms of the methods used in the feature extraction and classification methods. Research proposes the establishment of a system to identify isolated words based audio features extracted from videos pronunciations of words in Arabic in an environment free of noise, and then add the energy and Temporal derivative components in extracting features of the method Mel Frequency Cepstral Coefficient (MFCC) stage.

Artificial intelligence review:

Upgrade your account to view the content

Research summary

تتناول هذه الدراسة تحسين استخراج السمات السمعية في أنظمة التعرف السمعية البصرية للناطقين باللغة العربية. تعتمد هذه الأنظمة على الصوت وحركة شفاه المتكلم، وقد تم تطوير العديد من التقنيات المختلفة في هذا المجال. يقترح البحث إنشاء نظام للتعرف على الكلمات المعزولة باستخدام السمات السمعية المستخرجة من فيديوهات منطوقة لكلمات باللغة العربية في بيئة خالية من الضجيج. يتم إضافة مكون الطاقة والمشتقات التفاضلية في مرحلة استخراج السمات لخوارزمية معاملات تردد الميل (MFCC). تم استخدام نماذج ماركوف المخفية (HMM) في مرحلة التصنيف. أظهرت النتائج أن إضافة السمات إلى خوارزمية MFCC زادت من أداء النظام، حيث وصلت نسبة التعرف إلى 92%. يتضمن البحث قاعدة بيانات تحتوي على 13850 مقطع فيديو ل 36 كلمة معزولة منطوقة باللغة العربية من قبل خمسين متكلم تتراوح أعمارهم بين 18 و60 عامًا. تم استخدام بيئة العمل Matlab2014a ومكتبات voicebox وsignal processing. توصل البحث إلى أن إضافة مكون الطاقة والمشتقات التفاضلية حسنت من عملية استخراج السمات السمعية وبالتالي أداء النظام ككل.

Critical review

تقدم هذه الدراسة مساهمة قيمة في تحسين أنظمة التعرف على الكلام للناطقين باللغة العربية، خاصة في بيئات خالية من الضجيج. ومع ذلك، هناك بعض النقاط التي يمكن تحسينها. أولاً، تم اختبار النظام في بيئة خالية من الضجيج، مما قد لا يعكس الأداء في الظروف الواقعية حيث يكون الضجيج موجودًا. لذلك، يمكن أن تكون الدراسة أكثر شمولية إذا تم اختبار النظام في بيئات متنوعة تحتوي على مستويات مختلفة من الضجيج. ثانيًا، يمكن أن تكون النتائج أكثر قوة إذا تم مقارنة أداء النظام مع أنظمة أخرى تستخدم تقنيات مختلفة لاستخراج السمات. أخيرًا، يمكن أن تكون الدراسة أكثر تفصيلاً في شرح كيفية تأثير المشتقات التفاضلية ومكون الطاقة على تحسين أداء النظام.

Questions related to the research

ما هي الخوارزمية المستخدمة في استخراج السمات في هذه الدراسة؟

تم استخدام خوارزمية معاملات تردد الميل (MFCC) في استخراج السمات في هذه الدراسة.
ما هي نسبة التعرف التي حققها النظام بعد إضافة مكون الطاقة والمشتقات التفاضلية؟

وصلت نسبة التعرف إلى 92% بعد إضافة مكون الطاقة والمشتقات التفاضلية.
ما هي بيئة العمل المستخدمة في هذه الدراسة؟

تم استخدام بيئة العمل Matlab2014a ومكتبات voicebox وsignal processing.
كم عدد العينات التي تم اختبار النظام عليها؟

تم اختبار النظام على 4155 عينة.

Keywords

التعرف على الكلام استخراج السمات خوارزمية معاملات تردد الميل نماذج ماركوف المخفية المشتقات التفاضلية مكون الطاقة

References used

Marius Zbancioc, Mihaela Costin :using neural networks and LPCC to improve speech recognition, International IEEE SCS Conference, Proceedings, Vol. 1, 2003 EX 720, pp. 445 – 448

Levy, C., Linares, G., Nocera, P., Bonastre, J.-F. : Reducing computational and memory cost for cellular phone embedded speech recognition system, Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on (Volume:5 ) , pages( 309-12) vol.5 , Print ISBN:9-8484-7803-0

Dimitriadis, Maragos, P. Potamianos: Robust AM-FM Features for Speech Recognition, IEEE signal processing letters, VOL. 12, NO. 9, 2005

rate research

A study of some of the problems with hearing and visual impairment in the Faculty of Arts at the University of Damascus

2667 - Aِl-Baath University 2017 ورقة بحثية

This study aims to score psychological, academic and economic problems faced by students with special needs in the Faculty of Arts at the University of Damascus and to identify the differences in these problems between the research samples. The sa mple consisted of / 72/ students from the students of the Faculty of Arts at Damascus University, was selected deliberate manner.

الإعاقة السمعية المشكلات النفسية و الأكاديمية و الاقتصادية الإعاقة البصرية طلبة المرحلة الجامعية psychological academic and economic problems visual handicapped hearing handicapped undergraduates المزيد..

The role of audio and video footprint and how Legitimacy of proof in criminal

2935 - Aِl-Baath University 2017 ورقة بحثية

we'll show in this research months fingerprints developed in the criminal investigation, a visual and sound footprint, and made it clear that these fingerprints applications of interest to specialists in the science crime and criminal evidence, it is through the visual and sound footprint can be identified by the accused and convicted of their crime or brought them, and finally we determined that This kind of fingerprinting subject to the principle of legality of evidence like any other evidence, and nothing prevents the use in evidence penal taking into account some of the controls that are indispensable to Legitimacy.

مشروعية legality بصمة صوتية بصرية دور البصمة The role of footprint visual audio imprint المزيد..

Improving Generation and Evaluation of Visual Stories via Semantic Consistency

614 - Association for Computation Linguistics 2021 مقالة

Story visualization is an underexplored task that falls at the intersection of many important research directions in both computer vision and natural language processing. In this task, given a series of natural language captions which compose a story , an agent must generate a sequence of images that correspond to the captions. Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task. However, there is room for improvement of generated images in terms of visual quality, coherence and relevance. We present a number of improvements to prior modeling approaches, including (1) the addition of a dual learning framework that utilizes video captioning to reinforce the semantic alignment between the story and generated images, (2) a copy-transform mechanism for sequentially-consistent story visualization, and (3) MART-based transformers to model complex interactions between frames. We present ablation studies to demonstrate the effect of each of these techniques on the generative power of the model for both individual images as well as the entire narrative. Furthermore, due to the complexity and generative nature of the task, standard evaluation metrics do not accurately reflect performance. Therefore, we also provide an exploration of evaluation metrics for the model, focused on aspects of the generated frames such as the presence/quality of generated characters, the relevance to captions, and the diversity of the generated images. We also present correlation experiments of our proposed automated metrics with human evaluations.

improving generation visual stories semantic consistency تحسين الجيل القصص البصرية الاتساق الدلالي صناعة حمض الفوسفور المزيد..

Audiovisual Translation through NMT and Subtitling in the Netflix Series `Cable Girls'

881 - Association for Computation Linguistics 2021 مقالة

In recent years, the emergence of streaming platforms such as Netflix, HBO or Amazon Prime Video has reshaped the field of entertainment, which increasingly relies on subtitling, dubbing or voice-over modes. However, little is known about audiovisual translation when dealing with Neural Machine Translation (NMT) engines. This work-in-progress paper seeks to examine the English subtitles of the first episode of the popular Spanish Netflix series Cable Girls and the translated version generated by Google Translate and DeepL. Such analysis will help us determine whether there are significant linguistic differences that could lead to miscomprehension or cultural shocks. To this end, the corpus compiled consists of the Spanish script, the English subtitles available in Netflix and the translated version of the script. For the analysis of the data, errors have been classified following the DQF/MQM Error typology and have been evaluated with the automatic BLEU metric. Results show that NMT engines offer good-quality translations, which in turn may benefit translators working with audiovisual entertainment resources.

amazon prime video netflix series cable cable girls' أمازون رئيس الفيديو كابل سلسلة Netflix. الفتيات الكابلات صناعة حمض الفوسفور المزيد..

طرق الاتصال التربوي (السمعية و البصرية) (دراسة موضوعية في الحديث النبوي)

1259 - Damascus University 2009 ورقة بحثية

أُحدثت في العالم الإسلامي كليات للتربية و معاهد للتعليم؛ تبنت قيم الآخرين و تجاربهم، متجاهلة تراث الأمة التربوي. و التاريخ أصدق شاهد، و الواقع خير برهان، ففي حين عادى الغرب العلم، و أقصى العلماء، أرسى الإسلام قواعد منهج تربوي ثبتت نجاعته، و وضع أسساً للبحث العلمي، و هدفت هذه الدراسة إلى توضيح أهم معالم التربية النبوية في تفعيل الوسائل السمعية و البصرية.

الحديث النبوي طرق الاتصال التربوي السمعية البصرية

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Improving the extraction of audio features In audio-visual Arabic systems

تحسين استخراج السمات السمعية في الأنظمة السمعية البصرية للمتحدثين باللغة العربية

Ask ChatGPT about the research

Read More

suggested questions