A Comparison of Classifiers in Performing Speaker Accent Recognition Using MFCCs

Published by Ernest Fokoue
Publication date: 2015
Paper language: English

An algorithm involving Mel-Frequency Cepstral Coefficients (MFCCs) is provided to perform signal feature extraction for the task of speaker accent recognition. Different classifiers are then compared based on the MFCC features. For each signal, the mean vector of its MFCC matrix is used as the input vector for pattern recognition. A sample of 330 signals, containing 165 US voices and 165 non-US voices, is analyzed. Among the classifiers compared, k-nearest neighbors yields the highest average test accuracy over a cross-validation of size 500, while also requiring the least computation time.
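
A minimal sketch of the pipeline described above, assuming librosa for MFCC extraction and scikit-learn for classification. The sampling rate, number of coefficients, k, and test-split fraction are illustrative assumptions not stated in the abstract, and the `paths`/`labels` arguments stand in for the 330 signals and their US/non-US labels:

```python
import numpy as np
import librosa  # assumed audio library for MFCC extraction
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import ShuffleSplit, cross_val_score

def mfcc_mean_vector(path, sr=16000, n_mfcc=12):
    """Load one signal and reduce its MFCC matrix to a time-averaged vector."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return mfcc.mean(axis=1)

def knn_accent_accuracy(paths, labels, k=5, n_splits=500):
    """Average kNN test accuracy over 500 repeated random train/test splits."""
    X = np.stack([mfcc_mean_vector(p) for p in paths])
    cv = ShuffleSplit(n_splits=n_splits, test_size=0.2, random_state=0)
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X, np.asarray(labels), cv=cv)
    return scores.mean()
```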

Read also

We propose a learnable mel-frequency cepstral coefficient (MFCC) frontend architecture for deep neural network (DNN) based automatic speaker verification. Our architecture retains the simplicity and interpretability of MFCC-based features while allowing the model to be adapted to data flexibly. In practice, we formulate data-driven…
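
A minimal sketch of the idea, not the authors' code: initialise a trainable filterbank from the standard triangular mel filters (with a fixed DCT) so that training can adapt the filters to data. The FFT size, mel-band count, and coefficient count below are assumptions:

```python
import torch
import torch.nn as nn
import librosa  # used only to build the initial mel filterbank

class LearnableMFCC(nn.Module):
    """MFCC-like frontend whose mel filterbank is a trainable parameter."""
    def __init__(self, sr=16000, n_fft=512, n_mels=40, n_mfcc=20):
        super().__init__()
        # Standard triangular mel filters as the starting point.
        mel = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
        self.filterbank = nn.Parameter(torch.tensor(mel, dtype=torch.float32))
        # Fixed DCT-II basis mapping log-mel energies to cepstral coefficients.
        n = torch.arange(n_mels, dtype=torch.float32)
        k = torch.arange(n_mfcc, dtype=torch.float32).unsqueeze(1)
        self.register_buffer("dct", torch.cos(torch.pi / n_mels * (n + 0.5) * k))

    def forward(self, power_spec):
        # power_spec: (batch, n_fft // 2 + 1, n_frames)
        mel_energy = torch.matmul(self.filterbank, power_spec)
        log_mel = torch.log(mel_energy.clamp(min=1e-6))
        return torch.matmul(self.dct, log_mel)  # (batch, n_mfcc, n_frames)
```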
The recently proposed self-attentive pooling (SAP) has shown good performance in several speaker recognition systems. In SAP systems, the context vector is trained end-to-end together with the feature extractor, where the role of the context vector is to select the most discriminative frames for speaker recognition. However, SAP underperforms compared to the temporal average pooling (TAP) baseline in some settings, which implies that the attention is not learnt effectively in end-to-end training. To tackle this problem, we introduce strategies for training the attention mechanism in a supervised manner, which learns the context vector using classified samples. With our proposed methods, the context vector can be boosted to select the most informative frames. We show that our method outperforms existing methods in various experimental settings including short utterance speaker recognition, and achieves competitive performance over the existing baselines on the VoxCeleb datasets.
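
A minimal sketch of self-attentive pooling as described above (the paper's supervised attention-training strategy itself is not shown); the feature dimension is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentivePooling(nn.Module):
    """Pool frame-level features with weights scored by a context vector."""
    def __init__(self, dim=256):
        super().__init__()
        self.context = nn.Parameter(torch.randn(dim))  # trained end-to-end

    def forward(self, frames):
        # frames: (batch, n_frames, dim) from the feature extractor
        scores = frames @ self.context          # (batch, n_frames)
        weights = F.softmax(scores, dim=1)      # frame-selection weights
        return (weights.unsqueeze(-1) * frames).sum(dim=1)  # (batch, dim)
```

For comparison, the TAP baseline mentioned in the abstract is simply `frames.mean(dim=1)`.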
Mismatch between enrollment and test conditions causes serious performance degradation on speaker recognition systems. This paper presents a statistics decomposition (SD) approach to solve this problem. This approach is based on the normalized likelihood (NL) scoring framework, and is theoretically optimal if the statistics on both the enrollment and test conditions are accurate. A comprehensive experimental study was conducted on three datasets with different types of mismatch: (1) physical channel mismatch, (2) speaking style mismatch, (3) near-far recording mismatch. The results demonstrated that the proposed SD approach is highly effective, and outperforms the ad-hoc multi-condition training approach that is commonly adopted but not optimal in theory.
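
As a hedged illustration of the normalized likelihood (NL) scoring that the decomposition builds on, here is a simple Gaussian instance: the test vector is scored by the enrollment-side likelihood normalised by a background likelihood. The Gaussian form and the explicitly passed means and covariances are illustrative assumptions, not the paper's full SD formulation, which would decompose these statistics per condition:

```python
import numpy as np
from scipy.stats import multivariate_normal

def nl_score(x, enroll_mean, enroll_cov, bg_mean, bg_cov):
    """Normalized likelihood as a log-likelihood ratio against a background."""
    return (multivariate_normal.logpdf(x, enroll_mean, enroll_cov)
            - multivariate_normal.logpdf(x, bg_mean, bg_cov))
```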
Research on speaker recognition is extending to address vulnerability under in-the-wild conditions, among which genre mismatch is perhaps the most challenging, for instance, enrollment with reading speech while testing with conversational or singing audio. This mismatch leads to complex and composite inter-session variations, both intrinsic (i.e., speaking style, physiological status) and extrinsic (i.e., recording device, background noise). Unfortunately, the few existing multi-genre corpora are not only limited in size but are also recorded under controlled conditions, which cannot support conclusive research on the multi-genre problem. In this work, we first publish CN-Celeb, a large-scale multi-genre corpus that includes in-the-wild speech utterances of 3,000 speakers in 11 different genres. Second, using this dataset, we conduct a comprehensive study of the multi-genre phenomenon, in particular the impact of the multi-genre challenge on speaker recognition, and how to utilize the valuable multi-genre data more efficiently.
At present, automatic speaker recognition is a very important topic due to its diverse applications. Hence, it becomes absolutely necessary to obtain models that take into consideration the speaking style of a person, vocal tract information, timbral qualities of his voice, and other congenital information regarding his voice. The study of Bengali speech recognition and speaker identification is scarce in the literature. Hence the need arises for involving Bengali subjects in modelling our speaker identification engine. In this work, we have extracted some acoustic features of speech using nonlinear multifractal analysis. Multifractal Detrended Fluctuation Analysis (MFDFA) essentially reveals the complexity associated with the speech signals considered. The source characteristics have been quantified with the help of different techniques such as the correlation matrix, the skewness of the MFDFA spectrum, etc. The results obtained from this study give a good recognition rate for Bengali speakers.
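
A sketch of the standard MFDFA computation of the generalised Hurst exponent h(q), not the authors' implementation; the scale range, q grid, and detrending order are assumptions:

```python
import numpy as np

def mfdfa(signal, scales, q_values, order=1):
    """Generalised Hurst exponent h(q) via Multifractal DFA."""
    profile = np.cumsum(signal - np.mean(signal))  # integrated profile
    h = []
    for q in q_values:
        log_s, log_F = [], []
        for s in scales:
            n_seg = len(profile) // s
            segments = profile[:n_seg * s].reshape(n_seg, s)
            t = np.arange(s)
            # Variance of each segment around its polynomial trend.
            F2 = np.array([np.mean((seg - np.polyval(np.polyfit(t, seg, order), t)) ** 2)
                           for seg in segments])
            if q == 0:  # limiting (logarithmic) average for q = 0
                Fq = np.exp(0.5 * np.mean(np.log(F2)))
            else:
                Fq = np.mean(F2 ** (q / 2.0)) ** (1.0 / q)
            log_s.append(np.log(s))
            log_F.append(np.log(Fq))
        # h(q) is the slope of log F_q(s) against log s.
        h.append(np.polyfit(log_s, log_F, 1)[0])
    return np.array(h)
```

For a speech signal `x`, one might call `mfdfa(x, scales=2 ** np.arange(4, 11), q_values=np.linspace(-5, 5, 21))`; a non-constant h(q) indicates multifractality, and the skewness of the resulting spectrum (mentioned in the abstract) could then be taken, e.g. with scipy.stats.skew.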