بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Assessing the intelligibility of vocoded speech using a remote testing framework

76 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Kevin Chu

تاريخ النشر 2021

مجال البحث هندسة إلكترونية الهندسة المعلوماتية

والبحث باللغة English

تأليف Kevin M. Chu - Leslie M. Collins - Boyla O. Mainsah

معالجة الصوت والكلام أنظمة الصوت في الحاسوب

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Over the past year, remote speech intelligibility testing has become a popular and necessary alternative to traditional in-person experiments due to the need for physical distancing during the COVID-19 pandemic. A remote framework was developed for conducting speech intelligibility tests with normal hearing listeners. In this study, subjects used their personal computers to complete sentence recognition tasks in anechoic and reverberant listening environments. The results obtained using this remote framework were compared with previously collected in-lab results, and showed higher levels of speech intelligibility among remote study participants than subjects who completed the test in the laboratory.

قيم البحث

152 - Haoyu Li , Junichi Yamagishi 2021

The intelligibility of speech severely degrades in the presence of environmental noise and reverberation. In this paper, we propose a novel deep learning based system for modifying the speech signal to increase its intelligibility under the equal-pow er constraint, i.e., signal power before and after modification must be the same. To achieve this, we use generative adversarial networks (GANs) to obtain time-frequency dependent amplification factors, which are then applied to the input raw speech to reallocate the speech energy. Instead of optimizing only a single, simple metric, we train a deep neural network (DNN) model to simultaneously optimize multiple advanced speech metrics, including both intelligibility- and quality-related ones, which results in notable improvements in performance and robustness. Our system can not only work in non-realtime mode for offline audio playback but also support practical real-time speech applications. Experimental results using both objective measurements and subjective listening tests indicate that the proposed system significantly outperforms state-ofthe-art baseline systems under various noisy and reverberant listening conditions.

معالجة الصوت والكلام أنظمة الصوت في الحاسوب

iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

138 - Haoyu Li , Szu-Wei Fu , Yu Tsao 2020

The intelligibility of natural speech is seriously degraded when exposed to adverse noisy environments. In this work, we propose a deep learning-based speech modification method to compensate for the intelligibility loss, with the constraint that the root mean square (RMS) level and duration of the speech signal are maintained before and after modifications. Specifically, we utilize an iMetricGAN approach to optimize the speech intelligibility metrics with generative adversarial networks (GANs). Experimental results show that the proposed iMetricGAN outperforms conventional state-of-the-art algorithms in terms of objective measures, i.e., speech intelligibility in bits (SIIB) and extended short-time objective intelligibility (ESTOI), under a Cafeteria noise condition. In addition, formal listening tests reveal significant intelligibility gains when both noise and reverberation exist.

معالجة الصوت والكلام أنظمة الصوت في الحاسوب

NORESQA -- A Framework for Speech Quality Assessment using Non-Matching References

232 - Pranay Manocha , Buye Xu , Anurag Kumar 2021

The perceptual task of speech quality assessment (SQA) is a challenging task for machines to do. Objective SQA methods that rely on the availability of the corresponding clean reference have been the primary go-to approaches for SQA. Clearly, these m ethods fail in real-world scenarios where the ground truth clean references are not available. In recent years, non-intrusive methods that train neural networks to predict ratings or scores have attracted much attention, but they suffer from several shortcomings such as lack of robustness, reliance on labeled data for training and so on. In this work, we propose a new direction for speech quality assessment. Inspired by humans innate ability to compare and assess the quality of speech signals even when they have non-matching contents, we propose a novel framework that predicts a subjective relative quality score for the given speech signal with respect to any provided reference without using any subjective data. We show that neural networks trained using our framework produce scores that correlate well with subjective mean opinion scores (MOS) and are also competitive to methods such as DNSMOS, which explicitly relies on MOS from humans for training networks. Moreover, our method also provides a natural way to embed quality-related information in neural networks, which we show is helpful for downstream tasks such as speech enhancement.

معالجة الصوت والكلام أنظمة الصوت في الحاسوب

Impulsive Noise Detection for Intelligibility and Quality Improvement of Speech Enhancement Methods Applied in Time-Domain

140 - C. Medina , R. Coelho 2019

This letter introduces a novel speech enhancement method in the Hilbert-Huang Transform domain to mitigate the effects of acoustic impulsive noises. The estimation and selection of noise components is based on the impulsiveness index of decomposition modes. Speech enhancement experiments are conducted considering five acoustic noises with different impulsiveness index and non-stationarity degrees under various signal-to-noise ratios. Three speech enhancement algorithms are adopted as baseline in the evaluation analysis considering spectral and time domains. The proposed solution achieves the best results in terms of objective quality measures and similar speech intelligibility rates to the competitive methods.

معالجة الصوت والكلام أنظمة الصوت في الحاسوب

Enhancing the Intelligibility of Cleft Lip and Palate Speech using Cycle-consistent Adversarial Networks

123 - Protima Nomo Sudro , Rohan Kumar Das , Rohit Sinha 2021

Cleft lip and palate (CLP) refer to a congenital craniofacial condition that causes various speech-related disorders. As a result of structural and functional deformities, the affected subjects speech intelligibility is significantly degraded, limiti ng the accessibility and usability of speech-controlled devices. Towards addressing this problem, it is desirable to improve the CLP speech intelligibility. Moreover, it would be useful during speech therapy. In this study, the cycle-consistent adversarial network (CycleGAN) method is exploited for improving CLP speech intelligibility. The model is trained on native Kannada-speaking childrens speech data. The effectiveness of the proposed approach is also measured using automatic speech recognition performance. Further, subjective evaluation is performed, and those results also confirm the intelligibility improvement in the enhanced speech over the original.

معالجة الصوت والكلام

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الوادي الدولية الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Assessing the intelligibility of vocoded speech using a remote testing framework

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً