Parameter Tuning of Time-Frequency Masking Algorithms for Reverberant Artifact Removal within the Cochlear Implant Stimulus

Published by Lidea Shahidi
Publication date: 2021
Research language: English





Cochlear implant users struggle to understand speech in reverberant environments. To restore speech perception, artifacts dominated by reverberant reflections can be removed from the cochlear implant stimulus. Artifacts can be identified and removed by applying a matrix of gain values, a technique referred to as time-frequency masking. Gain values are determined by an oracle algorithm that uses knowledge of the undistorted signal to minimize retention of the signal components dominated by reverberant reflections. In practice, gain values are estimated from the distorted signal, with the oracle algorithm providing the estimation objective. Different oracle techniques exist for determining gain values, and each technique must be parameterized to set the amount of signal retention. This work assesses which oracle masking strategies and parameterizations lead to the best improvements in speech intelligibility for cochlear implant users in reverberant conditions using online speech intelligibility testing of normal-hearing individuals with vocoding.
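The masking idea in the abstract can be illustrated with a minimal sketch. It assumes one particular oracle rule, a binary mask that keeps a time-frequency unit only when its direct-to-reverberant energy ratio exceeds a threshold in dB. The function names, the `threshold_db` parameter, and the toy energy values are hypothetical and are not taken from the paper, which compares several oracle strategies and parameterizations.

```python
import numpy as np


def oracle_binary_mask(direct_energy, reverb_energy, threshold_db=0.0, eps=1e-12):
    """Hypothetical oracle rule: gain 1 where the direct-to-reverberant
    energy ratio (in dB) exceeds threshold_db, gain 0 elsewhere."""
    ratio_db = 10.0 * np.log10((direct_energy + eps) / (reverb_energy + eps))
    return (ratio_db > threshold_db).astype(float)


def apply_mask(stimulus, mask):
    """Time-frequency masking: element-wise product of stimulus and gains."""
    return stimulus * mask


# Toy 2x2 energies (rows: channels, columns: time frames); values are made up.
direct = np.array([[4.0, 1.0], [0.5, 9.0]])
reverb = np.array([[1.0, 2.0], [2.0, 1.0]])

mask = oracle_binary_mask(direct, reverb, threshold_db=0.0)
strict_mask = oracle_binary_mask(direct, reverb, threshold_db=8.0)
cleaned = apply_mask(direct + reverb, mask)
```

Raising `threshold_db` retains fewer units, which is the kind of signal-retention trade-off that a parameterization has to settle.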




Read also

Speech perception is key to verbal communication. For people with hearing loss, the ability to recognize speech is restricted, particularly in noisy environments or in situations without visual cues, such as a phone call, where lip-reading is unavailable. This study aimed to understand the improvement of vocoded speech intelligibility in cochlear implant (CI) simulation through two potential methods: Speech Enhancement (SE) and Audiovisual Integration. A fully convolutional neural network (FCN) using an intelligibility-oriented objective function was recently proposed and shown to improve speech intelligibility as an advanced denoising SE approach. Furthermore, audiovisual integration is reported to provide better speech comprehension than audio-only information. An experiment was designed to test speech intelligibility using tone-vocoded speech in CI simulation with a group of normal-hearing listeners. Experimental results confirmed the effectiveness of the FCN-based denoising SE and audiovisual integration on vocoded speech. The results also suggest that these two methods could be combined in a CI processor to improve speech intelligibility for CI users under noisy conditions.
Attempts to develop speech enhancement algorithms with improved speech intelligibility for cochlear implant (CI) users have met with limited success. To improve speech enhancement methods for CI users, we propose to perform speech enhancement in a cochlear filter-bank feature space, a feature-set specifically designed for CI users based on CI auditory stimuli. We leverage a convolutional neural network (CNN) to extract both stationary and non-stationary components of environmental acoustics and speech. We propose three CNN architectures: (1) vanilla CNN that directly generates the enhanced signal; (2) spectral-subtraction-style CNN (SS-CNN) that first predicts noise and then generates the enhanced signal by subtracting noise from the noisy signal; (3) Wiener-style CNN (Wiener-CNN) that generates an optimal mask for suppressing noise. An important problem with the proposed networks is that they introduce considerable delays, which limits their real-time application for CI users. To address this, this study also considers causal variations of these networks. Our experiments show that the proposed networks (both causal and non-causal forms) achieve significant improvement over existing baseline systems. We also found that causal Wiener-CNN outperforms other networks, and leads to the best overall envelope coefficient measure (ECM). The proposed algorithms represent a viable option for implementation on the CCi-MOBILE research platform as a pre-processor for CI users in naturalistic environments.
Cochlear implant (CI) users have considerable difficulty in understanding speech in reverberant listening environments. Time-frequency (T-F) masking is a common technique that aims to improve speech intelligibility by multiplying reverberant speech by a matrix of gain values to suppress T-F bins dominated by reverberation. Recently proposed mask estimation algorithms leverage machine learning approaches to distinguish between target speech and reverberant reflections. However, the spectro-temporal structure of speech is highly variable and dependent on the underlying phoneme. One way to potentially overcome this variability is to leverage explicit knowledge of phonemic information during mask estimation. This study proposes a phoneme-based mask estimation algorithm, where separate mask estimation models are trained for each phoneme. Sentence recognition tests were conducted in normal hearing listeners to determine whether a phoneme-based mask estimation algorithm is beneficial in the ideal scenario where perfect knowledge of the phoneme is available. The results showed that the phoneme-based masks improved the intelligibility of vocoded speech when compared to conventional phoneme-independent masks. The results suggest that a phoneme-based speech enhancement strategy may potentially benefit CI users in reverberant listening environments.
Time-domain training criteria have proven to be very effective for the separation of single-channel non-reverberant speech mixtures. Likewise, mask-based beamforming has shown impressive performance in multi-channel reverberant speech enhancement and source separation. Here, we propose to combine neural network supported multi-channel source separation with a time-domain training objective function. For the objective we propose to use a convolutive transfer function invariant Signal-to-Distortion Ratio (CI-SDR) based loss. While this is a well-known evaluation metric (BSS Eval), it has not been used as a training objective before. To show the effectiveness, we demonstrate the performance on LibriSpeech based reverberant mixtures. On this task, the proposed system approaches the error rate obtained on single-source non-reverberant input, i.e., LibriSpeech test_clean, with a difference of only 1.2 percentage points, thus outperforming a conventional permutation invariant training based system and alternative objectives like Scale Invariant Signal-to-Distortion Ratio by a large margin.
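For orientation, the Scale Invariant Signal-to-Distortion Ratio that the abstract names as an alternative objective can be sketched in a few lines. This is the simpler scale-invariant metric, not the convolutive transfer function invariant CI-SDR that the authors propose. The function name `si_sdr`, the `eps` guard, and the synthetic signals are assumptions made for illustration, and mean removal is omitted for brevity.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two 1-D signals (illustrative sketch)."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    error = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(error, error) + eps))


# Synthetic example: a reference signal plus a small amount of additive noise.
rng = np.random.default_rng(0)
reference = rng.standard_normal(1000)
estimate = reference + 0.1 * rng.standard_normal(1000)

score = si_sdr(estimate, reference)
rescaled_score = si_sdr(3.0 * estimate, reference)
```

Rescaling the estimate leaves the score essentially unchanged, which is the property the name refers to.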
Yiyuan Zhao, 2019
The goals of this dissertation are to fully automate the image processing techniques needed in the post-operative stage of IGCIP and to perform a thorough analysis of (a) the robustness of the automatic image processing techniques used in IGCIP and (b) the sensitivity of the IGCIP process as a whole to individual components. The automatic methods that have been developed include the automatic localization of both closely- and distantly-spaced CI electrode arrays in post-implantation CTs and the automatic selection of electrode configurations based on the stimulation patterns. Together with the existing automatic techniques developed for IGCIP, the proposed automatic methods enable an end-to-end IGCIP process that takes pre- and post-implantation CT images as input and produces a patient-customized electrode configuration as output.