In this work, we tackle the denoising and dereverberation problem within a single-stage framework. Although denoising and dereverberation may be considered two separate challenging tasks, each typically requiring its own module, we show that a single deep network can be shared to solve both problems. To this end, we propose a new masking method called the phase-aware beta-sigmoid mask (PHM), which reuses the estimated magnitude values to estimate the clean phase by respecting the triangle inequality in the complex domain among three signal components: the mixture, the source, and the rest. Two PHMs are used to handle the direct and reverberant source signals, which allows the proportion of reverberation in the enhanced speech to be controlled at inference time. In addition, to improve speech enhancement performance, we propose a new time-domain loss function and show a reasonable performance gain compared to the MSE loss in the complex domain. Finally, to achieve real-time inference, we propose an optimization strategy for U-Net that reduces the computational overhead by up to 88.9% compared to the naive version.
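To make the geometric idea behind PHM concrete, the NumPy sketch below shows how a phase rotation for the source can be recovered from magnitude estimates alone via the law of cosines. The function name phase_aware_estimate, the sigmoid parameterization, and the fixed scale beta are illustrative assumptions for this sketch, not the paper's exact beta-sigmoid formulation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def phase_aware_estimate(X, z_src, z_rest, beta=2.0):
    """Recover a source estimate whose phase is derived from magnitudes alone.

    X              : complex mixture STFT, shape (freq, time)
    z_src, z_rest  : unbounded network outputs for the source and "the rest"
    beta           : scale letting the two magnitude masks sum to more than 1,
                     so the three magnitudes can form a non-degenerate triangle
    """
    mag_x = np.abs(X)
    mag_s = beta * sigmoid(z_src) * mag_x    # estimated source magnitude
    mag_n = beta * sigmoid(z_rest) * mag_x   # estimated magnitude of the rest

    # Since X = S + N, the magnitudes |X|, |S|, |N| form a triangle in the
    # complex plane; the law of cosines then fixes the angle between X and S
    # up to a sign (which would need a separate estimate).
    cos_theta = (mag_x**2 + mag_s**2 - mag_n**2) / (2.0 * mag_x * mag_s + 1e-8)
    cos_theta = np.clip(cos_theta, -1.0, 1.0)  # enforce the triangle inequality
    theta = np.arccos(cos_theta)

    # Rotate the mixture phase by the recovered angle and reapply the magnitude.
    return mag_s * np.exp(1j * (np.angle(X) + theta))

With two such masks, one aimed at the direct source and one at its reverberant part, a weighted sum of the resulting estimates at inference time provides the control over residual reverberation mentioned above.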
Modern deep learning-based models have achieved outstanding performance improvements on speech enhancement tasks. State-of-the-art models, however, often have too many parameters to be deployed on devices for real-world applications. To
Despite successful applications of end-to-end approaches in multi-channel speech recognition, the performance still degrades severely when the speech is corrupted by reverberation. In this paper, we integrate the dereverberation module into the end-t
Recently, the end-to-end approach has been successfully applied to multi-speaker speech separation and recognition in both single-channel and multi-channel conditions. However, severe performance degradation is still observed in the reverberant and no
The task of speech recognition in far-field environments is adversely affected by reverberant artifacts that manifest as the temporal smearing of the sub-band envelopes. In this paper, we develop a neural model for speech dereverberation using the
Automatic speech recognition in reverberant conditions is a challenging task as the long-term envelopes of the reverberant speech are temporally smeared. In this paper, we propose a neural model for enhancement of sub-band temporal envelopes for dere
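For readers unfamiliar with the quantity the two dereverberation front-ends above operate on, the sketch below computes a sub-band temporal envelope in a standard way: band-pass filtering followed by a Hilbert envelope. The helper name subband_envelope, the band edges, and the filter order are illustrative choices, not the parameterization used in the papers above.

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def subband_envelope(speech, sr, f_lo, f_hi, order=4):
    """Temporal envelope of one sub-band of a speech signal.

    Reverberation smears these envelopes over time, which is the degradation
    the dereverberation models above aim to undo.
    """
    sos = butter(order, [f_lo, f_hi], btype="bandpass", fs=sr, output="sos")
    band = sosfiltfilt(sos, speech)   # isolate the sub-band
    return np.abs(hilbert(band))      # Hilbert (analytic-signal) envelope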