Joint Echo Cancellation and Noise Suppression based on Cascaded Magnitude and Complex Mask Estimation


Abstract in English

Acoustic echo and background noise can seriously degrade the intelligibility of speech. In practice, echo and noise suppression are usually treated as two separated tasks and can be removed with various digital signal processing (DSP) and deep learning techniques. In this paper, we propose a new cascaded model, magnitude and complex temporal convolutional neural network (MC-TCN), to jointly perform acoustic echo cancellation and noise suppression with the help of adaptive filters. The MC-TCN cascades two separation cores, which are used to extract robust magnitude spectra feature and to enhance magnitude and phase simultaneously. Experimental results reveal that the proposed method can achieve superior performance by removing both echo and noise in real-time. In terms of DECMOS, the subjective test shows our method achieves a mean score of 4.41 and outperforms the INTERSPEECH2021 AEC-Challenge baseline by 0.54.

Download