In this study, we present a deep-learning-based speech signal-processing mobile application, called CITISEN, which performs three functions: speech enhancement (SE), model adaptation (MA), and acoustic scene conversion (ASC). For SE, CITISEN effectively reduces noise components in speech signals, thereby enhancing their clarity and intelligibility. When it encounters noisy utterances from unknown speakers or with unknown noise types, the MA function allows CITISEN to improve SE performance by adapting the SE model with only a few audio files. Finally, for ASC, CITISEN can convert the current background sound into a different background sound. Objective evaluations and subjective listening tests confirmed the effectiveness of the SE, MA, and ASC functions. Moreover, the MA experiments showed that short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores improved by approximately 5% and 10%, respectively. These promising results indicate that CITISEN can potentially serve as a front-end processor for various speech-related services, such as voice communication, assistive hearing devices, and virtual reality headsets. In addition, CITISEN can serve as a platform for deploying and evaluating newly developed deep-learning SE models, and it can flexibly extend those models to handle new noise environments and users.
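Deep-learning SE systems such as the one described above commonly operate in the time-frequency domain, where a network predicts a mask that is applied to the noisy spectrogram. The following is a minimal, non-learned sketch of mask-based enhancement: a percentile-based noise-floor estimate stands in for the network's prediction, and all signal parameters (sampling rate, frame size, hop) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 8000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)            # a 440 Hz tone standing in for speech
noisy = clean + 0.3 * rng.standard_normal(sr)  # additive white noise

# Short-time spectra: Hann-windowed frames with 50% overlap.
n_fft, hop = 256, 128
win = np.hanning(n_fft)
frames = np.array([noisy[i:i + n_fft] * win
                   for i in range(0, len(noisy) - n_fft + 1, hop)])
spec = np.fft.rfft(frames, axis=1)
mag = np.abs(spec)

# Noise floor per frequency bin, estimated from the quietest frames;
# the resulting gain is a Wiener-like mask clipped to [0, 1].
noise = np.percentile(mag, 10, axis=0)
mask = np.clip(1.0 - noise / (mag + 1e-8), 0.0, 1.0)

# In a deep-learning SE system, a trained network would predict `mask`
# from the noisy spectrogram instead of this fixed rule.
enhanced = mask * spec
```

Inverting `enhanced` with an overlap-add inverse STFT would yield the enhanced waveform, which could then be scored with metrics such as STOI and PESQ as in the evaluations above.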
Speech-related applications deliver inferior performance in complex noise environments. Therefore, this study primarily addresses this problem by introducing speech-enhancement (SE) systems based on deep neural networks (DNNs) applied to a distributed …
Most recent studies on deep-learning-based speech enhancement (SE) have focused on improving denoising performance. However, successful SE applications require striking a desirable balance between denoising performance and computational cost in real scenarios.
We present a data-driven approach to automate audio signal processing by incorporating stateful, third-party audio effects as layers within a deep neural network. We then train a deep encoder to analyze input audio and control effect parameters to per…
The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many natural language processing applications. Therefore, our study applies a modified Transformer to a speech enhancement task. Specifically, …
The purpose of speech dereverberation is to remove the quality-degrading effects of a time-invariant impulse response filter from the signal. In this report, we describe an approach to speech dereverberation that involves joint estimation of the dry speech …