ﻻ يوجد ملخص باللغة العربية
While deep neural networks have shown powerful performance in many audio applications, their large computation and memory demand has been a challenge for real-time processing. In this paper, we study the impact of scaling the precision of neural networks on the performance of two common audio processing tasks, namely, voice-activity detection and single-channel speech enhancement. We determine the optimal pair of weight/neuron bit precision by exploring its impact on both the performance and processing time. Through experiments conducted with real user data, we demonstrate that deep neural networks that use lower bit precision significantly reduce the processing time (up to 30x). However, their performance impact is low (< 3.14%) only in the case of classification tasks such as those present in voice activity detection.
Audio-based multimedia retrieval tasks may identify semantic information in audio streams, i.e., audio concepts (such as music, laughter, or a revving engine). Conventional Gaussian-Mixture-Models have had some success in classifying a reduced set of
With the development of deep learning and artificial intelligence, audio synthesis has a pivotal role in the area of machine learning and shows strong applicability in the industry. Meanwhile, significant efforts have been dedicated by researchers to
In this paper, we compare different audio signal representations, including the raw audio waveform and a variety of time-frequency representations, for the task of audio synthesis with Generative Adversarial Networks (GANs). We conduct the experiment
Enhancement algorithms for wireless acoustics sensor networks~(WASNs) are indispensable with the increasing availability and usage of connected devices with microphones. Conventional spatial filtering approaches for enhancement in WASNs approximate q
Applications of deep learning to automatic multitrack mixing are largely unexplored. This is partly due to the limited available data, coupled with the fact that such data is relatively unstructured and variable. To address these challenges, we propo