ﻻ يوجد ملخص باللغة العربية
Deep convolutional neural networks are known to specialize in distilling compact and robust prior from a large amount of data. We are interested in applying deep networks in the absence of training dataset. In this paper, we introduce deep audio prior (DAP) which leverages the structure of a network and the temporal information in a single audio file. Specifically, we demonstrate that a randomly-initialized neural network can be used with carefully designed audio prior to tackle challenging audio problems such as universal blind source separation, interactive audio editing, audio texture synthesis, and audio co-separation. To understand the robustness of the deep audio prior, we construct a benchmark dataset emph{Universal-150} for universal sound source separation with a diverse set of sources. We show superior audio results than previous work on both qualitative and quantitative evaluations. We also perform thorough ablation study to validate our design choices.
Recently, speech enhancement (SE) based on deep speech prior has attracted much attention, such as the variational auto-encoder with non-negative matrix factorization (VAE-NMF) architecture. Compared to conventional approaches that represent clean sp
Existing automatic music generation approaches that feature deep learning can be broadly classified into two types: raw audio models and symbolic models. Symbolic models, which train and generate at the note level, are currently the more prevalent ap
Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating high-quality samples. Efficient sampling for this class of models has however remained an elu
Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence. Autoregressive models, such as WaveNet, model local structure at the expense of g
In this paper, we describe our contribution to Task 2 of the DCASE 2018 Audio Challenge. While it has become ubiquitous to utilize an ensemble of machine learning methods for classification tasks to obtain better predictive performance, the majority