Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Frechet Audio Distance: A Metric for Evaluating Music Enhancement Algorithms

203 0 0.0 ( 0 )

Download Cite

Added by Kevin Kilgour

Publication date 2018

fields Electronic Engineering Informatics Engineering

and research's language is English

Authors Kevin Kilgour - Mauricio Zuluaga - Dominik Roblek

Audio and Speech Processing Sound

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We propose the Frechet Audio Distance (FAD), a novel, reference-free evaluation metric for music enhancement algorithms. We demonstrate how typical evaluation metrics for speech enhancement and blind source separation can fail to accurately measure the perceived effect of a wide variety of distortions. As an alternative, we propose adapting the Frechet Inception Distance (FID) metric used to evaluate generative image models to the audio domain. FAD is validated using a wide variety of artificial distortions and is compared to the signal based metrics signal to distortion ratio (SDR), cosine distance and magnitude L2 distance. We show that, with a correlation coefficient of 0.52, FAD correlates more closely with human perception than either SDR, cosine distance or magnitude L2 distance, with correlation coefficients of 0.39, -0.15 and -0.01 respectively.

rate research

DPLM: A Deep Perceptual Spatial-Audio Localization Metric

157 - Pranay Manocha , Anurag Kumar , Buye Xu 2021

Subjective evaluations are critical for assessing the perceptual realism of sounds in audio-synthesis driven technologies like augmented and virtual reality. However, they are challenging to set up, fatiguing for users, and expensive. In this work, we tackle the problem of capturing the perceptual characteristics of localizing sounds. Specifically, we propose a framework for building a general purpose quality metric to assess spatial localization differences between two binaural recordings. We model localization similarity by utilizing activation-level distances from deep networks trained for direction of arrival (DOA) estimation. Our proposed metric (DPLM) outperforms baseline metrics on correlation with subjective ratings on a diverse set of datasets, even without the benefit of any human-labeled training data.

Audio and Speech Processing Sound

Audio declipping performance enhancement via crossfading

151 - Pavel Zaviv{s}ka , Pavel Rajmic , Ondv{r}ej Mokry 2021

Some audio declipping methods produce waveforms that do not fully respect the physical process of clipping, which is why we refer to them as inconsistent. This letter reports what effect on perception it has if the solution by inconsistent methods is forced consistent by postprocessing. We first propose a simple sample replacement method, then we identify its main weaknesses and propose an improved variant. The experiments show that the vast majority of inconsistent declipping methods significantly benefit from the proposed approach in terms of objective perceptual metrics. In particular, we show that the SS PEW method based on social sparsity combined with the proposed method performs comparable to top methods from the consistent class, but at a computational cost of one order of magnitude lower.

Audio and Speech Processing Sound

A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences

447 - Pranay Manocha , Adam Finkelstein , Richard Zhang 2020

Many audio processing tasks require perceptual assessment. The ``gold standard`` of obtaining human judgments is time-consuming, expensive, and cannot be used as an optimization criterion. On the other hand, automated metrics are efficient to compute but often correlate poorly with human judgment, particularly for audio differences at the threshold of human detection. In this work, we construct a metric by fitting a deep neural network to a new large dataset of crowdsourced human judgments. Subjects are prompted to answer a straightforward, objective question: are two recordings identical or not? These pairs are algorithmically generated under a variety of perturbations, including noise, reverb, and compression artifacts; the perturbation space is probed with the goal of efficiently identifying the just-noticeable difference (JND) level of the subject. We show that the resulting learned metric is well-calibrated with human judgments, outperforming baseline methods. Since it is a deep network, the metric is differentiable, making it suitable as a loss function for other tasks. Thus, simply replacing an existing loss (e.g., deep feature loss) with our metric yields significant improvement in a denoising network, as measured by subjective pairwise comparison.

Audio and Speech Processing Sound

Enhancement by postfiltering for speech and audio coding in ad-hoc sensor networks

64 - Sneha Das , Tom Backstrom 2020

Enhancement algorithms for wireless acoustics sensor networks~(WASNs) are indispensable with the increasing availability and usage of connected devices with microphones. Conventional spatial filtering approaches for enhancement in WASNs approximate quantization noise with an additive Gaussian distribution, which limits performance due to the non-linear nature of quantization noise at lower bitrates. In this work, we propose a postfilter for enhancement based on Bayesian statistics to obtain a multidevice signal estimate, which explicitly models the quantization noise. Our experiments using PSNR, PESQ and MUSHRA scores demonstrate that the proposed postfilter can be used to enhance signal quality in ad-hoc sensor networks.

Audio and Speech Processing Sound

ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric

106 - Michael Chinen , Felicia S. C. Lim , Jan Skoglund 2020

Estimation of perceptual quality in audio and speech is possible using a variety of methods. The combined v3 release of ViSQOL and ViSQOLAudio (for speech and audio, respectively,) provides improvements upon previo

Audio and Speech Processing Sound Signal Processing

comments

Fetching comments

Aِl-Baath University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Frechet Audio Distance: A Metric for Evaluating Music Enhancement Algorithms

Ask ChatGPT about the research

No Arabic abstract

Read More