
Parallel and Flexible Sampling from Autoregressive Models via Langevin Dynamics

Added by John Thickstun
Publication date: 2021
Language: English





This paper introduces an alternative approach to sampling from autoregressive models. Autoregressive models are typically sampled sequentially, according to the transition dynamics defined by the model. Instead, we propose a sampling procedure that initializes a sequence with white noise and follows a Markov chain defined by Langevin dynamics on the global log-likelihood of the sequence. This approach parallelizes the sampling process and generalizes to conditional sampling. Using an autoregressive model as a Bayesian prior, we can steer the output of a generative model using a conditional likelihood or constraints. We apply these techniques to autoregressive models in the visual and audio domains, with competitive results for audio source separation, super-resolution, and inpainting.
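A minimal sketch of this sampling loop in PyTorch, assuming a hypothetical ar_log_likelihood(x) that evaluates the model's global log-likelihood (the sum of the per-step conditionals) in parallel via teacher forcing; the step size and step count are illustrative, not the paper's settings:

import torch

def langevin_sample(ar_log_likelihood, length, n_steps=200, step_size=1e-2):
    # Initialize the whole sequence with white noise rather than
    # generating it token by token.
    x = torch.randn(length, requires_grad=True)
    for _ in range(n_steps):
        log_p = ar_log_likelihood(x)  # global log p(x), computed in parallel
        grad, = torch.autograd.grad(log_p, x)
        with torch.no_grad():
            # Langevin update: gradient ascent on log p(x) plus Gaussian noise.
            # For conditional sampling, the gradient of a conditional
            # likelihood or constraint penalty would be added to `grad`.
            x += 0.5 * step_size * grad
            x += step_size ** 0.5 * torch.randn_like(x)
    return x.detach()

Because every coordinate of the sequence is updated simultaneously, the cost per step is one parallel likelihood evaluation rather than a sequential pass over the sequence.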


Related research

Yilun Xu, Yang Song, Sahaj Garg (2021)
Autoregressive models are widely used for tasks such as image and audio generation. The sampling process of these models, however, does not allow interruptions and cannot adapt to real-time computational resources. This challenge impedes the deployment of powerful autoregressive models, which involve a slow sampling process that is sequential in nature and typically scales linearly with respect to the data dimension. To address this difficulty, we propose a new family of autoregressive models that enables anytime sampling. Inspired by Principal Component Analysis, we learn a structured representation space where dimensions are ordered based on their importance with respect to reconstruction. Using an autoregressive model in this latent space, we trade off sample quality for computational efficiency by truncating the generation process before decoding into the original data space. Experimentally, we demonstrate in several image and audio generation tasks that sample quality degrades gracefully as we reduce the computational budget for sampling. The approach suffers almost no loss in sample quality (measured by FID) using only 60% to 80% of all latent dimensions for image data. Code is available at https://github.com/Newbeeer/Anytime-Auto-Regressive-Model.
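A rough sketch of the anytime-sampling idea, where ar_prior and decoder are hypothetical stand-ins for the paper's latent autoregressive model and decoder:

import numpy as np

def anytime_sample(ar_prior, decoder, total_dims, budget):
    # Latent dimensions are ordered by importance, so truncating the
    # autoregressive pass after `budget` dimensions trades sample
    # quality for compute.
    z = np.zeros(total_dims)
    for i in range(min(budget, total_dims)):
        # Sample dimension i conditioned on the already-sampled prefix.
        z[i] = ar_prior(z[:i])
    # The remaining, less important dimensions stay at zero before decoding.
    return decoder(z)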
Standard dynamics models for continuous control make use of feedforward computation to predict the conditional distribution of next state and reward given current state and action using a multivariate Gaussian with a diagonal covariance structure. This modeling choice assumes that different dimensions of the next state and reward are conditionally independent given the current state and action and may be driven by the fact that fully observable physics-based simulation environments entail deterministic transition dynamics. In this paper, we challenge this conditional independence assumption and propose a family of expressive autoregressive dynamics models that generate different dimensions of the next state and reward sequentially conditioned on previous dimensions. We demonstrate that autoregressive dynamics models indeed outperform standard feedforward models in log-likelihood on heldout transitions. Furthermore, we compare different model-based and model-free off-policy evaluation (OPE) methods on RL Unplugged, a suite of offline MuJoCo datasets, and find that autoregressive dynamics models consistently outperform all baselines, achieving a new state-of-the-art. Finally, we show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer through data augmentation and improving performance using model-based planning.
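A sketch of what such a model might look like in PyTorch; the per-dimension Gaussian heads and the layer sizes are assumptions for illustration, not the paper's architecture:

import torch
import torch.nn as nn

class AutoregressiveDynamics(nn.Module):
    # Predicts each dimension of (next_state, reward) conditioned on the
    # current state-action pair and all previously generated dimensions,
    # instead of a single diagonal-Gaussian head.
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        out_dim = state_dim + 1  # next-state dimensions plus a scalar reward
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(state_dim + action_dim + i, hidden),
                          nn.ReLU(),
                          nn.Linear(hidden, 2))  # per-dimension mean, log_std
            for i in range(out_dim)
        ])

    def sample(self, s, a):
        ctx, ys = torch.cat([s, a], dim=-1), []
        for head in self.heads:
            mean, log_std = head(torch.cat([ctx] + ys, dim=-1)).chunk(2, dim=-1)
            ys.append(mean + log_std.exp() * torch.randn_like(mean))
        return torch.cat(ys[:-1], dim=-1), ys[-1]  # next state, reward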
We show how to derive a simple integrator for the Langevin equation and illustrate how it is possible to check the accuracy of the obtained distribution on the fly, using the concept of effective energy introduced in a recent paper [J. Chem. Phys. 126, 014101 (2007)]. Our integrator leads to correct sampling also in the difficult high-friction limit. We also show how these ideas can be applied in practical simulations, using a Lennard-Jones crystal as a paradigmatic case.
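For orientation, a standard velocity-Langevin step in NumPy; this is a generic scheme, not the specific splitting derived in the paper, and the on-the-fly effective-energy check is omitted:

import numpy as np

def langevin_step(x, v, force, dt, gamma, kT, mass=1.0):
    # Exact Ornstein-Uhlenbeck damping of the velocity over dt ...
    c1 = np.exp(-gamma * dt)
    c2 = np.sqrt((1.0 - c1 ** 2) * kT / mass)
    v = c1 * v + c2 * np.random.randn(*np.shape(v))
    # ... followed by a force kick and a position drift.
    v = v + dt * force(x) / mass
    x = x + dt * v
    return x, v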
End-to-end models for raw audio generation are a challenge, especially if they have to work with non-parallel data, which is a desirable setup in many situations. Voice conversion, in which a model has to impersonate a speaker in a recording, is one of those situations. In this paper, we propose Blow, a single-scale normalizing flow that uses hypernetwork conditioning to perform many-to-many voice conversion on raw audio. Blow is trained end-to-end, with non-parallel data, on a frame-by-frame basis using a single speaker identifier. We show that Blow compares favorably to existing flow-based architectures and other competitive baselines, obtaining equal or better performance in both objective and subjective evaluations. We further assess the impact of its main components with an ablation study, and quantify a number of properties such as the necessary amount of training data and the preference for source or target speakers.
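A toy illustration of hypernetwork conditioning in a single coupling layer: the speaker embedding generates the weights of the small network that transforms half of each frame. The shapes and the linear coupling net are assumptions, not Blow's actual architecture:

import torch
import torch.nn as nn

class HyperCoupling(nn.Module):
    def __init__(self, frame_dim, n_speakers, emb_dim=64):
        super().__init__()
        self.half = frame_dim // 2
        self.embed = nn.Embedding(n_speakers, emb_dim)
        # Hypernetwork: maps the speaker embedding to the weights and bias
        # of a linear coupling net that produces (scale, shift) from x1.
        self.hyper = nn.Linear(emb_dim, 2 * self.half * self.half + 2 * self.half)

    def forward(self, x, speaker_id):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        params = self.hyper(self.embed(speaker_id))
        n_w = 2 * self.half * self.half
        w = params[:, :n_w].view(-1, 2 * self.half, self.half)
        b = params[:, n_w:]
        # Coupling net with speaker-generated weights.
        st = torch.bmm(w, x1.unsqueeze(-1)).squeeze(-1) + b
        scale, shift = st.chunk(2, dim=-1)
        y2 = x2 * torch.exp(torch.tanh(scale)) + shift  # invertible affine map
        log_det = torch.tanh(scale).sum(dim=-1)         # flow log-determinant
        return torch.cat([x1, y2], dim=-1), log_det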
Knowledge distillation has been widely used to compress existing deep learning models while preserving performance on a wide range of applications. In the specific context of Automatic Speech Recognition (ASR), distillation from ensembles of acoustic models has recently shown promising results in increasing recognition performance. In this paper, we propose an extension of multi-teacher distillation methods to joint CTC-attention end-to-end ASR systems. We also introduce three novel distillation strategies. The core intuition behind them is to integrate the error rate metric into teacher selection rather than focusing solely on the observed losses. In this way, we directly distill and optimize the student toward the metric that matters for speech recognition. We evaluate these strategies under a selection of training procedures on different datasets (TIMIT, Librispeech, Common Voice) and various languages (English, French, Italian). In particular, state-of-the-art error rates are reported on the Common Voice French, Italian, and TIMIT datasets.
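One way the error-rate intuition could look in code; the softmax weighting over negative WERs is an illustrative assumption, not one of the paper's three strategies:

import torch
import torch.nn.functional as F

def multi_teacher_loss(student_logits, teacher_logits_list, teacher_wers, T=2.0):
    # Teachers with lower word error rates receive higher weight, so the
    # student is pulled toward the teachers that are best on the target metric.
    weights = F.softmax(-torch.tensor(teacher_wers), dim=0)
    loss = 0.0
    for w, t_logits in zip(weights, teacher_logits_list):
        p_teacher = F.softmax(t_logits / T, dim=-1)
        log_p_student = F.log_softmax(student_logits / T, dim=-1)
        # Temperature-scaled KL divergence, as in standard distillation.
        loss = loss + w * F.kl_div(log_p_student, p_teacher,
                                   reduction='batchmean') * T * T
    return loss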
