Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Variational f-divergence Minimization

131 0 0.0 ( 0 )

Download Cite

Added by Mingtian Zhang

Publication date 2019

fields Mathematical Statistics Informatics Engineering

and research's language is English

Authors Mingtian Zhang - Thomas Bird - Raza Habib

Machine Learning Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Probabilistic models are often trained by maximum likelihood, which corresponds to minimizing a specific f-divergence between the model and data distribution. In light of recent successes in training Generative Adversarial Networks, alternative non-likelihood training criteria have been proposed. Whilst not necessarily statistically efficient, these alternatives may better match user requirements such as sharp image generation. A general variational method for training probabilistic latent variable models using maximum likelihood is well established; however, how to train latent variable models using other f-divergences is comparatively unknown. We discuss a variational approach that, when combined with the recently introduced Spread Divergence, can be applied to train a large class of latent variable models using any f-divergence.

rate research

f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization

87 - Sebastian Nowozin , Botond Cseke , Ryota Tomioka 2016

Generative neural samplers are probabilistic models that implement sampling using feedforward neural networks: they take a random input vector and produce a sample from a probability distribution defined by the network weights. These models are expressive and allow efficient computation of samples and derivatives, but cannot be used for computing likelihoods or for marginalization. The generative-adversarial training method allows to train such models through the use of an auxiliary discriminative neural network. We show that the generative-adversarial approach is a special case of an existing more general variational divergence estimation approach. We show that any f-divergence can be used for training generative neural samplers. We discuss the benefits of various choices of divergence functions on training complexity and the quality of the obtained generative models.

Machine Learning Machine Learning Methodology

Variational approximations using Fisher divergence

146 - Yue Yang , Ryan Martin , Howard Bondell 2019

Modern applications of Bayesian inference involve models that are sufficiently complex that the corresponding posterior distributions are intractable and must be approximated. The most common approximation is based on Markov chain Monte Carlo, but these can be expensive when the data set is large and/or the model is complex, so more efficient variational approximations have recently received considerable attention. The traditional variational methods, that seek to minimize the Kullback--Leibler divergence between the posterior and a relatively simple parametric family, provide accurate and efficient estimation of the posterior mean, but often does not capture other moments, and have limitations in terms of the models to which they can be applied. Here we propose the construction of variational approximations based on minimizing the Fisher divergence, and develop an efficient computational algorithm that can be applied to a wide range of models without conjugacy or potentially unrealistic mean-field assumptions. We demonstrate the superior performance of the proposed method for the benchmark case of logistic regression.

Machine Learning Machine Learning Computation

Neural Sequence Model Training via $alpha$-divergence Minimization

137 - Sotetsu Koyamada , Yuta Kikuchi , Atsunori Kanemura 2017

We propose a new neural sequence model training method in which the objective function is defined by $alpha$-divergence. We demonstrate that the objective function generalizes the maximum-likelihood (ML)-based and reinforcement learning (RL)-based objective functions as special cases (i.e., ML corresponds to $alpha to 0$ and RL to $alpha to1$). We also show that the gradient of the objective function can be considered a mixture of ML- and RL-based objective gradients. The experimental results of a machine translation task show that minimizing the objective function with $alpha > 0$ outperforms $alpha to 0$, which corresponds to ML-based methods.

Machine Learning Machine Learning

Multiview Learning of Weighted Majority Vote by Bregman Divergence Minimization

318 - Anil Goyal 2018

We tackle the issue of classifier combinations when observations have multiple views. Our method jointly learns view-specific weighted majority vote classifiers (i.e. for each view) over a set of base voters, and a second weighted majority vote classifier over the set of these view-specific weighted majority vote classifiers. We show that the empirical risk minimization of the final majority vote given a multiview training set can be cast as the minimization of Bregman divergences. This allows us to derive a parallel-update optimization algorithm for learning our multiview model. We empirically study our algorithm with a particular focus on the impact of the training set size on the multiview learning results. The experiments show that our approach is able to overcome the lack of labeled information.

Machine Learning Machine Learning

Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence

115 - Ghassen Jerfel , Serena Wang , Clara Fannjiang 2021

Variational Inference (VI) is a popular alternative to asymptotically exact sampling in Bayesian inference. Its main workhorse is optimization over a reverse Kullback-Leibler divergence (RKL), which typically underestimates the tail of the posterior leading to miscalibration and potential degeneracy. Importance sampling (IS), on the other hand, is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures. The quality of IS crucially depends on the choice of the proposal distribution. Ideally, the proposal distribution has heavier tails than the target, which is rarely achievable by minimizing the RKL. We thus propose a novel combination of optimization and sampling techniques for approximate Bayesian inference by constructing an IS proposal distribution through the minimization of a forward KL (FKL) divergence. This approach guarantees asymptotic consistency and a fast convergence towards both the optimal IS estimator and the optimal variational approximation. We empirically demonstrate on real data that our method is competitive with variational boosting and MCMC.

Machine Learning Machine Learning Computation

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Variational f-divergence Minimization

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions