Do you want to publish a course? Click here

On parameters transformations for emulating sparse priors using variational-Laplace inference

126   0   0.0 ( 0 )
 Added by Jean Daunizeau
 Publication date 2017
and research's language is English




Ask ChatGPT about the research

So-called sparse estimators arise in the context of model fitting, when one a priori assumes that only a few (unknown) model parameters deviate from zero. Sparsity constraints can be useful when the estimation problem is under-determined, i.e. when number of model parameters is much higher than the number of data points. Typically, such constraints are enforced by minimizing the L1 norm, which yields the so-called LASSO estimator. In this work, we propose a simple parameter transform that emulates sparse priors without sacrificing the simplicity and robustness of L2-norm regularization schemes. We show how L1 regularization can be obtained with a sparsify remapping of parameters under normal Bayesian priors, and we demonstrate the ensuing variational Laplace approach using Monte-Carlo simulations.



rate research

Read More

119 - Jean Daunizeau 2017
Variational approaches to approximate Bayesian inference provide very efficient means of performing parameter estimation and model selection. Among these, so-called variational-Laplace or VL schemes rely on Gaussian approximations to posterior densities on model parameters. In this note, we review the main variants of VL approaches, that follow from considering nonlinear models of continuous and/or categorical data. En passant, we also derive a few novel theoretical results that complete the portfolio of existing analyses of variational Bayesian approaches, including investigations of their asymptotic convergence. We also suggest practical ways of extending existing VL approaches to hierarchical generative models that include (e.g., precision) hyperparameters.
We propose a variational Bayesian (VB) procedure for high-dimensional linear model inferences with heavy tail shrinkage priors, such as student-t prior. Theoretically, we establish the consistency of the proposed VB method and prove that under the proper choice of prior specifications, the contraction rate of the VB posterior is nearly optimal. It justifies the validity of VB inference as an alternative of Markov Chain Monte Carlo (MCMC) sampling. Meanwhile, comparing to conventional MCMC methods, the VB procedure achieves much higher computational efficiency, which greatly alleviates the computing burden for modern machine learning applications such as massive data analysis. Through numerical studies, we demonstrate that the proposed VB method leads to shorter computing time, higher estimation accuracy, and lower variable selection error than competitive sparse Bayesian methods.
Sparse deep learning aims to address the challenge of huge storage consumption by deep neural networks, and to recover the sparse structure of target functions. Although tremendous empirical successes have been achieved, most sparse deep learning algorithms are lacking of theoretical support. On the other hand, another line of works have proposed theoretical frameworks that are computationally infeasible. In this paper, we train sparse deep neural networks with a fully Bayesian treatment under spike-and-slab priors, and develop a set of computationally efficient variational inferences via continuous relaxation of Bernoulli distribution. The variational posterior contraction rate is provided, which justifies the consistency of the proposed variational Bayes method. Notably, our empirical results demonstrate that this variational procedure provides uncertainty quantification in terms of Bayesian predictive distribution and is also capable to accomplish consistent variable selection by training a sparse multi-layer neural network.
We propose Learned Accept/Reject Sampling (LARS), a method for constructing richer priors using rejection sampling with a learned acceptance function. This work is motivated by recent analyses of the VAE objective, which pointed out that commonly used simple priors can lead to underfitting. As the distribution induced by LARS involves an intractable normalizing constant, we show how to estimate it and its gradients efficiently. We demonstrate that LARS priors improve VAE performance on several standard datasets both when they are learned jointly with the rest of the model and when they are fitted to a pretrained model. Finally, we show that LARS can be combined with existing methods for defining flexible priors for an additional boost in performance.
We develop variational Laplace for Bayesian neural networks (BNNs) which exploits a local approximation of the curvature of the likelihood to estimate the ELBO without the need for stochastic sampling of the neural-network weights. The Variational Laplace objective is simple to evaluate, as it is (in essence) the log-likelihood, plus weight-decay, plus a squared-gradient regularizer. Variational Laplace gave better test performance and expected calibration errors than maximum a-posteriori inference and standard sampling-based variational inference, despite using the same variational approximate posterior. Finally, we emphasise care needed in benchmarking standard VI as there is a risk of stopping before the variance parameters have converged. We show that early-stopping can be avoided by increasing the learning rate for the variance parameters.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا