
Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference

Published by Geoffrey Roeder
Publication date: 2017
Paper language: English

We propose a simple and general variant of the standard reparameterized gradient estimator for the variational evidence lower bound. Specifically, we remove a part of the total derivative with respect to the variational parameters that corresponds to the score function. Removing this term produces an unbiased gradient estimator whose variance approaches zero as the approximate posterior approaches the exact posterior. We analyze the behavior of this gradient estimator theoretically and empirically, and generalize it to more complex variational distributions such as mixtures and importance-weighted posteriors.
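
As a concrete illustration, here is a minimal sketch of the estimator in PyTorch for a diagonal-Gaussian approximate posterior. The trick is to evaluate log q at the reparameterized sample with the variational parameters detached, so that differentiating the Monte Carlo ELBO yields only the path derivative. The toy target `log_joint`, the variable names, and the Gaussian family are illustrative assumptions, not the paper's code.

```python
import torch

# Toy unnormalized target log p(x, z): a standard-normal posterior over z.
# (Illustrative stand-in; any differentiable log-density works here.)
def log_joint(z):
    return -0.5 * (z ** 2).sum(-1)

# Variational parameters of a diagonal Gaussian q(z) = N(mu, diag(sigma^2)).
mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)

def elbo_sticking_the_landing(n_samples=32):
    sigma = log_sigma.exp()
    eps = torch.randn(n_samples, 2)
    z = mu + sigma * eps  # reparameterized sample: z depends on (mu, sigma)
    # Key step: detach the variational parameters when evaluating log q(z).
    # This removes the score-function term from the total derivative, leaving
    # only the path derivative; the estimator stays unbiased, and its variance
    # vanishes as q approaches the exact posterior.
    log_q = torch.distributions.Normal(
        mu.detach(), sigma.detach()
    ).log_prob(z).sum(-1)
    return (log_joint(z) - log_q).mean()

loss = -elbo_sticking_the_landing()
loss.backward()  # mu.grad, log_sigma.grad hold the path-derivative estimate
```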




Read also

The maximum mean discrepancy (MMD) is a kernel-based distance between probability distributions useful in many applications (Gretton et al. 2012), bearing a simple estimator with pleasing computational and statistical properties. Being able to efficiently estimate the variance of this estimator is very helpful to various problems in two-sample testing. Towards this end, Bounliphone et al. (2016) used the theory of U-statistics to derive estimators for the variance of an MMD estimator, and differences between two such estimators. Their estimator, however, drops lower-order terms, and is unnecessarily biased. We show in this note - extending and correcting work of Sutherland et al. (2017) - that we can find a truly unbiased estimator for the actual variance of both the squared MMD estimator and the difference of two correlated squared MMD estimators, at essentially no additional computational cost.
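
For context, the quantity whose variance is being estimated is the unbiased U-statistic estimator of the squared MMD (Gretton et al. 2012). A minimal NumPy sketch follows, with an RBF kernel as an illustrative choice; the note's variance estimator itself involves higher-order U-statistics and is not reproduced here.

```python
import numpy as np

def rbf(A, B, h=1.0):
    # Pairwise RBF kernel matrix k(a, b) = exp(-||a - b||^2 / (2 h^2)).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h ** 2))

def mmd2_unbiased(X, Y, h=1.0):
    # U-statistic estimator of MMD^2: within-sample diagonal terms are
    # excluded so the estimator's expectation equals the population MMD^2.
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = rbf(X, X, h), rbf(Y, Y, h), rbf(X, Y, h)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()
```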
Efficient low-variance gradient estimation enabled by the reparameterization trick (RT) has been essential to the success of variational autoencoders. Doubly-reparameterized gradients (DReGs) improve on the RT for multi-sample variational bounds by applying reparameterization a second time for an additional reduction in variance. Here, we develop two generalizations of the DReGs estimator and show that they can be used to train conditional and hierarchical VAEs on image modelling tasks more effectively. First, we extend the estimator to hierarchical models with several stochastic layers by showing how to treat additional score function terms due to the hierarchical variational posterior. We then generalize DReGs to score functions of arbitrary distributions instead of just those of the sampling distribution, which makes the estimator applicable to the parameters of the prior in addition to those of the posterior.
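
A minimal sketch of how the doubly-reparameterized surrogate is commonly implemented for the variational parameters of a diagonal-Gaussian posterior, assuming the usual stop-gradient formulation; `log_p_joint` and the other names are illustrative, and gradients for the model parameters would use the ordinary multi-sample bound instead.

```python
import torch

def dreg_surrogate(mu, log_sigma, log_p_joint, k=8):
    """Surrogate whose gradient w.r.t. (mu, log_sigma) follows the DReGs
    estimator for a k-sample bound (a sketch, not the authors' code)."""
    sigma = log_sigma.exp()
    eps = torch.randn(k, *mu.shape)
    z = mu + sigma * eps  # reparameterized samples
    # Detach the variational parameters inside log q so that differentiation
    # produces path derivatives only (no score-function term).
    log_q = torch.distributions.Normal(
        mu.detach(), sigma.detach()
    ).log_prob(z).sum(-1)
    log_w = log_p_joint(z) - log_q                  # log importance weights
    w_tilde = torch.softmax(log_w, dim=0).detach()  # stop-grad normalized weights
    # Squaring the stopped weights applies reparameterization a second time.
    return (w_tilde ** 2 * log_w).sum(0)
```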
Jun Han, Qiang Liu (2018)
Stein variational gradient descent (SVGD) has been shown to be a powerful approximate inference algorithm for complex distributions. However, the standard SVGD requires calculating the gradient of the target density and cannot be applied when the gradient is unavailable. In this work, we develop a gradient-free variant of SVGD (GF-SVGD), which replaces the true gradient with a surrogate gradient and corrects the induced bias by re-weighting the gradients in a proper form. We show that our GF-SVGD can be viewed as the standard SVGD with a special choice of kernel, and hence directly inherits the theoretical properties of SVGD. We provide insights on the empirical choice of the surrogate gradient and propose an annealed GF-SVGD that leverages the idea of simulated annealing to improve performance on high-dimensional complex distributions. Empirical studies show that our method consistently outperforms a number of recent advanced gradient-free MCMC methods.
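
For reference, a sketch of the standard SVGD particle update that the gradient-free variant builds on. Per the abstract, GF-SVGD passes a surrogate density's gradient as `grad_logp` and corrects the bias with importance weights; the `weights` argument below is an illustrative hook for that correction, whose exact form is left to the paper.

```python
import numpy as np

def rbf_kernel(X, h=1.0):
    # K[j, i] = k(x_j, x_i); gradK[j, i] = grad w.r.t. x_j of k(x_j, x_i).
    diff = X[:, None, :] - X[None, :, :]            # diff[j, i] = x_j - x_i
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))
    gradK = -diff / h ** 2 * K[..., None]
    return K, gradK

def svgd_step(X, grad_logp, weights=None, step=0.1, h=1.0):
    # One update of the particle set X (shape (n, d)). With uniform weights
    # and the true grad log p, this is standard SVGD; GF-SVGD substitutes a
    # surrogate gradient plus non-uniform corrective weights.
    n = len(X)
    w = np.full(n, 1.0 / n) if weights is None else weights / weights.sum()
    K, gradK = rbf_kernel(X, h)
    g = grad_logp(X)                                # (n, d)
    phi = np.einsum('j,ji,jd->id', w, K, g) + np.einsum('j,jid->id', w, gradK)
    return X + step * phi
```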
Boosting variational inference (BVI) approximates an intractable probability density by iteratively building up a mixture of simple component distributions one at a time, using techniques from sparse convex optimization to provide both computational scalability and approximation error guarantees. But these guarantees require strong conditions that often fail to hold in practice, resulting in degenerate component optimization problems; moreover, we show that the ad-hoc regularization used to prevent degeneracy in practice can cause BVI to fail in unintuitive ways. We thus develop universal boosting variational inference (UBVI), a BVI scheme that exploits the simple geometry of probability densities under the Hellinger metric to prevent the degeneracy of other gradient-based BVI methods, avoid difficult joint optimizations of both component and weight, and simplify fully-corrective weight optimizations. We show that for any target density and any mixture component family, the output of UBVI converges to the best possible approximation in the mixture family, even when the mixture family is misspecified. We develop a scalable implementation based on exponential family mixture components and standard stochastic optimization techniques. Finally, we discuss statistical benefits of the Hellinger distance as a variational objective through bounds on posterior probability, moment, and importance sampling errors. Experiments on multiple datasets and models show that UBVI provides reliable, accurate posterior approximations.
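
For reference, the squared Hellinger distance underlying UBVI's geometry is, for densities p and q,

```latex
H^2(p, q) \;=\; 1 - \int \sqrt{p(x)\, q(x)}\, dx
         \;=\; \tfrac{1}{2}\,\bigl\lVert \sqrt{p} - \sqrt{q} \bigr\rVert_2^2 ,
```

so square-root densities live on the unit sphere in L2, which is the "simple geometry" the abstract refers to.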
Many computationally-efficient methods for Bayesian deep learning rely on continuous optimization algorithms, but implementing these methods requires significant changes to existing codebases. In this paper, we propose Vprop, a method for Gaussian variational inference that can be implemented with two minor changes to the off-the-shelf RMSprop optimizer. Vprop also halves the memory requirements of Black-Box Variational Inference. We derive Vprop using the conjugate-computation variational inference method, and establish its connections to Newton's method, natural-gradient methods, and extended Kalman filters. Overall, this paper presents Vprop as a principled, computationally-efficient, and easy-to-implement method for Bayesian deep learning.
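
The abstract does not spell out the two changes, so the sketch below is an illustrative guess at the optimizer loop: evaluating the gradient at noise-perturbed weights and dropping the square root from the preconditioner are assumptions recalled from the Vprop paper, as are all constants and names.

```python
import numpy as np

def vprop_step(theta, s, grad_fn, lr=1e-3, beta=0.1, lam=1.0, N=1000):
    # RMSprop would use g = grad_fn(theta) and theta -= lr * g / (sqrt(s) + eps).
    # Assumed change 1: evaluate the gradient at weights perturbed by Gaussian
    # noise whose scale comes from the current second-moment estimate, so the
    # iterate implicitly maintains a Gaussian posterior over theta.
    sigma = 1.0 / np.sqrt(N * (s + lam))
    g = grad_fn(theta + sigma * np.random.randn(*theta.shape))
    s = (1 - beta) * s + beta * g ** 2   # same moving average as RMSprop
    # Assumed change 2: no square root in the preconditioner, plus a
    # prior-precision term lam in both numerator and denominator.
    theta = theta - lr * (g + lam * theta) / (s + lam)
    return theta, s
```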
