We prove that the evidence lower bound (ELBO) employed by variational auto-encoders (VAEs) admits non-trivial solutions having constant posterior variances under certain mild conditions, removing the need to learn variances in the encoder. The proof follows from an unexpected journey through an array of topics: the closed-form optimal decoder for Gaussian VAEs, a proof that the decoder is always smooth, a proof that the ELBO at its stationary points is equal to the exact log evidence, and a proof that the posterior variance is merely part of a stochastic estimator of the decoder Hessian. The penalty incurred from using a constant posterior variance is small under mild conditions, and otherwise discourages large variations in the decoder Hessian. From here we derive a simplified formulation of the ELBO as an expectation over a batch, which we call the Batch Information Lower Bound (BILBO). Despite the use of Gaussians, our analysis is broadly applicable: it extends to any likelihood function that induces a Riemannian metric. Regarding learned likelihoods, we show that the ELBO is optimal in the limit as the likelihood variances approach zero, where it is equivalent to the change of variables formulation employed in normalizing flow networks. Standard optimization procedures are unstable in this limit, so we propose a bounded Gaussian likelihood that is invariant to the scale of the data using a measure of the aggregate information in a batch, which we call Bounded Aggregate Information Sampling (BAGGINS). Combining the two formulations, we construct VAE networks with only half the outputs of ordinary VAEs (no learned variances), yielding improved ELBO scores and scale invariance in experiments. As we perform our analyses irrespective of any particular network architecture, our reformulations may apply to any VAE implementation.
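To make the constant-variance setting concrete, the following is a minimal sketch of the standard Gaussian-VAE ELBO in which the posterior variance is a single fixed scalar rather than an encoder output. It is not the BILBO or BAGGINS formulation, which the abstract does not specify; it only illustrates that the encoder then needs to produce means alone. All names (`encode_mean`, `decode`, `post_var`, `lik_var`) are hypothetical, and a standard normal prior is assumed.

```python
import math
import random


def kl_to_standard_normal(mu, post_var):
    """KL( N(mu, post_var * I) || N(0, I) ) for one shared scalar variance."""
    d = len(mu)
    sq_norm = sum(m * m for m in mu)
    return 0.5 * (sq_norm + d * (post_var - 1.0 - math.log(post_var)))


def gaussian_log_likelihood(x, x_hat, lik_var):
    """log N(x; x_hat, lik_var * I) for an isotropic Gaussian likelihood."""
    d = len(x)
    sq_err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return -0.5 * (sq_err / lik_var + d * math.log(2.0 * math.pi * lik_var))


def elbo_constant_variance(x, encode_mean, decode, post_var, lik_var,
                           n_samples=64, rng=None):
    """Monte Carlo ELBO estimate with a fixed (not learned) posterior variance.

    encode_mean: hypothetical encoder returning only the posterior mean.
    decode:      hypothetical decoder mapping a latent sample to a reconstruction.
    """
    rng = rng or random.Random(0)
    mu = encode_mean(x)
    std = math.sqrt(post_var)
    recon = 0.0
    for _ in range(n_samples):
        # Reparameterized sample z = mu + std * eps, with eps ~ N(0, I).
        z = [m + std * rng.gauss(0.0, 1.0) for m in mu]
        recon += gaussian_log_likelihood(x, decode(z), lik_var)
    # ELBO = E_q[log p(x | z)] - KL(q(z | x) || p(z)).
    return recon / n_samples - kl_to_standard_normal(mu, post_var)
```

With an identity decoder and the posterior set equal to the prior, the estimate stays below the exact log evidence of the corresponding linear-Gaussian model, as a lower bound should. In a real network the two callables would be the trained encoder and decoder.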
Variational Auto-Encoders (VAEs) have become very popular techniques to perform inference and learning in latent variable models as they allow us to leverage the rich representational power of neural networks to obtain flexible approximations of the
To act and plan in complex environments, we posit that agents should have a mental simulator of the world with three characteristics: (a) it should build an abstract state representing the condition of the world; (b) it should form a belief which rep
The increasing amount of data in astronomy provides great challenges for machine learning research. Previously, supervised learning methods achieved satisfactory recognition accuracy for the star-galaxy classification task, based on manually labeled
Using powerful posterior distributions is a popular approach to achieving better variational inference. However, recent work has shown that the aggregated posterior may fail to match the unit Gaussian prior, so learning the prior becomes an alternative w
Reparameterization of variational auto-encoders with continuous random variables is an effective method for reducing the variance of their gradient estimates. In the discrete case, one can perform reparametrization using the Gumbel-Max trick, but the