No Arabic abstract
Using powerful posterior distributions is a popular approach to achieving better variational inference. However, recent works showed that the aggregated posterior may fail to match unit Gaussian prior, thus learning the prior becomes an alternative way to improve the lower-bound. In this paper, for the first time in the literature, we prove the necessity and effectiveness of learning the prior when aggregated posterior does not match unit Gaussian prior, analyze why this situation may happen, and propose a hypothesis that learning the prior may improve reconstruction loss, all of which are supported by our extensive experiment results. We show that using learned Real NVP prior and just one latent variable in VAE, we can achieve test NLL comparable to very deep state-of-the-art hierarchical VAE, outperforming many previous works with complex hierarchical VAE architectures.
Variational Auto-Encoders (VAEs) have become very popular techniques to perform inference and learning in latent variable models as they allow us to leverage the rich representational power of neural networks to obtain flexible approximations of the posterior of latent variables as well as tight evidence lower bounds (ELBOs). Combined with stochastic variational inference, this provides a methodology scaling to large datasets. However, for this methodology to be practically efficient, it is necessary to obtain low-variance unbiased estimators of the ELBO and its gradients with respect to the parameters of interest. While the use of Markov chain Monte Carlo (MCMC) techniques such as Hamiltonian Monte Carlo (HMC) has been previously suggested to achieve this [23, 26], the proposed methods require specifying reverse kernels which have a large impact on performance. Additionally, the resulting unbiased estimator of the ELBO for most MCMC kernels is typically not amenable to the reparameterization trick. We show here how to optimally select reverse kernels in this setting and, by building upon Hamiltonian Importance Sampling (HIS) [17], we obtain a scheme that provides low-variance unbiased estimators of the ELBO and its gradients using the reparameterization trick. This allows us to develop a Hamiltonian Variational Auto-Encoder (HVAE). This method can be reinterpreted as a target-informed normalizing flow [20] which, within our context, only requires a few evaluations of the gradient of the sampled likelihood and trivial Jacobian calculations at each iteration.
To act and plan in complex environments, we posit that agents should have a mental simulator of the world with three characteristics: (a) it should build an abstract state representing the condition of the world; (b) it should form a belief which represents uncertainty on the world; (c) it should go beyond simple step-by-step simulation, and exhibit temporal abstraction. Motivated by the absence of a model satisfying all these requirements, we propose TD-VAE, a generative sequence model that learns representations containing explicit beliefs about states several steps into the future, and that can be rolled out directly without single-step transitions. TD-VAE is trained on pairs of temporally separated time points, using an analogue of temporal difference learning used in reinforcement learning.
We prove that the evidence lower bound (ELBO) employed by variational auto-encoders (VAEs) admits non-trivial solutions having constant posterior variances under certain mild conditions, removing the need to learn variances in the encoder. The proof follows from an unexpected journey through an array of topics: the closed form optimal decoder for Gaussian VAEs, a proof that the decoder is always smooth, a proof that the ELBO at its stationary points is equal to the exact log evidence, and the posterior variance is merely part of a stochastic estimator of the decoder Hessian. The penalty incurred from using a constant posterior variance is small under mild conditions, and otherwise discourages large variations in the decoder Hessian. From here we derive a simplified formulation of the ELBO as an expectation over a batch, which we call the Batch Information Lower Bound (BILBO). Despite the use of Gaussians, our analysis is broadly applicable -- it extends to any likelihood function that induces a Riemannian metric. Regarding learned likelihoods, we show that the ELBO is optimal in the limit as the likelihood variances approach zero, where it is equivalent to the change of variables formulation employed in normalizing flow networks. Standard optimization procedures are unstable in this limit, so we propose a bounded Gaussian likelihood that is invariant to the scale of the data using a measure of the aggregate information in a batch, which we call Bounded Aggregate Information Sampling (BAGGINS). Combining the two formulations, we construct VAE networks with only half the outputs of ordinary VAEs (no learned variances), yielding improved ELBO scores and scale invariance in experiments. As we perform our analyses irrespective of any particular network architecture, our reformulations may apply to any VAE implementation.
Multi-Entity Dependence Learning (MEDL) explores conditional correlations among multiple entities. The availability of rich contextual information requires a nimble learning scheme that tightly integrates with deep neural networks and has the ability to capture correlation structures among exponentially many outcomes. We propose MEDL_CVAE, which encodes a conditional multivariate distribution as a generating process. As a result, the variational lower bound of the joint likelihood can be optimized via a conditional variational auto-encoder and trained end-to-end on GPUs. Our MEDL_CVAE was motivated by two real-world applications in computational sustainability: one studies the spatial correlation among multiple bird species using the eBird data and the other models multi-dimensional landscape composition and human footprint in the Amazon rainforest with satellite images. We show that MEDL_CVAE captures rich dependency structures, scales better than previous methods, and further improves on the joint likelihood taking advantage of very large datasets that are beyond the capacity of previous methods.
The increasing amount of data in astronomy provides great challenges for machine learning research. Previously, supervised learning methods achieved satisfactory recognition accuracy for the star-galaxy classification task, based on manually labeled data set. In this work, we propose a novel unsupervised approach for the star-galaxy recognition task, namely Cascade Variational Auto-Encoder (CasVAE). Our empirical results show our method outperforms the baseline model in both accuracy and stability.