No Arabic abstract
Learning latent variable models with stochastic variational inference is challenging when the approximate posterior is far from the true posterior, due to high variance in the gradient estimates. We propose a novel rejection sampling step that discards samples from the variational posterior which are assigned low likelihoods by the model. Our approach provides an arbitrarily accurate approximation of the true posterior at the expense of extra computation. Using a new gradient estimator for the resulting unnormalized proposal distribution, we achieve average improvements of 3.71 nats and 0.21 nats over state-of-the-art single-sample and multi-sample alternatives respectively for estimating marginal log-likelihoods using sigmoid belief networks on the MNIST dataset.
We propose a novel adaptive importance sampling algorithm which incorporates Stein variational gradient decent algorithm (SVGD) with importance sampling (IS). Our algorithm leverages the nonparametric transforms in SVGD to iteratively decrease the KL divergence between our importance proposal and the target distribution. The advantages of this algorithm are twofold: first, our algorithm turns SVGD into a standard IS algorithm, allowing us to use standard diagnostic and analytic tools of IS to evaluate and interpret the results; second, we do not restrict the choice of our importance proposal to predefined distribution families like traditional (adaptive) IS methods. Empirical experiments demonstrate that our algorithm performs well on evaluating partition functions of restricted Boltzmann machines and testing likelihood of variational auto-encoders.
We formulate the problem of sampling and recovering clustered graph signal as a multi-armed bandit (MAB) problem. This formulation lends naturally to learning sampling strategies using the well-known gradient MAB algorithm. In particular, the sampling strategy is represented as a probability distribution over the individual arms of the MAB and optimized using gradient ascent. Some illustrative numerical experiments indicate that the sampling strategies based on the gradient MAB algorithm outperform existing sampling methods.
We would like to learn latent representations that are low-dimensional and highly interpretable. A model that has these characteristics is the Gaussian Process Latent Variable Model. The benefits and negative of the GP-LVM are complementary to the Variational Autoencoder, the former provides interpretable low-dimensional latent representations while the latter is able to handle large amounts of data and can use non-Gaussian likelihoods. Our inspiration for this paper is to marry these two approaches and reap the benefits of both. In order to do so we will introduce a novel approximate inference scheme inspired by the GP-LVM and the VAE. We show experimentally that the approximation allows the capacity of the generative bottle-neck (Z) of the VAE to be arbitrarily large without losing a highly interpretable representation, allowing reconstruction quality to be unlimited by Z at the same time as a low-dimensional space can be used to perform ancestral sampling from as well as a means to reason about the embedded data.
Monte Carlo (MC) methods have become very popular in signal processing during the past decades. The adaptive rejection sampling (ARS) algorithms are well-known MC technique which draw efficiently independent samples from univariate target densities. The ARS schemes yield a sequence of proposal functions that converge toward the target, so that the probability of accepting a sample approaches one. However, sampling from the proposal pdf becomes more computationally demanding each time it is updated. We propose the Parsimonious Adaptive Rejection Sampling (PARS) method, where an efficient trade-off between acceptance rate and proposal complexity is obtained. Thus, the resulting algorithm is faster than the standard ARS approach.
In this paper, we consider mixtures of two Mallows models for top-$k$ rankings, both with the same location parameter but with different scale parameters, i.e., a mixture of concentric Mallows models. This situation arises when we have a heterogeneous population of voters formed by two homogeneous populations, one of which is a subpopulation of expert voters while the other includes the non-expert voters. We propose efficient sampling algorithms for Mallows top-$k$ rankings. We show the identifiability of both components, and the learnability of their respective parameters in this setting by, first, bounding the sample complexity for the Borda algorithm with top-$k$ rankings and second, proposing polynomial time algorithm for the separation of the rankings in each component. Finally, since the rank aggregation will suffer from a large amount of noise introduced by the non-expert voters, we adapt the Borda algorithm to be able to recover the ground truth consensus ranking which is especially consistent with the expert rankings.