No Arabic abstract
Amortised inference enables scalable learning of sequential latent-variable models (LVMs) with the evidence lower bound (ELBO). In this setting, variational posteriors are often only partially conditioned. While the true posteriors depend, e.g., on the entire sequence of observations, approximate posteriors are only informed by past observations. This mimics the Bayesian filter -- a mixture of smoothing posteriors. Yet, we show that the ELBO objective forces partially-conditioned amortised posteriors to approximate products of smoothing posteriors instead. Consequently, the learned generative model is compromised. We demonstrate these theoretical findings in three scenarios: traffic flow, handwritten digits, and aerial vehicle dynamics. Using fully-conditioned approximate posteriors, performance improves in terms of generative modelling and multi-step prediction.
We propose an approach for improving sequence modeling based on autoregressive normalizing flows. Each autoregressive transform, acting across time, serves as a moving frame of reference, removing temporal correlations, and simplifying the modeling of higher-level dynamics. This technique provides a simple, general-purpose method for improving sequence modeling, with connections to existing and classical techniques. We demonstrate the proposed approach both with standalone flow-based models and as a component within sequential latent variable models. Results are presented on three benchmark video datasets, where autoregressive flow-based dynamics improve log-likelihood performance over baseline models. Finally, we illustrate the decorrelation and improved generalization properties of using flow-based dynamics.
The recurrent neural networks (RNN) with richly distributed internal states and flexible non-linear transition functions, have overtaken the dynamic Bayesian networks such as the hidden Markov models (HMMs) in the task of modeling highly structured sequential data. These data, such as from speech and handwriting, often contain complex relationships between the underlaying variational factors and the observed data. The standard RNN model has very limited randomness or variability in its structure, coming from the output conditional probability model. This paper will present different ways of using high level latent random variables in RNN to model the variability in the sequential data, and the training method of such RNN model under the VAE (Variational Autoencoder) principle. We will explore possible ways of using adversarial method to train a variational RNN model. Contrary to competing approaches, our approach has theoretical optimum in the model training and provides better model training stability. Our approach also improves the posterior approximation in the variational inference network by a separated adversarial training step. Numerical results simulated from TIMIT speech data show that reconstruction loss and evidence lower bound converge to the same level and adversarial training loss converges to 0.
Temporal Point Processes (TPP) with partial likelihoods involving a latent structure often entail an intractable marginalization, thus making inference hard. We propose a novel approach to Maximum Likelihood Estimation (MLE) involving approximate inference over the latent variables by minimizing a tight upper bound on the approximation gap. Given a discrete latent variable $Z$, the proposed approximation reduces inference complexity from $O(|Z|^c)$ to $O(|Z|)$. We use convex conjugates to determine this upper bound in a closed form and show that its addition to the optimization objective results in improved results for models assuming proportional hazards as in Survival Analysis.
Standard variational lower bounds used to train latent variable models produce biased estimates of most quantities of interest. We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series. If parameterized by an encoder-decoder architecture, the parameters of the encoder can be optimized to minimize its variance of this estimator. We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost. This estimator also allows use of latent variable models for tasks where unbiased estimators, rather than marginal likelihood lower bounds, are preferred, such as minimizing reverse KL divergences and estimating score functions.
How do we learn from biased data? Historical datasets often reflect historical prejudices; sensitive or protected attributes may affect the observed treatments and outcomes. Classification algorithms tasked with predicting outcomes accurately from these datasets tend to replicate these biases. We advocate a causal modeling approach to learning from biased data, exploring the relationship between fair classification and intervention. We propose a causal model in which the sensitive attribute confounds both the treatment and the outcome. Building on prior work in deep learning and generative modeling, we describe how to learn the parameters of this causal model from observational data alone, even in the presence of unobserved confounders. We show experimentally that fairness-aware causal modeling provides better estimates of the causal effects between the sensitive attribute, the treatment, and the outcome. We further present evidence that estimating these causal effects can help learn policies that are both more accurate and fair, when presented with a historically biased dataset.