ﻻ يوجد ملخص باللغة العربية
Variational autoencoders optimize an objective that combines a reconstruction loss (the distortion) and a KL term (the rate). The rate is an upper bound on the mutual information, which is often interpreted as a regularizer that controls the degree of compression. We here examine whether inclusion of the rate also acts as an inductive bias that improves generalization. We perform rate-distortion analyses that control the strength of the rate term, the network capacity, and the difficulty of the generalization problem. Decreasing the strength of the rate paradoxically improves generalization in most settings, and reducing the mutual information typically leads to underfitting. Moreover, we show that generalization continues to improve even after the mutual information saturates, indicating that the gap on the bound (i.e. the KL divergence relative to the inference marginal) affects generalization. This suggests that the standard Gaussian prior is not an inductive bias that typically aids generalization, prompting work to understand what choices of priors improve generalization in VAEs.
A crucial aspect in reliable machine learning is to design a deployable system in generalizing new related but unobserved environments. Domain generalization aims to alleviate such a prediction gap between the observed and unseen environments. Previo
We present a principled approach to incorporating labels in VAEs that captures the rich characteristic information associated with those labels. While prior work has typically conflated these by learning latent variables that directly correspond to l
Neural networks have achieved remarkable success in many cognitive tasks. However, when they are trained sequentially on multiple tasks without access to old data, their performance on early tasks tend to drop significantly. This problem is often ref
An implicit goal in works on deep generative models is that such models should be able to generate novel examples that were not previously seen in the training data. In this paper, we investigate to what extent this property holds for widely employed