
Tighter Variational Bounds are Not Necessarily Better

Published by: Tom Rainforth
Publication date: 2018
Paper language: English





We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator. Our results call into question common implicit assumptions that tighter ELBOs are better variational objectives for simultaneous model learning and inference amortization schemes. Based on our insights, we introduce three new algorithms: the partially importance weighted auto-encoder (PIWAE), the multiply importance weighted auto-encoder (MIWAE), and the combination importance weighted auto-encoder (CIWAE), each of which includes the standard importance weighted auto-encoder (IWAE) as a special case. We show that each can deliver improvements over IWAE, even when performance is measured by the IWAE target itself. Furthermore, our results suggest that PIWAE may be able to deliver simultaneous improvements in the training of both the inference and generative networks.
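For readers who want the objective in code, the following is a minimal sketch of the importance weighted bound family the abstract refers to; the toy Gaussian model, the proposal, and the (M, K) sample layout are illustrative assumptions rather than the paper's experimental setup.

```python
import numpy as np

def iwae_style_bound(log_w):
    """Monte Carlo estimate of an importance weighted bound from an (M, K)
    array of log importance weights log w = log p(x, z) - log q(z | x).

    K = 1 recovers the standard ELBO estimator, M = 1 with large K is the
    usual IWAE bound, and intermediate (M, K) is the MIWAE-style layout
    mentioned in the abstract.
    """
    M, K = log_w.shape
    # log-mean-exp over the K inner samples, then average the M outer groups
    per_group = np.logaddexp.reduce(log_w, axis=1) - np.log(K)
    return per_group.mean()

# Toy model (an assumption for illustration): p(z) = N(0, 1), p(x | z) = N(z, 1),
# with a deliberately mismatched proposal q(z | x) = N(x / 2, 0.75).
rng = np.random.default_rng(0)
x, M, K = 1.0, 4, 16
z = rng.normal(x / 2, np.sqrt(0.75), size=(M, K))
log_p = -0.5 * (z ** 2 + (x - z) ** 2) - np.log(2 * np.pi)
log_q = -0.5 * (z - x / 2) ** 2 / 0.75 - 0.5 * np.log(2 * np.pi * 0.75)
print(iwae_style_bound(log_p - log_q))  # tightens towards log p(x) as K grows
```

The design question the paper studies is how the samples are split between the inference and generative objectives, since larger K tightens the bound while, per the abstract, lowering the signal-to-noise ratio of the inference-network gradient estimator.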


Read also

Laurence Aitchison, 2019
Recent work has argued that neural networks can be understood theoretically by taking the number of channels to infinity, at which point the outputs become Gaussian process (GP) distributed. However, we note that infinite Bayesian neural networks lack a key facet of the behaviour of real neural networks: the fixed kernel, determined only by network hyperparameters, implies that they cannot do any form of representation learning. The lack of representation or equivalently kernel learning leads to less flexibility and hence worse performance, giving a potential explanation for the inferior performance of infinite networks observed in the literature (e.g. Novak et al. 2019). We give analytic results characterising the prior over representations and representation learning in finite deep linear networks. We show empirically that the representations in SOTA architectures such as ResNets trained with SGD are much closer to those suggested by our deep linear results than by the corresponding infinite network. This motivates the introduction of a new class of network: infinite networks with bottlenecks, which inherit the theoretical tractability of infinite networks while at the same time allowing representation learning.
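
As a rough illustration of the fixed-kernel point above, the sketch below trains a small finite deep linear network and checks that the Gram matrix of its hidden representation moves during training, whereas the infinite-width kernel of a linear network is determined by the inputs and hyperparameters alone and cannot move. The data, widths, and optimizer settings are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem (illustrative assumption, not the paper's setup)
n, d, h = 20, 5, 50                       # samples, input dim, hidden width
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)            # linear targets

# Finite two-layer linear network: f(x) = x W1 W2
W1 = rng.standard_normal((d, h)) / np.sqrt(d)
W2 = rng.standard_normal((h, 1)) / np.sqrt(h)

def hidden_kernel(W1):
    """Gram matrix of the learned hidden representation h(x) = x W1."""
    H = X @ W1
    return H @ H.T / h

K_before = hidden_kernel(W1)

# For an infinite-width linear network, the corresponding kernel would be
# proportional to X @ X.T, fixed by the inputs and prior variances alone.

# Plain gradient descent on mean squared error
lr = 0.01
for _ in range(500):
    err = X @ W1 @ W2 - y[:, None]        # (n, 1) residuals
    grad_W2 = (X @ W1).T @ err / n
    grad_W1 = X.T @ (err @ W2.T) / n
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

K_after = hidden_kernel(W1)

# The finite network's representation kernel changes with training; the
# infinite-width kernel cannot, which is the abstract's point.
print(np.linalg.norm(K_after - K_before))  # noticeably > 0
```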
Variational inference has become one of the most widely used methods in latent variable modeling. In its basic form, variational inference employs a fully factorized variational distribution and minimizes its KL divergence to the posterior. As the minimization can only be carried out approximately, this approximation induces a bias. In this paper, we revisit perturbation theory as a powerful way of improving the variational approximation. Perturbation theory relies on a form of Taylor expansion of the log marginal likelihood, vaguely in terms of the log ratio of the true posterior and its variational approximation. While first order terms give the classical variational bound, higher-order terms yield corrections that tighten it. However, traditional perturbation theory does not provide a lower bound, making it inapt for stochastic optimization. In this paper, we present a similar yet alternative way of deriving corrections to the ELBO that resemble perturbation theory, but that result in a valid bound. We show in experiments on Gaussian Processes and Variational Autoencoders that the new bounds are more mass covering, and that the resulting posterior covariances are closer to the true posterior and lead to higher likelihoods on held-out data.
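
For orientation, the first-order relationship mentioned above can be written out directly; the block below only restates the standard identity and Jensen's inequality, and does not reproduce the paper's specific higher-order corrected bound.

```latex
% V(z) collects the log ratio the abstract refers to.
\[
  V(z) = \log p(x, z) - \log q(z), \qquad
  \log p(x) = \log \mathbb{E}_{q(z)}\!\left[ e^{V(z)} \right]
  \;\ge\; \mathbb{E}_{q(z)}\!\left[ V(z) \right] = \mathrm{ELBO}(q).
\]
% Expanding e^{V} and keeping terms beyond first order tightens the estimate;
% the paper's contribution is a form of these corrections that still yields a
% valid lower bound suitable for stochastic optimization.
```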
We use the stellar-mass-selected catalog from the Spitzer Large Area Survey with Hyper-Suprime-Cam (SPLASH) in the COSMOS field to study the environments of galaxies via galaxy density and clustering analyses up to $z \sim 2.5$. The clustering strength of quiescent galaxies exceeds that of star-forming galaxies, implying that quiescent galaxies are preferentially located in more massive halos. When using local density measurement, we find a clear positive quiescent fraction--density relation at $z < 1$, consistent with earlier results. However, the quiescent fraction--density relation reverses its trend at intermediate redshifts ($1 < z < 1.5$) with marginal significance ($<1.8\sigma$) and is found to be scale dependent ($1.6\sigma$). The lower fraction of quiescent galaxies seen in large-scale dense environments, if confirmed to be true, may be associated with the fact that the star formation can be more easily sustained via cold stream accretion in large-scale high-density regions, preventing galaxies from permanent quenching. Finally, at $z > 1.5$, the quiescent fraction depends little on the local density, even though clustering shows that quiescent galaxies are in more massive halos. We argue that at high redshift the typical halo size falls below $10^{13}$ $M_{\odot}$, where intrinsically the local density measurements are so varied that they do not trace the halo mass. Our results thus suggest that in the high-redshift Universe, halo mass may be the key in quenching the star formation in galaxies, rather than the conventionally measured galaxy density.
Tianyi Chen, Yuejiao Sun, 2021
Stochastic nested optimization, including stochastic compositional, min-max and bilevel optimization, is gaining popularity in many machine learning applications. While the three problems share the nested structure, existing works often treat them separately, and thus develop problem-specific algorithms and their analyses. Among various exciting developments, simple SGD-type updates (potentially on multiple variables) are still prevalent in solving this class of nested problems, but they are believed to have slower convergence rate compared to that of the non-nested problems. This paper unifies several SGD-type updates for stochastic nested problems into a single SGD approach that we term the ALternating Stochastic gradient dEscenT (ALSET) method. By leveraging the hidden smoothness of the problem, this paper presents a tighter analysis of ALSET for stochastic nested problems. Under the new analysis, to achieve an $\epsilon$-stationary point of the nested problem, it requires $\mathcal{O}(\epsilon^{-2})$ samples. Under certain regularity conditions, applying our results to stochastic compositional, min-max and reinforcement learning problems either improves or matches the best-known sample complexity in the respective cases. Our results explain why simple SGD-type algorithms in stochastic nested problems all work very well in practice without the need for further modifications.
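
To give a concrete picture of the kind of alternating SGD-type update discussed above, here is a minimal sketch on a toy stochastic compositional problem; the objective, step sizes, and the tracking recursion are illustrative assumptions and not ALSET's exact algorithm or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy compositional problem (illustrative assumption):
#   minimize F(x) = f(E[g(x; xi)])  with  f(y) = 0.5 * ||y||^2,
#   g(x; xi) = x + xi,  xi ~ N(0, 0.1^2 I),  so the minimizer is x = 0.
def sample_g(x):
    return x + 0.1 * rng.standard_normal(x.shape)

def grad_f(y):
    return y

x = np.ones(5)            # outer (decision) variable
y = sample_g(x)           # running estimate of the inner expectation E[g(x; xi)]
alpha, beta = 0.05, 0.2   # outer and inner step sizes

for _ in range(2000):
    # inner step: move the tracking variable toward a fresh stochastic sample of g(x)
    y = (1 - beta) * y + beta * sample_g(x)
    # outer step: SGD update on x through the tracked inner value
    # (the Jacobian of g with respect to x is the identity in this toy example)
    x = x - alpha * grad_f(y)

print(np.linalg.norm(x))  # close to 0, the minimizer of F, up to stochastic noise
```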
Networks with a scale-free degree distribution are widely thought to promote cooperation in various games. Herein, by studying the well-known prisoner's dilemma game, we demonstrate that this need not necessarily be true. For the very same degree sequence and degree distribution, we present a variety of possible behaviour. We reassess the perceived importance of hubs in a network towards the maintenance of cooperation. We also reevaluate the dependence of cooperation on network clustering and assortativity.
