ترغب بنشر مسار تعليمي؟ اضغط هنا

Differentiable Annealed Importance Sampling and the Perils of Gradient Noise

91   0   0.0 ( 0 )
 نشر من قبل Guodong Zhang
 تاريخ النشر 2021
والبحث باللغة English




اسأل ChatGPT حول البحث

Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation, but are not fully differentiable due to the use of Metropolis-Hastings (MH) correction steps. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective using gradient-based methods. To this end, we propose a differentiable AIS algorithm by abandoning MH steps, which further unlocks mini-batch computation. We provide a detailed convergence analysis for Bayesian linear regression which goes beyond previous analyses by explicitly accounting for non-perfect transitions. Using this analysis, we prove that our algorithm is consistent in the full-batch setting and provide a sublinear convergence rate. However, we show that the algorithm is inconsistent when mini-batch gradients are used due to a fundamental incompatibility between the goals of last-iterate convergence to the posterior and elimination of the pathwise stochastic error. This result is in stark contrast to our experience with stochastic optimization and stochastic gradient Langevin dynamics, where the effects of gradient noise can be washed out by taking more steps of a smaller size. Our negative result relies crucially on our explicit consideration of convergence to the stationary distribution, and it helps explain the difficulty of developing practically effective AIS-like algorithms that exploit mini-batch gradients.



قيم البحث

اقرأ أيضاً

We consider estimating the marginal likelihood in settings with independent and identically distributed (i.i.d.) data. We propose estimating the predictive distributions in a sequential factorization of the marginal likelihood in such settings by usi ng stochastic gradient Markov Chain Monte Carlo techniques. This approach is far more efficient than traditional marginal likelihood estimation techniques such as nested sampling and annealed importance sampling due to its use of mini-batches to approximate the likelihood. Stability of the estimates is provided by an adaptive annealing schedule. The resulting stochastic gradient annealed importance sampling (SGAIS) technique, which is the key contribution of our paper, enables us to estimate the marginal likelihood of a number of models considerably faster than traditional approaches, with no noticeable loss of accuracy. An important benefit of our approach is that the marginal likelihood is calculated in an online fashion as data becomes available, allowing the estimates to be used for applications such as online weighted model combination.
Annealed importance sampling is a means to assign equilibrium weights to a nonequilibrium sample that was generated by a simulated annealing protocol. The weights may then be used to calculate equilibrium averages, and also serve as an ``adiabatic si gnature of the chosen cooling schedule. In this paper we demonstrate the method on the 50-atom dileucine peptide, showing that equilibrium distributions are attained for manageable cooling schedules. For this system, as naively implemented here, the method is modestly more efficient than constant temperature simulation. However, the method is worth considering whenever any simulated heating or cooling is performed (as is often done at the beginning of a simulation project, or during an NMR structure calculation), as it is simple to implement and requires minimal additional CPU expense. Furthermore, the naive implementation presented here can be improved.
105 - Clement Gauchy , Cyril Feau , 2021
As part of Probabilistic Risk Assessment studies, it is necessary to study the fragility of mechanical and civil engineered structures when subjected to seismic loads. This risk can be measured with fragility curves, which express the probability of failure of the structure conditionally to a seismic intensity measure. The estimation of fragility curves relies on time-consuming numerical simulations, so that careful experimental design is required in order to gain the maximum information on the structures fragility with a limited number of code evaluations. We propose and implement an active learning methodology based on adaptive importance sampling in order to reduce the variance of the training loss. The efficiency of the proposed method in terms of bias, standard deviation and prediction interval coverage are theoretically and numerically characterized.
Variational Inference (VI) is a popular alternative to asymptotically exact sampling in Bayesian inference. Its main workhorse is optimization over a reverse Kullback-Leibler divergence (RKL), which typically underestimates the tail of the posterior leading to miscalibration and potential degeneracy. Importance sampling (IS), on the other hand, is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures. The quality of IS crucially depends on the choice of the proposal distribution. Ideally, the proposal distribution has heavier tails than the target, which is rarely achievable by minimizing the RKL. We thus propose a novel combination of optimization and sampling techniques for approximate Bayesian inference by constructing an IS proposal distribution through the minimization of a forward KL (FKL) divergence. This approach guarantees asymptotic consistency and a fast convergence towards both the optimal IS estimator and the optimal variational approximation. We empirically demonstrate on real data that our method is competitive with variational boosting and MCMC.
Classification has been a major task for building intelligent systems as it enables decision-making under uncertainty. Classifier design aims at building models from training data for representing feature-label distributions--either explicitly or imp licitly. In many scientific or clinical settings, training data are typically limited, which makes designing accurate classifiers and evaluating their classification error extremely challenging. While transfer learning (TL) can alleviate this issue by incorporating data from relevant source domains to improve learning in a different target domain, it has received little attention for performance assessment, notably in error estimation. In this paper, we fill this gap by investigating knowledge transferability in the context of classification error estimation within a Bayesian paradigm. We introduce a novel class of Bayesian minimum mean-square error (MMSE) estimators for optimal Bayesian transfer learning (OBTL), which enables rigorous evaluation of classification error under uncertainty in a small-sample setting. Using Monte Carlo importance sampling, we employ the proposed estimator to evaluate the classification accuracy of a broad family of classifiers that span diverse learning capabilities. Experimental results based on both synthetic data as well as real-world RNA sequencing (RNA-seq) data show that our proposed OBTL error estimation scheme clearly outperforms standard error estimators, especially in a small-sample setting, by tapping into the data from other relevant domains.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا