Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes

125 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ayoub El Hanchi

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Ayoub El Hanchi - David A. Stephens

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Reducing the variance of the gradient estimator is known to improve the convergence rate of stochastic gradient-based optimization and sampling algorithms. One way of achieving variance reduction is to design importance sampling strategies. Recently, the problem of designing such schemes was formulated as an online learning problem with bandit feedback, and algorithms with sub-linear static regret were designed. In this work, we build on this framework and propose Avare, a simple and efficient algorithm for adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes. Under standard technical conditions, we show that Avare achieves $mathcal{O}(T^{2/3})$ and $mathcal{O}(T^{5/6})$ dynamic regret for SGD and SGLD respectively when run with $mathcal{O}(1/t)$ step sizes. We achieve this dynamic regret bound by leveraging our knowledge of the dynamics defined by the algorithm, and combining ideas from online learning and variance-reduced stochastic optimization. We validate empirically the performance of our algorithm and identify settings in which it leads to significant improvements.

قيم البحث

اقرأ أيضاً

Layered Adaptive Importance Sampling

241 - L. Martino , V. Elvira , D. Luengo 2015

Monte Carlo methods represent the de facto standard for approximating complicated integrals involving multidimensional target distributions. In order to generate random realizations from the target distribution, Monte Carlo techniques use simpler pro posal probability densities to draw candidate samples. The performance of any such method is strictly related to the specification of the proposal distribution, such that unfortunate choices easily wreak havoc on the resulting estimators. In this work, we introduce a layered (i.e., hierarchical) procedure to generate samples employed within a Monte Carlo scheme. This approach ensures that an appropriate equivalent proposal density is always obtained automatically (thus eliminating the risk of a catastrophic performance), although at the expense of a moderate increase in the complexity. Furthermore, we provide a general unified importance sampling (IS) framework, where multiple proposal densities are employed and several IS schemes are introduced by applying the so-called deterministic mixture approach. Finally, given these schemes, we also propose a novel class of adaptive importance samplers using a population of proposals, where the adaptation is driven by independent parallel or interacting Markov Chain Monte Carlo (MCMC) chains. The resulting algorithms efficiently combine the benefits of both IS and MCMC methods.

حساب التعلم الآلي التعلم الالي

Bandwidth-based Step-Sizes for Non-Convex Stochastic Optimization

86 - Xiaoyu Wang , Mikael Johansson 2021

Many popular learning-rate schedules for deep neural networks combine a decaying trend with local perturbations that attempt to escape saddle points and bad local minima. We derive convergence guarantees for bandwidth-based step-sizes, a general clas s of learning-rates that are allowed to vary in a banded region. This framework includes cyclic and non-monotonic step-sizes for which no theoretical guarantees were previously known. We provide worst-case guarantees for SGD on smooth non-convex problems under several bandwidth-based step sizes, including stagewise $1/sqrt{t}$ and the popular step-decay (constant and then drop by a constant), which is also shown to be optimal. Moreover, we show that its momentum variant (SGDM) converges as fast as SGD with the bandwidth-based step-decay step-size. Finally, we propose some novel step-size schemes in the bandwidth-based family and verify their efficiency on several deep neural network training tasks.

التعلم الآلي التحسين والتحكم

Faster Coordinate Descent via Adaptive Importance Sampling

157 - Dmytro Perekrestenko , Volkan Cevher , Martin Jaggi 2017

Coordinate descent methods employ random partial updates of decision variables in order to solve huge-scale convex optimization problems. In this work, we introduce new adaptive rules for the random selection of their updates. By adaptive, we mean th at our selection rules are based on the dual residual or the primal-dual gap estimates and can change at each iteration. We theoretically characterize the performance of our selection rules and demonstrate improvements over the state-of-the-art, and extend our theory and algorithms to general convex objectives. Numerical evidence with hinge-loss support vector machines and Lasso confirm that the practice follows the theory.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط التحسين والتحكم

Adaptive Multiple Importance Sampling

132 - Jean-Marie Cornuet 2009

The Adaptive Multiple Importance Sampling (AMIS) algorithm is aimed at an optimal recycling of past simulations in an iterated importance sampling scheme. The difference with earlier adaptive importance sampling implementations like Population Monte Carlo is that the importance weights of all simulated values, past as well as present, are recomputed at each iteration, following the technique of the deterministic multiple mixture estimator of Owen and Zhou (2000). Although the convergence properties of the algorithm cannot be fully investigated, we demonstrate through a challenging banana shape target distribution and a population genetics example that the improvement brought by this technique is substantial.

حساب تطبيقات الإحصاء

Symbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis

183 - Yicheng Luo , Antonio Filieri , Yuan Zhou 2020

Probabilistic software analysis aims at quantifying the probability of a target event occurring during the execution of a program processing uncertain incoming data or written itself using probabilistic programming constructs. Recent techniques combi ne symbolic execution with model counting or solution space quantification methods to obtain accurate estimates of the occurrence probability of rare target events, such as failures in a mission-critical system. However, they face several scalability and applicability limitations when analyzing software processing with high-dimensional and correlated multivariate input distributions. In this paper, we present SYMbolic Parallel Adaptive Importance Sampling (SYMPAIS), a new inference method tailored to analyze path conditions generated from the symbolic execution of programs with high-dimensional, correlated input distributions. SYMPAIS combines results from importance sampling and constraint solving to produce accurate estimates of the satisfaction probability for a broad class of constraints that cannot be analyzed by current solution space quantification methods. We demonstrate SYMPAISs generality and performance compared with state-of-the-art alternatives on a set of problems from different application domains.

التعلم الآلي الذكاء الاصطناعي لغات البرمجة