ترغب بنشر مسار تعليمي؟ اضغط هنا

Stochastic Gradient Langevin Dynamics with Variance Reduction

79   0   0.0 ( 0 )
 نشر من قبل Zhishen Huang
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Stochastic gradient Langevin dynamics (SGLD) has gained the attention of optimization researchers due to its global optimization properties. This paper proves an improved convergence property to local minimizers of nonconvex objective functions using SGLD accelerated by variance reductions. Moreover, we prove an ergodicity property of the SGLD scheme, which gives insights on its potential to find global minimizers of nonconvex objectives.



قيم البحث

اقرأ أيضاً

Alternating Direction Method of Multipliers (ADMM) is a popular method in solving Machine Learning problems. Stochastic ADMM was firstly proposed in order to reduce the per iteration computational complexity, which is more suitable for big data probl ems. Recently, variance reduction techniques have been integrated with stochastic ADMM in order to get a fast convergence rate, such as SAG-ADMM and SVRG-ADMM,but the convergence is still suboptimal w.r.t the smoothness constant. In this paper, we propose a new accelerated stochastic ADMM algorithm with variance reduction, which enjoys a faster convergence than all the other stochastic ADMM algorithms. We theoretically analyze its convergence rate and show its dependence on the smoothness constant is optimal. We also empirically validate its effectiveness and show its priority over other stochastic ADMM algorithms.
In recent years, stochastic variance reduction algorithms have attracted considerable attention for minimizing the average of a large but finite number of loss functions. This paper proposes a novel Riemannian extension of the Euclidean stochastic va riance reduced gradient (R-SVRG) algorithm to a manifold search space. The key challenges of averaging, adding, and subtracting multiple gradients are addressed with retraction and vector transport. For the proposed algorithm, we present a global convergence analysis with a decaying step size as well as a local convergence rate analysis with a fixed step size under some natural assumptions. In addition, the proposed algorithm is applied to the computation problem of the Riemannian centroid on the symmetric positive definite (SPD) manifold as well as the principal component analysis and low-rank matrix completion problems on the Grassmann manifold. The results show that the proposed algorithm outperforms the standard Riemannian stochastic gradient descent algorithm in each case.
Artificial neural networks (ANNs) are typically highly nonlinear systems which are finely tuned via the optimization of their associated, non-convex loss functions. Typically, the gradient of any such loss function fails to be dissipative making the use of widely-accepted (stochastic) gradient descent methods problematic. We offer a new learning algorithm based on an appropriately constructed variant of the popular stochastic gradient Langevin dynamics (SGLD), which is called tamed unadjusted stochastic Langevin algorithm (TUSLA). We also provide a nonasymptotic analysis of the new algorithms convergence properties in the context of non-convex learning problems with the use of ANNs. Thus, we provide finite-time guarantees for TUSLA to find approximate minimizers of both empirical and population risks. The roots of the TUSLA algorithm are based on the taming technology for diffusion processes with superlinear coefficients as developed in citet{tamed-euler, SabanisAoAP} and for MCMC algorithms in citet{tula}. Numerical experiments are presented which confirm the theoretical findings and illustrate the need for the use of the new algorithm in comparison to vanilla SGLD within the framework of ANNs.
The variance reduction class of algorithms including the representative ones, SVRG and SARAH, have well documented merits for empirical risk minimization problems. However, they require grid search to tune parameters (step size and the number of iter ations per inner loop) for optimal performance. This work introduces `almost tune-free SVRG and SARAH schemes equipped with i) Barzilai-Borwein (BB) step sizes; ii) averaging; and, iii) the inner loop length adjusted to the BB step sizes. In particular, SVRG, SARAH, and their BB variants are first reexamined through an `estimate sequence lens to enable new averaging methods that tighten their convergence rates theoretically, and improve their performance empirically when the step size or the inner loop length is chosen large. Then a simple yet effective means to adjust the number of iterations per inner loop is developed to enhance the merits of the proposed averaging schemes and BB step sizes. Numerical tests corroborate the proposed methods.
In this paper, we explore a general Aggregated Gradient Langevin Dynamics framework (AGLD) for the Markov Chain Monte Carlo (MCMC) sampling. We investigate the nonasymptotic convergence of AGLD with a unified analysis for different data accessing (e. g. random access, cyclic access and random reshuffle) and snapshot updating strategies, under convex and nonconvex settings respectively. It is the first time that bounds for I/O friendly strategies such as cyclic access and random reshuffle have been established in the MCMC literature. The theoretic results also indicate that methods in AGLD possess the merits of both the low per-iteration computational complexity and the short mixture time. Empirical studies demonstrate that our framework allows to derive novel schemes to generate high-quality samples for large-scale Bayesian posterior learning tasks.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا