Stochastic Gradient Langevin Dynamics with Variance Reduction

79 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zhishen Huang

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Zhishen Huang - Stephen Becker

التعلم الآلي التحسين والتحكم

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Stochastic gradient Langevin dynamics (SGLD) has gained the attention of optimization researchers due to its global optimization properties. This paper proves an improved convergence property to local minimizers of nonconvex objective functions using SGLD accelerated by variance reductions. Moreover, we prove an ergodicity property of the SGLD scheme, which gives insights on its potential to find global minimizers of nonconvex objectives.

قيم البحث

66 - Chao Zhang , Zebang Shen , Hui Qian 2016

Alternating Direction Method of Multipliers (ADMM) is a popular method in solving Machine Learning problems. Stochastic ADMM was firstly proposed in order to reduce the per iteration computational complexity, which is more suitable for big data probl ems. Recently, variance reduction techniques have been integrated with stochastic ADMM in order to get a fast convergence rate, such as SAG-ADMM and SVRG-ADMM,but the convergence is still suboptimal w.r.t the smoothness constant. In this paper, we propose a new accelerated stochastic ADMM algorithm with variance reduction, which enjoys a faster convergence than all the other stochastic ADMM algorithms. We theoretically analyze its convergence rate and show its dependence on the smoothness constant is optimal. We also empirically validate its effectiveness and show its priority over other stochastic ADMM algorithms.

التحليل العددي التحسين والتحكم

Riemannian stochastic variance reduced gradient algorithm with retraction and vector transport

127 - Hiroyuki Sato , Hiroyuki Kasai , Bamdev Mishra 2017

In recent years, stochastic variance reduction algorithms have attracted considerable attention for minimizing the average of a large but finite number of loss functions. This paper proposes a novel Riemannian extension of the Euclidean stochastic va riance reduced gradient (R-SVRG) algorithm to a manifold search space. The key challenges of averaging, adding, and subtracting multiple gradients are addressed with retraction and vector transport. For the proposed algorithm, we present a global convergence analysis with a decaying step size as well as a local convergence rate analysis with a fixed step size under some natural assumptions. In addition, the proposed algorithm is applied to the computation problem of the Riemannian centroid on the symmetric positive definite (SPD) manifold as well as the principal component analysis and low-rank matrix completion problems on the Grassmann manifold. The results show that the proposed algorithm outperforms the standard Riemannian stochastic gradient descent algorithm in each case.

التعلم الآلي التحسين والتحكم التعلم الالي

Taming neural networks with TUSLA: Non-convex learning via adaptive stochastic gradient Langevin algorithms

109 - Attila Lovas , Iosif Lytras , Miklos Rasonyi 2020

Artificial neural networks (ANNs) are typically highly nonlinear systems which are finely tuned via the optimization of their associated, non-convex loss functions. Typically, the gradient of any such loss function fails to be dissipative making the use of widely-accepted (stochastic) gradient descent methods problematic. We offer a new learning algorithm based on an appropriately constructed variant of the popular stochastic gradient Langevin dynamics (SGLD), which is called tamed unadjusted stochastic Langevin algorithm (TUSLA). We also provide a nonasymptotic analysis of the new algorithms convergence properties in the context of non-convex learning problems with the use of ANNs. Thus, we provide finite-time guarantees for TUSLA to find approximate minimizers of both empirical and population risks. The roots of the TUSLA algorithm are based on the taming technology for diffusion processes with superlinear coefficients as developed in citet{tamed-euler, SabanisAoAP} and for MCMC algorithms in citet{tula}. Numerical experiments are presented which confirm the theoretical findings and illustrate the need for the use of the new algorithm in comparison to vanilla SGLD within the framework of ANNs.

التعلم الآلي التحسين والتحكم الاحتمالات

Almost Tune-Free Variance Reduction

171 - Bingcong Li , Lingda Wang , Georgios B. Giannakis 2019

The variance reduction class of algorithms including the representative ones, SVRG and SARAH, have well documented merits for empirical risk minimization problems. However, they require grid search to tune parameters (step size and the number of iter ations per inner loop) for optimal performance. This work introduces `almost tune-free SVRG and SARAH schemes equipped with i) Barzilai-Borwein (BB) step sizes; ii) averaging; and, iii) the inner loop length adjusted to the BB step sizes. In particular, SVRG, SARAH, and their BB variants are first reexamined through an `estimate sequence lens to enable new averaging methods that tighten their convergence rates theoretically, and improve their performance empirically when the step size or the inner loop length is chosen large. Then a simple yet effective means to adjust the number of iterations per inner loop is developed to enhance the merits of the proposed averaging schemes and BB step sizes. Numerical tests corroborate the proposed methods.

التعلم الآلي التحسين والتحكم التعلم الالي

Aggregated Gradient Langevin Dynamics

57 - Chao Zhang , Jiahao Xie , Zebang Shen 2019

In this paper, we explore a general Aggregated Gradient Langevin Dynamics framework (AGLD) for the Markov Chain Monte Carlo (MCMC) sampling. We investigate the nonasymptotic convergence of AGLD with a unified analysis for different data accessing (e. g. random access, cyclic access and random reshuffle) and snapshot updating strategies, under convex and nonconvex settings respectively. It is the first time that bounds for I/O friendly strategies such as cyclic access and random reshuffle have been established in the MCMC literature. The theoretic results also indicate that methods in AGLD possess the merits of both the low per-iteration computational complexity and the short mixture time. Empirical studies demonstrate that our framework allows to derive novel schemes to generate high-quality samples for large-scale Bayesian posterior learning tasks.

التعلم الآلي التعلم الالي