ترغب بنشر مسار تعليمي؟ اضغط هنا

Convergence rates and approximation results for SGD and its continuous-time counterpart

170   0   0.0 ( 0 )
 نشر من قبل Xavier Fontaine
 تاريخ النشر 2020
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

This paper proposes a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with non-increasing step sizes. First, we show that the recursion defining SGD can be provably approximated by solutions of a time inhomogeneous Stochastic Differential Equation (SDE) using an appropriate coupling. In the specific case of a batch noise we refine our results using recent advances in Steins method. Then, motivated by recent analyses of deterministic and stochastic optimization methods by their continuous counterpart, we study the long-time behavior of the continuous processes at hand and establish non-asymptotic bounds. To that purpose, we develop new comparison techniques which are of independent interest. Adapting these techniques to the discrete setting, we show that the same results hold for the corresponding SGD sequences. In our analysis, we notably improve non-asymptotic bounds in the convex setting for SGD under weaker assumptions than the ones considered in previous works. Finally, we also establish finite-time convergence results under various conditions, including relaxations of the famous {L}ojasiewicz inequality, which can be applied to a class of non-convex functions.



قيم البحث

اقرأ أيضاً

Several issues in machine learning and inverse problems require to generate discrete data, as if sampled from a model probability distribution. A common way to do so relies on the construction of a uniform probability distribution over a set of $N$ p oints which minimizes the Wasserstein distance to the model distribution. This minimization problem, where the unknowns are the positions of the atoms, is non-convex. Yet, in most cases, a suitably adjusted version of Lloyds algorithm -- in which Voronoi cells are replaced by Power cells -- leads to configurations with small Wasserstein error. This is surprising because, again, of the non-convex nature of the problem, as well as the existence of spurious critical points. We provide explicit upper bounds for the convergence speed of this Lloyd-type algorithm, starting from a cloud of points sufficiently far from each other. This already works after one step of the iteration procedure, and similar bounds can be deduced, for the corresponding gradient descent. These bounds naturally lead to a modified Poliak-Lojasiewicz inequality for the Wasserstein distance cost, with an error term depending on the distances between Dirac masses in the discrete distribution.
81 - Yuning Yang 2019
The epsilon alternating least squares ($epsilon$-ALS) is developed and analyzed for canonical polyadic decomposition (approximation) of a higher-order tensor where one or more of the factor matrices are assumed to be columnwisely orthonormal. It is s hown that the algorithm globally converges to a KKT point for all tensors without any assumption. For the original ALS, by further studying the properties of the polar decomposition, we also establish its global convergence under a reality assumption not stronger than those in the literature. These results completely address a question concerning the global convergence raised in [L. Wang, M. T. Chu and B. Yu, emph{SIAM J. Matrix Anal. Appl.}, 36 (2015), pp. 1--19]. In addition, an initialization procedure is proposed, which possesses a provable lower bound when the number of columnwisely orthonormal factors is one. Armed with this initialization procedure, numerical experiments show that the $epsilon$-ALS exhibits a promising performance in terms of efficiency and effectiveness.
117 - D. Leventhal , A.S. Lewis 2008
We study randomized variants of two classical algorithms: coordinate descent for systems of linear equations and iterated projections for systems of linear inequalities. Expanding on a recent randomized iterated projection algorithm of Strohmer and V ershynin for systems of linear equations, we show that, under appropriate probability distributions, the linear rates of convergence (in expectation) can be bounded in terms of natural linear-algebraic condition numbers for the problems. We relate these condition measures to distances to ill-posedness, and discuss generalizations to convex systems under metric regularity assumptions.
In this work, we approach the minimization of a continuously differentiable convex function under linear equality constraints by a second-order dynamical system with asymptotically vanishing damping term. The system is formulated in terms of the augm ented Lagrangian associated to the minimization problem. We show fast convergence of the primal-dual gap, the feasibility measure, and the objective function value along the generated trajectories. In case the objective function has Lipschitz continuous gradient, we show that the primal-dual trajectory asymptotically weakly converges to a primal-dual optimal solution of the underlying minimization problem. To the best of our knowledge, this is the first result which guarantees the convergence of the trajectory generated by a primal-dual dynamical system with asymptotic vanishing damping. Moreover, we will rediscover in case of the unconstrained minimization of a convex differentiable function with Lipschitz continuous gradient all convergence statements obtained in the literature for Nesterovs accelerated gradient method.
124 - Feng Xue 2021
In this paper, we study the nonexpansive properties of metric resolvent, and present a convergence rate analysis for the associated fixed-point iterations (Banach-Picard and Krasnoselskii-Mann types). Equipped with a variable metric, we develop the g lobal ergodic and non-ergodic iteration-complexity bounds in terms of both solution distance and objective value. A byproduct of our expositions also extends the proximity operator and Moreaus decomposition identity to arbitrary variable metric. It is further shown that many classes of the first-order operator splitting algorithms, including alternating direction methods of multipliers, primal-dual hybrid gradient and Bregman iterations, can be expressed by the fixed-point iterations of a simple metric resolvent, and thus, the convergence can be analyzed within this unified framework.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا