Convergence rates and approximation results for SGD and its continuous-time counterpart


Published by: Xavier Fontaine

Publication date: 2020

Research field: Mathematical Statistics

Language: English

Authors: Xavier Fontaine, Valentin De Bortoli

Optimization and Control, Machine Learning


Abstract

This paper proposes a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with non-increasing step sizes. First, we show that the recursion defining SGD can be provably approximated by solutions of a time-inhomogeneous Stochastic Differential Equation (SDE) using an appropriate coupling. In the specific case of batch noise, we refine our results using recent advances in Stein's method. Then, motivated by recent analyses of deterministic and stochastic optimization methods through their continuous counterparts, we study the long-time behavior of the continuous processes at hand and establish non-asymptotic bounds. To that end, we develop new comparison techniques which are of independent interest. Adapting these techniques to the discrete setting, we show that the same results hold for the corresponding SGD sequences. In our analysis, we notably improve non-asymptotic bounds in the convex setting for SGD under weaker assumptions than the ones considered in previous works. Finally, we also establish finite-time convergence results under various conditions, including relaxations of the famous Łojasiewicz inequality, which can be applied to a class of non-convex functions.
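
As a rough illustration of the setting described in the abstract, the sketch below runs SGD with non-increasing step sizes of the form gamma / (n + 1) ** alpha on a one-dimensional quadratic. The objective, the Gaussian gradient noise, and the names `sgd`, `gamma`, and `alpha` are assumptions chosen for illustration; they are not taken from the paper.

```python
import random


def sgd(grad, x0, gamma=0.5, alpha=0.75, n_steps=20000, noise_std=1.0, seed=0):
    """Run SGD with non-increasing step sizes gamma / (n + 1) ** alpha.

    The gradient oracle is grad(x) plus zero-mean Gaussian noise, which is an
    illustrative noise model rather than the paper's set of assumptions.
    """
    rng = random.Random(seed)
    x = x0
    for n in range(n_steps):
        step = gamma / (n + 1) ** alpha  # non-increasing in n
        noisy_grad = grad(x) + rng.gauss(0.0, noise_std)
        x -= step * noisy_grad
    return x


# Toy objective f(x) = x**2 / 2, minimized at x = 0, with gradient f'(x) = x.
x_final = sgd(lambda x: x, x0=5.0)
print(abs(x_final))  # distance to the minimizer after n_steps iterations
```

Setting `noise_std=0.0` turns the same loop into plain gradient descent with decaying steps, which can help separate the effect of the step-size schedule from the effect of the noise.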


Quentin Merigot, Filippo Santambrogio (ICJ), 2021

Several issues in machine learning and inverse problems require generating discrete data, as if sampled from a model probability distribution. A common way to do so relies on the construction of a uniform probability distribution over a set of $N$ points which minimizes the Wasserstein distance to the model distribution. This minimization problem, where the unknowns are the positions of the atoms, is non-convex. Yet, in most cases, a suitably adjusted version of Lloyd's algorithm, in which Voronoi cells are replaced by Power cells, leads to configurations with small Wasserstein error. This is surprising because of the non-convex nature of the problem and the existence of spurious critical points. We provide explicit upper bounds for the convergence speed of this Lloyd-type algorithm, starting from a cloud of points sufficiently far from each other. This already works after one step of the iteration procedure, and similar bounds can be deduced for the corresponding gradient descent. These bounds naturally lead to a modified Polyak-Łojasiewicz inequality for the Wasserstein distance cost, with an error term depending on the distances between Dirac masses in the discrete distribution.

Optimization and Control, Machine Learning
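
For the Lloyd-type algorithm in the entry above, the one-dimensional case can be written out by hand, because optimal transport onto equal-mass atoms on the line is monotone. The sketch below assumes a uniform model distribution on [0, 1] and pairwise distinct starting points. The function name `lloyd_power_step_1d` is hypothetical, and this toy case sidesteps the Power-cell computation that the general multi-dimensional algorithm needs.

```python
def lloyd_power_step_1d(points):
    """One Lloyd-type step for the uniform density on [0, 1] in 1D.

    With N equal-mass atoms, the cell of the k-th smallest atom is the
    interval [k / N, (k + 1) / N]; each atom moves to its cell's barycenter.
    Assumes the points are pairwise distinct.
    """
    n = len(points)
    order = sorted(range(n), key=lambda i: points[i])
    new_points = [0.0] * n
    for rank, i in enumerate(order):
        lo, hi = rank / n, (rank + 1) / n
        new_points[i] = (lo + hi) / 2  # barycenter of a uniform interval
    return new_points


updated = lloyd_power_step_1d([0.9, 0.1, 0.15, 0.2])
print(updated)
```

In this toy case a single step lands on the midpoints of the N quantile intervals, and further steps leave them unchanged. In higher dimensions the cells depend on the atom positions in a less trivial way, so no such shortcut should be assumed.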

The Epsilon-Alternating Least Squares for Orthogonal Low-Rank Tensor Approximation and Its Global Convergence

Yuning Yang, 2019

The epsilon alternating least squares ($\epsilon$-ALS) is developed and analyzed for canonical polyadic decomposition (approximation) of a higher-order tensor where one or more of the factor matrices are assumed to be columnwise orthonormal. It is shown that the algorithm globally converges to a KKT point for all tensors without any assumption. For the original ALS, by further studying the properties of the polar decomposition, we also establish its global convergence under a reality assumption not stronger than those in the literature. These results completely address a question concerning the global convergence raised in [L. Wang, M. T. Chu and B. Yu, \emph{SIAM J. Matrix Anal. Appl.}, 36 (2015), pp. 1--19]. In addition, an initialization procedure is proposed, which possesses a provable lower bound when the number of columnwise orthonormal factors is one. Armed with this initialization procedure, numerical experiments show that the $\epsilon$-ALS exhibits promising performance in terms of efficiency and effectiveness.

Optimization and Control, Numerical Analysis

Randomized Methods for Linear Constraints: Convergence Rates and Conditioning

D. Leventhal, A.S. Lewis, 2008

We study randomized variants of two classical algorithms: coordinate descent for systems of linear equations and iterated projections for systems of linear inequalities. Expanding on a recent randomized iterated projection algorithm of Strohmer and Vershynin for systems of linear equations, we show that, under appropriate probability distributions, the linear rates of convergence (in expectation) can be bounded in terms of natural linear-algebraic condition numbers for the problems. We relate these condition measures to distances to ill-posedness, and discuss generalizations to convex systems under metric regularity assumptions.

Optimization and Control, Numerical Analysis
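
The randomized iterated projection scheme of Strohmer and Vershynin mentioned in the entry above (often called randomized Kaczmarz) can be sketched in a few lines. This is a minimal pure-Python version for a small consistent system with nonzero rows. The function name, the test matrix, and the iteration count are illustrative assumptions, and the sketch does not cover the coordinate-descent or inequality variants studied in the paper.

```python
import random


def randomized_kaczmarz(A, b, n_iters=2000, seed=0):
    """Randomized iterated projections for a consistent system A x = b.

    Row i is sampled with probability proportional to its squared Euclidean
    norm, and the iterate is projected onto the hyperplane <a_i, x> = b_i.
    Assumes every row of A is nonzero.
    """
    rng = random.Random(seed)
    weights = [sum(a * a for a in row) for row in A]  # squared row norms
    x = [0.0] * len(A[0])
    for _ in range(n_iters):
        i = rng.choices(range(len(A)), weights=weights)[0]
        row = A[i]
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        scale = residual / weights[i]
        x = [xj + scale * a for xj, a in zip(x, row)]  # orthogonal projection
    return x


A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
b = [4.0, 7.0, -1.0]  # consistent with the solution x = (1, 2)
solution = randomized_kaczmarz(A, b)
print(solution)
```

Sampling rows by squared norm is what ties the expected convergence rate to a condition number of the matrix; uniform row sampling would also converge here but does not give the same bound.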

Improved convergence rates and trajectory convergence for primal-dual dynamical systems with vanishing damping

Radu Ioan Bot, Dang-Khoa Nguyen, 2021

In this work, we approach the minimization of a continuously differentiable convex function under linear equality constraints by a second-order dynamical system with asymptotically vanishing damping term. The system is formulated in terms of the augmented Lagrangian associated with the minimization problem. We show fast convergence of the primal-dual gap, the feasibility measure, and the objective function value along the generated trajectories. In case the objective function has Lipschitz continuous gradient, we show that the primal-dual trajectory asymptotically weakly converges to a primal-dual optimal solution of the underlying minimization problem. To the best of our knowledge, this is the first result which guarantees the convergence of the trajectory generated by a primal-dual dynamical system with asymptotically vanishing damping. Moreover, in the case of unconstrained minimization of a convex differentiable function with Lipschitz continuous gradient, we recover all convergence statements obtained in the literature for Nesterov's accelerated gradient method.

Optimization and Control, Classical Analysis and ODEs
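
The entry above ends with the unconstrained special case, where the dynamics reduce to a second-order system with damping coefficient alpha / t. The sketch below integrates that unconstrained system for a scalar toy objective. It is not the primal-dual system of the paper; the value alpha = 3, the semi-implicit Euler integrator, and the function name are assumptions made for illustration.

```python
def simulate_vanishing_damping(grad, x0, alpha=3.0, t0=1.0, t_end=50.0, dt=1e-3):
    """Integrate x'' + (alpha / t) x' + grad(x) = 0 with semi-implicit Euler.

    This is an unconstrained second-order system with asymptotically
    vanishing damping alpha / t; the scalar state and the integrator are
    illustrative choices.
    """
    x, v, t = x0, 0.0, t0
    while t < t_end:
        v += dt * (-(alpha / t) * v - grad(x))  # update velocity first
        x += dt * v                             # then position
        t += dt
    return x


# Toy objective f(x) = x**2 / 2 with minimum value 0 at x = 0.
x_end = simulate_vanishing_damping(lambda x: x, x0=1.0)
print(0.5 * x_end ** 2)  # objective value at the final time
```

The trajectory oscillates around the minimizer with shrinking amplitude, so the objective value at a single time is only a snapshot; plotting the value over time gives a better picture of the decay.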

On the metric resolvent: nonexpansiveness, convergence rates and applications

Feng Xue, 2021

In this paper, we study the nonexpansive properties of the metric resolvent, and present a convergence rate analysis for the associated fixed-point iterations (Banach-Picard and Krasnoselskii-Mann types). Equipped with a variable metric, we develop the global ergodic and non-ergodic iteration-complexity bounds in terms of both solution distance and objective value. A byproduct of our exposition also extends the proximity operator and Moreau's decomposition identity to an arbitrary variable metric. It is further shown that many classes of first-order operator splitting algorithms, including the alternating direction method of multipliers, primal-dual hybrid gradient and Bregman iterations, can be expressed by the fixed-point iterations of a simple metric resolvent, and thus, the convergence can be analyzed within this unified framework.

Optimization and Control
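
The two fixed-point schemes named in the entry above differ in a way that a small example makes visible. The sketch below applies both to a quarter-turn rotation of the plane, which is nonexpansive with the origin as its only fixed point. This map is a generic nonexpansive operator in the Euclidean metric, not a metric resolvent, and the function names and the averaging parameter `lam = 0.5` are illustrative assumptions.

```python
def rotate_quarter_turn(p):
    """A nonexpansive map on the plane whose only fixed point is the origin."""
    x, y = p
    return (-y, x)


def banach_picard(T, p, n_iters):
    """Iterate p <- T(p)."""
    for _ in range(n_iters):
        p = T(p)
    return p


def krasnoselskii_mann(T, p, n_iters, lam=0.5):
    """Iterate p <- (1 - lam) * p + lam * T(p) with a fixed lam in (0, 1)."""
    for _ in range(n_iters):
        q = T(p)
        p = tuple((1 - lam) * pi + lam * qi for pi, qi in zip(p, q))
    return p


start = (1.0, 0.0)
picard_point = banach_picard(rotate_quarter_turn, start, 4)
km_point = krasnoselskii_mann(rotate_quarter_turn, start, 50)
print(picard_point, km_point)
```

The Banach-Picard iterates cycle around the unit circle and return to the starting point every four steps, while the averaged Krasnoselskii-Mann iterates spiral in toward the fixed point.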
