
On a Faster $R$-Linear Convergence Rate of the Barzilai-Borwein Method

Published by: Dawei Li
Publication date: 2021
Research field: Informatics Engineering
Paper language: English

The Barzilai-Borwein (BB) method has demonstrated great empirical success in nonlinear optimization. However, the convergence speed of the BB method is not well understood: the known convergence rate of the BB method for quadratic problems is much worse than that of the steepest descent (SD) method, leaving a large discrepancy between theory and practice. To shrink this gap, we prove that the BB method converges $R$-linearly at a rate of $1-1/\kappa$, where $\kappa$ is the condition number, for strongly convex quadratic problems. In addition, an example attaining the theoretical rate of convergence is constructed, indicating the tightness of our bound.
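For concreteness, below is a minimal sketch of the classical BB (BB1) step-size rule on a strongly convex quadratic with condition number $\kappa = 100$. It is an illustration, not the paper's code: the function name `bb_method`, the diagonal test matrix, and the first-step heuristic are all choices made here.

```python
import numpy as np

def bb_method(A, b, x0, max_iter=1000, tol=1e-10):
    """Barzilai-Borwein (BB1) gradient method for the strongly convex
    quadratic f(x) = 0.5 x^T A x - b^T x, whose gradient is A x - b."""
    x = x0.copy()
    g = A @ x - b
    alpha = 1.0 / np.linalg.norm(g)          # heuristic first step
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        x_new = x - alpha * g
        g_new = A @ x_new - b
        s, y = x_new - x, g_new - g
        # BB1 step: alpha = (s^T s)/(s^T y); on a quadratic s^T y = s^T A s > 0.
        alpha = (s @ s) / (s @ y)
        x, g = x_new, g_new
    return x, k

rng = np.random.default_rng(0)
n, kappa = 50, 100.0
A = np.diag(np.linspace(1.0, kappa, n))      # condition number kappa = 100
b = rng.standard_normal(n)
x_star, iters = bb_method(A, b, np.zeros(n))
print(iters, np.linalg.norm(A @ x_star - b))
```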


Read also

In this paper, we propose a new non-monotone conjugate gradient method for solving unconstrained nonlinear optimization problems. We first modify the non-monotone line search method by introducing a new trigonometric function to calculate the non-monotone parameter, which plays an essential role in the algorithm's efficiency. Then, we apply a convex combination of the Barzilai-Borwein step sizes to calculate the step size in each iteration. Under some suitable assumptions, we prove that the new algorithm has the global convergence property. The efficiency and effectiveness of the proposed method are demonstrated in practice by applying the algorithm to some standard test problems and non-negative matrix factorization problems.
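As a rough illustration of the step-size rule this abstract describes, the snippet below forms a convex combination of the two classical BB step sizes. The fixed weight `theta` and the safeguard are placeholders made here; the paper's trigonometric non-monotone parameter is not reproduced.

```python
import numpy as np

def bb_convex_step(s, y, theta=0.5):
    """Convex combination of the two classical BB step sizes,
    alpha = theta * alpha_BB1 + (1 - theta) * alpha_BB2, where
    s = x_k - x_{k-1} and y = g_k - g_{k-1}. theta is a fixed
    illustrative weight; the paper derives its own combination rule."""
    sty = s @ y
    if sty <= 0:                      # safeguard: fall back to a small step
        return 1e-4
    alpha_bb1 = (s @ s) / sty         # long BB step
    alpha_bb2 = sty / (y @ y)         # short BB step
    return theta * alpha_bb1 + (1.0 - theta) * alpha_bb2
```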
Xiuxian Li, Kuo-Yi Lin, Li Li (2021)
Communication has been seen as a significant bottleneck in industrial applications over large-scale networks. To alleviate the communication burden, sign-based optimization algorithms have recently gained popularity in both industrial and academic communities, and they are closely related to adaptive gradient methods such as Adam. Along this line, this paper investigates faster convergence for a variant of sign-based gradient descent, called scaled signGD, in three cases: 1) the objective function is strongly convex; 2) the objective function is nonconvex but satisfies the Polyak-Lojasiewicz (PL) inequality; 3) the gradient is stochastic. For the first two cases, the scaled signGD converges at a linear rate. For case 3), the algorithm converges linearly to a neighborhood of the optimal value when a constant learning rate is employed, and converges at a rate of $O(1/k)$ when using a diminishing learning rate, where $k$ is the iteration number. The results are also extended to the distributed setting by majority vote in a parameter-server framework. Finally, numerical experiments on logistic regression are performed to corroborate the theoretical findings.
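The following is a minimal sketch of plain sign-based gradient descent with an optional diminishing step `lr0/k`, the setting in which the abstract reports an $O(1/k)$ rate. The per-coordinate scaling that defines scaled signGD, as well as the stochastic and distributed variants, are omitted here.

```python
import numpy as np

def sign_gd(grad, x0, lr0=0.5, iters=2000, diminishing=True):
    """Sign-based gradient descent: step along -sign(grad(x)).
    With a diminishing step lr0/k the stochastic variant is reported
    to converge at rate O(1/k); the scaling of 'scaled signGD' is a
    detail of the paper and is not reproduced in this sketch."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(1, iters + 1):
        lr = lr0 / k if diminishing else lr0
        x = x - lr * np.sign(grad(x))
    return x

# Toy strongly convex objective f(x) = 0.5 ||x||^2 with gradient x.
print(sign_gd(lambda x: x, [3.0, -2.0]))
```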
In recent years, variance-reducing stochastic methods have shown great practical performance, exhibiting a linear convergence rate where other stochastic methods offered a sub-linear rate. However, as datasets grow ever bigger and clusters become widespread, the need for fast distributed methods is pressing. We propose here a distribution scheme for SAGA which maintains a linear convergence rate, even when communication between nodes is limited.
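The single-node SAGA update being distributed looks roughly as follows. This is a sketch under assumptions made here: the callback `comp_grad` and the least-squares example are hypothetical, and the paper's communication layer between nodes is not shown.

```python
import numpy as np

def saga(comp_grad, x0, n, step, iters=20000, rng=None):
    """Single-node SAGA sketch: keep a table with the most recent gradient
    of each of the n components and use it as a variance-reduced
    correction to the freshly sampled gradient."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    table = np.array([comp_grad(x, j) for j in range(n)])  # gradient memory
    avg = table.mean(axis=0)
    for _ in range(iters):
        j = rng.integers(n)
        g_new = comp_grad(x, j)
        x = x - step * (g_new - table[j] + avg)   # unbiased VR direction
        avg = avg + (g_new - table[j]) / n
        table[j] = g_new
    return x

# Hypothetical least-squares components f_j(x) = 0.5 (a_j^T x - y_j)^2.
rng = np.random.default_rng(1)
A_data = rng.standard_normal((100, 5))
y_data = A_data @ np.ones(5)
grad_j = lambda x, j: (A_data[j] @ x - y_data[j]) * A_data[j]
x_hat = saga(grad_j, np.zeros(5), n=100, step=0.02)
print(np.round(x_hat, 3))   # should approach the all-ones solution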
We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms, both in terms of optimizing the training error and reducing the test error quickly.
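The gradient memory described here is the stochastic average gradient (SAG) idea; a minimal sketch follows. Unlike the SAGA snippet above, which uses the memory as an unbiased correction, SAG steps along the stored average directly; `comp_grad` is again a hypothetical component-gradient callback.

```python
import numpy as np

def sag(comp_grad, x0, n, step, iters=20000, rng=None):
    """Stochastic average gradient sketch: refresh one stored component
    gradient per iteration and step along the average of the whole
    memory, which yields a linear rate for strongly convex sums."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    table = np.zeros((n, x.size))        # stored gradients, start at zero
    total = np.zeros(x.size)
    for _ in range(iters):
        j = rng.integers(n)
        g_new = comp_grad(x, j)
        total = total + g_new - table[j]
        table[j] = g_new
        x = x - step * total / n         # step along the averaged memory
    return x
```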
In this paper, we aim to provide a comprehensive analysis of the linear-rate convergence of the alternating direction method of multipliers (ADMM) for solving linearly constrained convex composite optimization problems. Under a certain error bound condition, we establish the global linear rate of convergence for a more general semi-proximal ADMM with the dual steplength restricted to the open interval $(0, (1+\sqrt{5})/2)$. In our analysis, we assume neither strong convexity nor strict complementarity, only an error bound condition, which holds automatically for convex composite quadratic programming. This semi-proximal ADMM, which includes the classic ADMM, not only has the advantage of resolving the potential non-solvability of the subproblems in the classic ADMM but also can handle multi-block convex optimization problems efficiently. We use convex composite quadratic programming and quadratic semi-definite programming as important applications to demonstrate the significance of the obtained results. As a contribution of its own novelty to second-order variational analysis, a complete characterization of the isolated calmness of the nonlinear convex semi-definite optimization problem is provided in terms of its second-order sufficient optimality condition and the strict Robinson constraint qualification, for the purpose of proving the linear-rate convergence of the semi-proximal ADMM when applied to two- and multi-block convex quadratic semi-definite programming.
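A bare-bones scaled-form ADMM for the two-block consensus problem min $f(x) + g(z)$ s.t. $x = z$ is sketched below, with the dual steplength kept inside the open interval $(0, (1+\sqrt{5})/2)$ analyzed in the paper. The semi-proximal terms are set to zero, which recovers the classic ADMM as a special case, and the lasso-style example data are hypothetical.

```python
import numpy as np

def admm_consensus(prox_f, prox_g, dim, rho=1.0, tau=1.6, iters=300):
    """Scaled-form ADMM for min f(x) + g(z) s.t. x = z.
    tau is the dual steplength, kept inside the open interval
    (0, (1 + sqrt(5))/2) ~ (0, 1.618) from the paper's analysis.
    prox_f / prox_g are user-supplied proximal maps with penalty rho."""
    x = np.zeros(dim)
    z = np.zeros(dim)
    u = np.zeros(dim)                # scaled dual variable
    for _ in range(iters):
        x = prox_f(z - u, rho)       # x-update: prox of f
        z = prox_g(x + u, rho)       # z-update: prox of g
        u = u + tau * (x - z)        # dual update with steplength tau
    return x, z

# Example: f = 0.5 ||x - a||^2, g = lam * ||z||_1 (soft-thresholding prox).
a = np.array([3.0, -0.5, 0.2])
lam = 1.0
prox_f = lambda v, rho: (rho * v + a) / (rho + 1.0)
prox_g = lambda v, rho: np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
x, z = admm_consensus(prox_f, prox_g, dim=3)
print(x, z)   # both should approach the soft-thresholded solution [2, 0, 0]
```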
