
Faster Explicit Superlinear Convergence for Greedy and Random Quasi-Newton Methods

Posted by: Dachao Lin
Publication date: 2021
Paper language: English





In this paper, we follow the work of Rodomanov and Nesterov (2021) to study quasi-Newton methods based on updating formulas from a certain subclass of the Broyden family. We focus on the common SR1 and BFGS quasi-Newton methods to establish better explicit superlinear convergence rates. First, based on the greedy quasi-Newton update in Rodomanov and Nesterov's work, which greedily selects the direction so as to maximize a certain measure of progress, we improve the linear convergence rate to a condition-number-free superlinear convergence rate when applied with the well-known SR1 and BFGS updates. Moreover, our results also apply to the inverse approximation of the SR1 update. Second, based on a random update that selects the direction randomly from any spherically symmetric distribution, we establish the same superlinear convergence rate as above. Our analysis is closely related to the approximation of a given Hessian matrix, to unconstrained quadratic objectives, and to general strongly convex, smooth, and strongly self-concordant functions.
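To make the update rules concrete, below is a minimal numpy sketch of the SR1 and BFGS updates of an approximation G toward a fixed target Hessian A, together with a greedy coordinate choice (the ratio-of-diagonals measure from Rodomanov and Nesterov; the paper's greedy SR1 variant may use a different progress measure) and a random direction from a spherically symmetric distribution. Function names and the tolerance are illustrative, not taken from the paper.

import numpy as np

def sr1_update(G, A, u, tol=1e-12):
    # One SR1 step toward A along u:
    # G_new = G - (G - A) u u^T (G - A) / (u^T (G - A) u), assuming G >= A.
    r = (G - A) @ u
    denom = u @ r
    if abs(denom) < tol:        # G already agrees with A along u; skip update
        return G
    return G - np.outer(r, r) / denom

def bfgs_update(G, A, u):
    # One BFGS step toward A along u:
    # G_new = G - G u u^T G / (u^T G u) + A u u^T A / (u^T A u).
    Gu, Au = G @ u, A @ u
    return G - np.outer(Gu, Gu) / (u @ Gu) + np.outer(Au, Au) / (u @ Au)

def greedy_direction(G, A):
    # Greedy choice among coordinate directions: maximize e_i^T G e_i / e_i^T A e_i.
    i = int(np.argmax(np.diag(G) / np.diag(A)))
    e = np.zeros(G.shape[0])
    e[i] = 1.0
    return e

def random_direction(n, rng=None):
    # Direction drawn from a spherically symmetric distribution
    # (standard Gaussian, normalized to the unit sphere).
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(n)
    return u / np.linalg.norm(u)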




Read also

We consider the use of a curvature-adaptive step size in gradient-based iterative methods, including quasi-Newton methods, for minimizing self-concordant functions, extending an approach first proposed for Newton's method by Nesterov. This step size has a simple expression that can be computed analytically; hence, line searches are not needed. We show that using this step size in the BFGS method (and quasi-Newton methods in the Broyden convex class other than the DFP method) results in superlinear convergence for strongly convex self-concordant functions. We present numerical experiments comparing gradient descent and BFGS methods using the curvature-adaptive step size to traditional methods on deterministic logistic regression problems, and t
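As a rough illustration of the kind of analytically computable, line-search-free step size meant here, the sketch below pairs a standard BFGS inverse update with a damped step along the search direction; for the exact Newton direction the formula reduces to Nesterov's damped-Newton step 1/(1 + lambda). The exact coefficients used in the cited paper may differ, and the function names are illustrative.

import numpy as np

def adaptive_step(g, d, H_true):
    # rho = -g^T d, delta = sqrt(d^T H d) (true curvature along d);
    # step t = rho / (delta * (rho + delta)).
    # For d = -H^{-1} g this equals 1 / (1 + lambda), Nesterov's damped Newton step.
    rho = -g @ d
    delta = np.sqrt(d @ (H_true @ d))
    return rho / (delta * (rho + delta))

def bfgs_inverse_update(H, s, y):
    # Standard BFGS update of the inverse Hessian approximation H,
    # with s = x_{k+1} - x_k and y = grad_{k+1} - grad_k.
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)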
In this paper, we follow the recent works on the explicit superlinear convergence rate of quasi-Newton methods. We focus on classical Broyden's methods for solving nonlinear equations and establish explicit (local) superlinear convergence when the initial point and approximate Jacobian are close enough to the solution. Our results show two natural trade-offs. The first is between the superlinear convergence rate and the radius of the neighborhood at initialization. The second is the balance between the initial distance to the solution and the initial distance to its Jacobian. Moreover, our analysis covers the two original Broyden's methods: Broyden's good and bad methods. We discover the difference between them in the size of the local convergence region and the condition-number dependence.
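For reference, here is a minimal sketch of the two rank-one secant updates distinguished in this abstract: the "good" Broyden update of the Jacobian approximation and the "bad" Broyden update of the inverse-Jacobian approximation. Variable names are illustrative.

import numpy as np

def broyden_good(B, dx, dF):
    # "Good" Broyden: rank-one update of the Jacobian approximation B
    # so that the secant equation B_new @ dx = dF holds.
    return B + np.outer(dF - B @ dx, dx) / (dx @ dx)

def broyden_bad(H, dx, dF):
    # "Bad" Broyden: rank-one update of the inverse-Jacobian approximation H
    # so that H_new @ dF = dx holds.
    return H + np.outer(dx - H @ dF, dF) / (dF @ dF)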
Jian-Wen Peng, Jie Ren (2021)
In this paper, we propose some new proximal quasi-Newton methods, with or without line search, for a special class of nonsmooth multiobjective optimization problems, where each objective function is the sum of a twice continuously differentiable strongly convex function and a proper convex but not necessarily differentiable function. In these new proximal quasi-Newton methods, we approximate the Hessian matrices by using the well-known BFGS, self-scaling BFGS, and Huang BFGS methods. We show that each accumulation point of the sequence generated by these new algorithms is a Pareto stationary point of the multiobjective optimization problem. In addition, we give their applications in robust multiobjective optimization, and we show that the subproblems of the proximal quasi-Newton algorithms can be regarded as quadratic programming problems. Numerical experiments are carried out to verify the effectiveness of the proposed methods.
We present two sampled quasi-Newton methods (sampled LBFGS and sampled LSR1) for solving empirical risk minimization problems that arise in machine learning. In contrast to the classical variants of these methods, which sequentially build Hessian or inverse Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate at every iteration to produce these approximations. As a result, the approximations constructed make use of more reliable (recent and local) information, and do not depend on past iterate information that could be significantly stale. Our proposed algorithms are efficient in terms of accessed data points (epochs) and have enough concurrency to take advantage of parallel/distributed computing environments. We provide convergence guarantees for our proposed methods. Numerical tests on a toy classification problem as well as on popular benchmarking binary classification and neural network training tasks reveal that the methods outperform their classical variants.
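A rough sketch of the sampling idea described above: instead of accumulating (s, y) curvature pairs along past iterates, fresh pairs are drawn around the current iterate at every iteration. The sampling radius, number of pairs, gradient-difference construction, and positivity threshold below are illustrative assumptions, not the paper's exact choices.

import numpy as np

def sample_curvature_pairs(grad, x, m=10, radius=1e-2, rng=None):
    # Draw m fresh (s, y) curvature pairs by sampling points around the current
    # iterate x, rather than reusing pairs from possibly stale past iterates.
    # `grad` is a gradient oracle.
    rng = rng or np.random.default_rng()
    g0 = grad(x)
    S, Y = [], []
    for _ in range(m):
        s = radius * rng.standard_normal(x.shape)
        y = grad(x + s) - g0          # curvature information local to x
        if s @ y > 1e-10:             # keep only positive-curvature pairs (for LBFGS)
            S.append(s)
            Y.append(y)
    return S, Y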
In this paper, we consider stochastic second-order methods for minimizing a finite sum of nonconvex functions. One important key is to find an ingenious but cheap scheme to incorporate local curvature information. Since the true Hessian matrix is often a combination of a cheap part and an expensive part, we propose a structured stochastic quasi-Newton method that uses partial Hessian information as much as possible. By further exploiting either the low-rank structure or the Kronecker-product properties of the quasi-Newton approximations, the computation of the quasi-Newton direction is affordable. Global convergence to stationary points and a local superlinear convergence rate are established under some mild assumptions. Numerical results on logistic regression, deep autoencoder networks, and deep convolutional neural networks show that our proposed method is quite competitive with the state-of-the-art methods.