
Faster Explicit Superlinear Convergence for Greedy and Random Quasi-Newton Methods

Added by Dachao Lin
Publication date: 2021
Language: English





In this paper, we follow the work of Rodomanov and Nesterov (2021) to study quasi-Newton methods based on updating formulas from a certain subclass of the Broyden family. We focus on the common SR1 and BFGS quasi-Newton methods and establish better explicit superlinear convergence rates. First, building on the greedy quasi-Newton update of Rodomanov and Nesterov's work, which greedily selects the direction that maximizes a certain measure of progress, we improve the linear convergence rate to a condition-number-free superlinear convergence rate for the well-known SR1 and BFGS updates. Moreover, our results also apply to the inverse approximation of the SR1 update. Second, based on the random update, which selects the direction randomly from any spherically symmetric distribution, we establish the same superlinear convergence rate as above. Our analysis covers the approximation of a given Hessian matrix, unconstrained quadratic objectives, and general strongly convex, smooth, and strongly self-concordant functions.
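As a concrete illustration of the updates discussed above, the following Python sketch approximates a fixed positive-definite matrix A with the SR1 and BFGS updates, using either a greedy coordinate direction (the coordinate vector maximizing u^T G u / u^T A u, in the spirit of Rodomanov and Nesterov's greedy rule) or a random direction drawn from a spherically symmetric distribution. The variable names, initial scaling, and iteration count are illustrative assumptions, not taken from the paper.

import numpy as np

def sr1_update(G, A, u):
    """SR1 update of the approximation G toward A along direction u."""
    r = (G - A) @ u
    denom = u @ r
    if abs(denom) < 1e-12:            # skip the update when the correction is degenerate
        return G
    return G - np.outer(r, r) / denom

def bfgs_update(G, A, u):
    """BFGS update of the approximation G toward A along direction u."""
    Gu, Au = G @ u, A @ u
    return G - np.outer(Gu, Gu) / (u @ Gu) + np.outer(Au, Au) / (u @ Au)

def greedy_direction(G, A):
    """Greedy coordinate direction: maximize u^T G u / u^T A u over basis vectors."""
    ratios = np.diag(G) / np.diag(A)
    return np.eye(len(ratios))[np.argmax(ratios)]

def random_direction(n, rng):
    """Random direction from a spherically symmetric distribution (uniform on the sphere)."""
    u = rng.standard_normal(n)
    return u / np.linalg.norm(u)

# Tiny demo: approximate a fixed SPD matrix A starting from G0 = L * I >= A.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)
G = np.linalg.norm(A, 2) * np.eye(n)          # initial upper bound on A
for k in range(20):
    u = greedy_direction(G, A)                # or: random_direction(n, rng)
    G = sr1_update(G, A, u)                   # or: bfgs_update(G, A, u)
    print(k, np.linalg.norm(G - A))

Starting from G0 = L * I with G0 above A in the Loewner order keeps the iterates above A for both updates, which is the monotone regime these analyses work in; the printed residual shows the contraction of G toward A that the superlinear rates quantify.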



Related research

We consider the use of a curvature-adaptive step size in gradient-based iterative methods, including quasi-Newton methods, for minimizing self-concordant functions, extending an approach first proposed for Newton's method by Nesterov. This step size has a simple expression that can be computed analytically; hence, line searches are not needed. We show that using this step size in the BFGS method (and in quasi-Newton methods in the Broyden convex class other than the DFP method) results in superlinear convergence for strongly convex self-concordant functions. We present numerical experiments comparing gradient descent and BFGS methods using the curvature-adaptive step size to traditional methods on deterministic logistic regression problems [...]
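For intuition, a minimal sketch of a quasi-Newton step with an analytically computed step size is given below. It uses the classical damped-Newton step length t = 1/(1 + lambda), where lambda is the Newton decrement computed with the quasi-Newton matrix B in place of the true Hessian; the paper's curvature-adaptive expression differs in its details, so this is only a simplified stand-in showing how a closed-form step removes the need for a line search.

import numpy as np

def damped_bfgs_step(f_grad, B, x):
    # One quasi-Newton step with an analytically computed step size.
    # The step length t = 1/(1 + lam) is the classical damped-Newton choice
    # for standard self-concordant functions, with the Newton decrement
    # approximated through the quasi-Newton matrix B instead of the true
    # Hessian.  This is a simplified stand-in, not the paper's exact formula.
    g = f_grad(x)
    d = -np.linalg.solve(B, g)              # quasi-Newton search direction
    lam = np.sqrt(max(-g @ d, 0.0))         # approximate Newton decrement: sqrt(g^T B^{-1} g)
    t = 1.0 / (1.0 + lam)                   # closed-form step, no line search
    return x + t * d

With s_k = x_{k+1} - x_k and y_k = grad f(x_{k+1}) - grad f(x_k), the matrix B would then be updated by the usual BFGS formula, exactly as in a line-search BFGS method.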
In this paper, we follow recent work on the explicit superlinear convergence rate of quasi-Newton methods. We focus on the classical Broyden's methods for solving nonlinear equations and establish explicit (local) superlinear convergence when the initial point and the initial approximate Jacobian are close enough to the solution and its Jacobian. Our results reveal two natural trade-offs. The first is between the superlinear convergence rate and the radius of the neighborhood at initialization. The second is the balance between the initial distance to the solution and the initial distance to its Jacobian. Moreover, our analysis covers the two original Broyden's methods: Broyden's good and bad methods. We identify the differences between them in the size of the local convergence region and in the dependence on the condition number.
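For reference, the two updates named in this abstract are the standard rank-one Broyden updates: the "good" method updates the Jacobian approximation B, while the "bad" method updates an approximation H of its inverse. The sketch below shows one iteration of each; the test problem and the choice to initialize B from the true Jacobian at the starting point (reflecting the locality requirement mentioned above) are illustrative assumptions.

import numpy as np

def broyden_good_step(F, x, B):
    """One iteration of Broyden's 'good' method: rank-one update of the Jacobian approximation B."""
    s = -np.linalg.solve(B, F(x))                 # quasi-Newton step
    x_new = x + s
    y = F(x_new) - F(x)
    B_new = B + np.outer(y - B @ s, s) / (s @ s)  # good Broyden update
    return x_new, B_new

def broyden_bad_step(F, x, H):
    """One iteration of Broyden's 'bad' method: rank-one update of the inverse approximation H."""
    s = -H @ F(x)
    x_new = x + s
    y = F(x_new) - F(x)
    H_new = H + np.outer(s - H @ y, y) / (y @ y)  # bad Broyden update
    return x_new, H_new

# Example: solve F(x) = 0 for F(x) = x**3 - [1, 8], initializing B from the
# true Jacobian at x0 (diagonal, since F is separable) so that the initial
# approximation is close, as the local analysis requires.
F = lambda x: x**3 - np.array([1.0, 8.0])
x = np.array([1.5, 1.5])
B = np.diag(3 * x**2)
for _ in range(25):
    x, B = broyden_good_step(F, x, B)
print(x)   # should approach [1, 2]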
Jian-Wen Peng, Jie Ren (2021)
In this paper, we propose some new proximal quasi-Newton methods, with and without line search, for a special class of nonsmooth multiobjective optimization problems in which each objective function is the sum of a twice continuously differentiable strongly convex function and a proper convex but not necessarily differentiable function. In these new proximal quasi-Newton methods, we approximate the Hessian matrices using the well-known BFGS, self-scaling BFGS, and Huang BFGS updates. We show that each accumulation point of the sequence generated by these new algorithms is a Pareto stationary point of the multiobjective optimization problem. In addition, we give applications in robust multiobjective optimization and show that the subproblems of the proximal quasi-Newton algorithms can be regarded as quadratic programming problems. Numerical experiments are carried out to verify the effectiveness of the proposed methods.
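To make the structure concrete, a generic proximal quasi-Newton subproblem for objectives F_i = f_i + g_i (f_i smooth and strongly convex, g_i convex but possibly nondifferentiable) can be written as the following scalarized min-max; this is the standard form used in multiobjective proximal methods and is meant as an illustration rather than the exact subproblem of the cited paper:

d_k \;\in\; \operatorname*{arg\,min}_{d \in \mathbb{R}^n}\; \max_{1 \le i \le m} \Big\{ \nabla f_i(x_k)^\top d + \tfrac{1}{2}\, d^\top B_{i,k}\, d + g_i(x_k + d) - g_i(x_k) \Big\}, \qquad x_{k+1} = x_k + t_k d_k,

where B_{i,k} is the quasi-Newton (e.g. BFGS) approximation of \nabla^2 f_i(x_k) and t_k is either obtained by a line search or fixed at 1.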
We present two sampled quasi-Newton methods (sampled LBFGS and sampled LSR1) for solving empirical risk minimization problems that arise in machine learning. Contrary to the classical variants of these methods that sequentially build Hessian or inverse Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate at every iteration to produce these approximations. As a result, the approximations constructed make use of more reliable (recent and local) information, and do not depend on past iterate information that could be significantly stale. Our proposed algorithms are efficient in terms of accessed data points (epochs) and have enough concurrency to take advantage of parallel/distributed computing environments. We provide convergence guarantees for our proposed methods. Numerical tests on a toy classification problem as well as on popular benchmarking binary classification and neural network training tasks reveal that the methods outperform their classical variants.
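The sampling idea can be sketched as follows: curvature pairs (s_i, y_i) are built from random directions around the current iterate and then fed into the standard L-BFGS two-loop recursion. The sampling radius, number of pairs, use of gradient differences rather than Hessian-vector products, and the curvature filter are assumptions made for this illustration, not details of the cited methods.

import numpy as np

def sampled_curvature_pairs(grad, x, m=10, radius=1e-2, rng=None):
    """Build up to m curvature pairs (s_i, y_i) by sampling directions around the current iterate x.

    Each pair uses a gradient difference along a random direction of length `radius`,
    so the pairs reflect local curvature at x instead of stale iterate history."""
    rng = rng or np.random.default_rng()
    g0 = grad(x)
    S, Y = [], []
    for _ in range(m):
        s = radius * rng.standard_normal(x.shape)
        y = grad(x + s) - g0                      # finite-difference curvature information
        if s @ y > 1e-12 * np.linalg.norm(s) * np.linalg.norm(y):  # keep safely positive-curvature pairs
            S.append(s)
            Y.append(y)
    return S, Y

def two_loop_direction(g, S, Y):
    """Standard L-BFGS two-loop recursion using the sampled pairs to produce a search direction."""
    q = g.copy()
    alphas = []
    rhos = [1.0 / (s @ y) for s, y in zip(S, Y)]
    for s, y, rho in zip(reversed(S), reversed(Y), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    if S:
        q *= (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1])    # initial Hessian scaling
    for (s, y, rho), a in zip(zip(S, Y, rhos), reversed(alphas)):
        b = rho * (y @ q)
        q += (a - b) * s
    return -q                                     # falls back to -g if no pairs were kept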
In this paper, we consider stochastic second-order methods for minimizing a finite sum of nonconvex functions. One important key is to find an ingenious but cheap scheme to incorporate local curvature information. Since the true Hessian matrix is often a combination of a cheap part and an expensive part, we propose a structured stochastic quasi-Newton method that uses partial Hessian information as much as possible. By further exploiting either the low-rank structure or the Kronecker-product properties of the quasi-Newton approximations, the computation of the quasi-Newton direction is affordable. Global convergence to stationary points and a local superlinear convergence rate are established under mild assumptions. Numerical results on logistic regression, deep autoencoder networks, and deep convolutional neural networks show that the proposed method is quite competitive with state-of-the-art methods.
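A minimal sketch of the structured idea, assuming the Hessian splits as Hess f(x) = C(x) + E(x) with C(x) cheap to evaluate and E(x) expensive: keep C exact and maintain a quasi-Newton approximation A of E only, so the direction solves (C(x_k) + A_k) d = -grad f(x_k). The function names, the BFGS-style update, and the secant vector convention below are illustrative assumptions, not taken from the paper.

import numpy as np

def structured_qn_step(grad, cheap_hess, x, A, s_prev=None, y_prev=None):
    # One structured quasi-Newton step (illustrative sketch):
    # the cheap Hessian part is used exactly, while A approximates the
    # expensive part via a BFGS-style secant update.  Here y_prev should be
    # the portion of the gradient difference attributed to the expensive
    # part, e.g. grad(x) - grad(x_prev) - cheap_hess(x_prev) @ s_prev.
    if s_prev is not None and s_prev @ y_prev > 0:      # curvature check keeps A positive definite
        As = A @ s_prev
        A = (A - np.outer(As, As) / (s_prev @ As)
               + np.outer(y_prev, y_prev) / (s_prev @ y_prev))
    B = cheap_hess(x) + A                               # structured Hessian model
    d = -np.linalg.solve(B, grad(x))
    return x + d, A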