Do you want to publish a course? Click here

Quasi-Newton Methods: Superlinear Convergence Without Line Searches for Self-Concordant Functions

87   0   0.0 ( 0 )
 Added by Wenbo Gao
 Publication date 2016
  fields
and research's language is English




Ask ChatGPT about the research

We consider the use of a curvature-adaptive step size in gradient-based iterative methods, including quasi-Newton methods, for minimizing self-concordant functions, extending an approach first proposed for Newtons method by Nesterov. This step size has a simple expression that can be computed analytically; hence, line searches are not needed. We show that using this step size in the BFGS method (and quasi-Newton methods in the Broyden convex class other than the DFP method) results in superlinear convergence for strongly convex self-concordant functions. We present numerical experiments comparing gradient descent and BFGS methods using the curvature-adaptive step size to traditional methods on deterministic logistic regression problems, and t



rate research

Read More

In this paper, we follow the work (Rodomanov and Nesterov 2021) to study quasi-Newton methods, which is based on the updating formulas from a certain subclass of the Broyden family. We focus on the common SR1 and BFGS quasi-Newton methods to establish better explicit superlinear convergence. First, based on greedy quasi-Newton update in Rodomanov and Nesterovs work, which greedily selected the direction so as to maximize a certain measure of progress, we improve the linear convergence rate to a condition-number-free superlinear convergence rate, when applied with the well-known SR1 update, and BFGS update. Moreover, our results can also be applied to the inverse approximation of the SR1 update. Second, based on random update, that selects the direction randomly from any spherical symmetry distribution we show the same superlinear convergence rate established as above. Our analysis is closely related to the approximation of a given Hessian matrix, unconstrained quadratic objective, as well as the general strongly convex, smooth and strongly self-concordant functions.
We provide several algorithms for constrained optimization of a large class of convex problems, including softmax, $ell_p$ regression, and logistic regression. Central to our approach is the notion of width reduction, a technique which has proven immensely useful in the context of maximum flow [Christiano et al., STOC11] and, more recently, $ell_p$ regression [Adil et al., SODA19], in terms of improving the iteration complexity from $O(m^{1/2})$ to $tilde{O}(m^{1/3})$, where $m$ is the number of rows of the design matrix, and where each iteration amounts to a linear system solve. However, a considerable drawback is that these methods require both problem-specific potentials and individually tailored analyses. As our main contribution, we initiate a new direction of study by presenting the first unified approach to achieving $m^{1/3}$-type rates. Notably, our method goes beyond these previously considered problems to more broadly capture quasi-self-concordant losses, a class which has recently generated much interest and includes the well-studied problem of logistic regression, among others. In order to do so, we develop a unified width reduction method for carefully handling these losses based on a more general set of potentials. Additionally, we directly achieve $m^{1/3}$-type rates in the constrained setting without the need for any explicit acceleration schemes, thus naturally complementing recent work based on a ball-oracle approach [Carmon et al., NeurIPS20].
This paper presents a finite difference quasi-Newton method for the minimization of noisy functions. The method takes advantage of the scalability and power of BFGS updating, and employs an adaptive procedure for choosing the differencing interval $h$ based on the noise estimation techniques of Hamming (2012) and More and Wild (2011). This noise estimation procedure and the selection of $h$ are inexpensive but not always accurate, and to prevent failures the algorithm incorporates a recovery mechanism that takes appropriate action in the case when the line search procedure is unable to produce an acceptable point. A novel convergence analysis is presented that considers the effect of a noisy line search procedure. Numerical experiments comparing the method to a function interpolating trust region method are presented.
115 - Jian-Wen Peng , Jie Ren 2021
In this paper, we propose some new proximal quasi-Newton methods with line search or without line search for a special class of nonsmooth multiobjective optimization problems, where each objective function is the sum of a twice continuously differentiable strongly convex function and a proper convex but not necessarily differentiable function. In these new proximal quasi-Newton methods, we approximate the Hessian matrices by using the well known BFGS, self-scaling BFGS, and the Huang BFGS method. We show that each accumulation point of the sequence generated by these new algorithms is a Pareto stationary point of the multiobjective optimization problem. In addition, we give their applications in robust multiobjective optimization, and we show that the subproblems of proximal quasi-Newton algorithms can be regarded as quadratic programming problems. Numerical experiments are carried out to verify the effectiveness of the proposed method.
We present two sampled quasi-Newton methods (sampled LBFGS and sampled LSR1) for solving empirical risk minimization problems that arise in machine learning. Contrary to the classical variants of these methods that sequentially build Hessian or inverse Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate at every iteration to produce these approximations. As a result, the approximations constructed make use of more reliable (recent and local) information, and do not depend on past iterate information that could be significantly stale. Our proposed algorithms are efficient in terms of accessed data points (epochs) and have enough concurrency to take advantage of parallel/distributed computing environments. We provide convergence guarantees for our proposed methods. Numerical tests on a toy classification problem as well as on popular benchmarking binary classification and neural network training tasks reveal that the methods outperform their classical variants.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا