Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample

92 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Albert Berahas

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Albert S. Berahas - Majid Jahani - Peter Richtarik

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We present two sampled quasi-Newton methods (sampled LBFGS and sampled LSR1) for solving empirical risk minimization problems that arise in machine learning. Contrary to the classical variants of these methods that sequentially build Hessian or inverse Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate at every iteration to produce these approximations. As a result, the approximations constructed make use of more reliable (recent and local) information, and do not depend on past iterate information that could be significantly stale. Our proposed algorithms are efficient in terms of accessed data points (epochs) and have enough concurrency to take advantage of parallel/distributed computing environments. We provide convergence guarantees for our proposed methods. Numerical tests on a toy classification problem as well as on popular benchmarking binary classification and neural network training tasks reveal that the methods outperform their classical variants.

قيم البحث

115 - Jian-Wen Peng , Jie Ren 2021

In this paper, we propose some new proximal quasi-Newton methods with line search or without line search for a special class of nonsmooth multiobjective optimization problems, where each objective function is the sum of a twice continuously different iable strongly convex function and a proper convex but not necessarily differentiable function. In these new proximal quasi-Newton methods, we approximate the Hessian matrices by using the well known BFGS, self-scaling BFGS, and the Huang BFGS method. We show that each accumulation point of the sequence generated by these new algorithms is a Pareto stationary point of the multiobjective optimization problem. In addition, we give their applications in robust multiobjective optimization, and we show that the subproblems of proximal quasi-Newton algorithms can be regarded as quadratic programming problems. Numerical experiments are carried out to verify the effectiveness of the proposed method.

التحسين والتحكم

Faster Explicit Superlinear Convergence for Greedy and Random Quasi-Newton Methods

120 - Dachao Lin , Haishan Ye , Zhihua Zhang 2021

In this paper, we follow the work (Rodomanov and Nesterov 2021) to study quasi-Newton methods, which is based on the updating formulas from a certain subclass of the Broyden family. We focus on the common SR1 and BFGS quasi-Newton methods to establis h better explicit superlinear convergence. First, based on greedy quasi-Newton update in Rodomanov and Nesterovs work, which greedily selected the direction so as to maximize a certain measure of progress, we improve the linear convergence rate to a condition-number-free superlinear convergence rate, when applied with the well-known SR1 update, and BFGS update. Moreover, our results can also be applied to the inverse approximation of the SR1 update. Second, based on random update, that selects the direction randomly from any spherical symmetry distribution we show the same superlinear convergence rate established as above. Our analysis is closely related to the approximation of a given Hessian matrix, unconstrained quadratic objective, as well as the general strongly convex, smooth and strongly self-concordant functions.

التحسين والتحكم

Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods

123 - Minghan Yang , Dong Xu , Hongyu Chen 2020

In this paper, we consider stochastic second-order methods for minimizing a finite summation of nonconvex functions. One important key is to find an ingenious but cheap scheme to incorporate local curvature information. Since the true Hessian matrix is often a combination of a cheap part and an expensive part, we propose a structured stochastic quasi-Newton method by using partial Hessian information as much as possible. By further exploiting either the low-rank structure or the kronecker-product properties of the quasi-Newton approximations, the computation of the quasi-Newton direction is affordable. Global convergence to stationary point and local superlinear convergence rate are established under some mild assumptions. Numerical results on logistic regression, deep autoencoder networks and deep convolutional neural networks show that our proposed method is quite competitive to the state-of-the-art methods.

التحسين والتحكم التعلم الالي

A Progressive Batching L-BFGS Method for Machine Learning

85 - Raghu Bollapragada , Dheevatsa Mudigere , Jorge Nocedal 2018

The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective fun ction. All of this appears to call for a full batch approach, but since small batch sizes give rise to faster algorithms with better generalization properties, L-BFGS is currently not considered an algorithm of choice for large-scale machine learning applications. One need not, however, choose between the two extremes represented by the full batch or highly stochastic regimes, and may instead follow a progressive batching approach in which the sample size increases during the course of the optimization. In this paper, we present a new version of the L-BFGS algorithm that combines three basic components - progressive batching, a stochastic line search, and stable quasi-Newton updating - and that performs well on training logistic regression and deep neural networks. We provide supporting convergence theory for the method.

التحسين والتحكم التعلم الآلي التعلم الالي

Quasi-Newton Methods: Superlinear Convergence Without Line Searches for Self-Concordant Functions

86 - Wenbo Gao , Donald Goldfarb 2016

We consider the use of a curvature-adaptive step size in gradient-based iterative methods, including quasi-Newton methods, for minimizing self-concordant functions, extending an approach first proposed for Newtons method by Nesterov. This step size h as a simple expression that can be computed analytically; hence, line searches are not needed. We show that using this step size in the BFGS method (and quasi-Newton methods in the Broyden convex class other than the DFP method) results in superlinear convergence for strongly convex self-concordant functions. We present numerical experiments comparing gradient descent and BFGS methods using the curvature-adaptive step size to traditional methods on deterministic logistic regression problems, and t

التحسين والتحكم