We consider the problem of minimizing a sum of $n$ functions over a convex parameter set $\mathcal{C} \subset \mathbb{R}^p$ where $n \gg p \gg 1$. In this regime, algorithms which utilize sub-sampling techniques are known to be effective. In this paper, we use sub-sampling techniques together with low-rank approximation to design a new randomized batch algorithm which possesses a convergence rate comparable to Newton's method, yet has a much smaller per-iteration cost. The proposed algorithm is robust in terms of starting point and step size, and enjoys a composite convergence rate, namely, quadratic convergence at the start and linear convergence when the iterate is close to the minimizer. We develop its theoretical analysis, which also allows us to select near-optimal algorithm parameters. Our theoretical results can be used to obtain convergence rates of previously proposed sub-sampling based algorithms as well. We demonstrate how our results apply to well-known machine learning problems. Lastly, we evaluate the performance of our algorithm on several datasets under various scenarios.
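As a concrete illustration, one way to combine Hessian sub-sampling with a rank-$k$ eigen-approximation into a Newton-type step could look like the sketch below. This is not necessarily the exact update analyzed in the abstract; `grad_fn` and `hess_sample_fn` are hypothetical user-supplied callables.

```python
import numpy as np

def subsampled_newton_step(w, grad_fn, hess_sample_fn, n, sample_size, rank,
                           step_size=1.0, rng=None):
    """One Newton-type step with a sub-sampled, rank-truncated Hessian.

    Illustrative sketch: grad_fn(w) returns the full gradient, and
    hess_sample_fn(w, idx) returns the Hessian averaged over the sub-sample
    idx; both are assumptions, not an interface from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(n, size=sample_size, replace=False)  # sub-sample the data
    H_S = hess_sample_fn(w, idx)                          # sampled Hessian, shape (p, p)

    # Rank-k eigen-approximation; the discarded tail of the spectrum is
    # replaced by the (k+1)-th eigenvalue so the approximate inverse stays
    # well conditioned.
    eigvals, eigvecs = np.linalg.eigh(H_S)
    order = np.argsort(eigvals)[::-1]
    lam, U = eigvals[order[:rank]], eigvecs[:, order[:rank]]
    lam_tail = eigvals[order[rank]]

    g = grad_fn(w)
    # Apply the approximate inverse Hessian to the gradient.
    direction = g / lam_tail + U @ ((1.0 / lam - 1.0 / lam_tail) * (U.T @ g))
    # (A projection back onto the convex set C would follow here if needed.)
    return w - step_size * direction
```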
We consider the problem of finding the minimizer of a convex function $F: \mathbb{R}^d \rightarrow \mathbb{R}$ of the form $F(w) := \sum_{i=1}^n f_i(w) + R(w)$ where a low-rank factorization of $\nabla^2 f_i(w)$ is readily available. We consider the regime
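For instance, under a generalized-linear-model assumption (not stated in the abstract), each per-example Hessian has an explicit rank-one factorization obtained directly from the data, as the sketch below illustrates; `phi_dd` is a hypothetical callable for the second derivative of the link function.

```python
import numpy as np

# Assumed setting: f_i(w) = phi(x_i @ w), so each per-example Hessian is
# rank one, grad^2 f_i(w) = phi''(x_i @ w) * outer(x_i, x_i), i.e. the
# low-rank factorization is available directly from the data.

def hessian_from_rank_one_factors(X, w, phi_dd):
    """Assemble sum_i grad^2 f_i(w) from its rank-one factors.

    X      : (n, d) data matrix with rows x_i
    w      : (d,)   parameter vector
    phi_dd : callable returning the second derivative of phi elementwise
    """
    s = phi_dd(X @ w)              # curvature weights, shape (n,)
    # sum_i s_i x_i x_i^T == X^T diag(s) X, without materializing diag(s)
    return X.T @ (s[:, None] * X)

# Example: for the logistic loss the curvature weight is
# sigma(x_i @ w) * (1 - sigma(x_i @ w)) with sigma the logistic function.
```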
The spectral bundle method proposed by Helmberg and Rendl is well established for solving large-scale semidefinite programs (SDP) thanks to its low per-iteration computational complexity and strong practical performance. In this paper, we revisit this
Adaptive Bayesian quadrature (ABQ) is a powerful approach to numerical integration that empirically compares favorably with Monte Carlo integration on problems of medium dimensionality (where non-adaptive quadrature is not competitive). Its key ingredient
We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space. In the traditional analysis using a consistency property of loss functions, it is known that the expected classification
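A minimal sketch of averaged SGD with the logistic loss in an RKHS, under assumptions not spelled out in the truncated abstract: the iterate is stored as kernel coefficients and the averaged predictor is returned. The `kernel` argument is a hypothetical user-supplied callable, not an interface from the paper.

```python
import numpy as np

def averaged_kernel_sgd(X, y, kernel, step, n_epochs=1):
    """Averaged SGD for binary classification with the logistic loss in an RKHS.

    The iterate is f_t(x) = sum_j alpha[j] * k(x_j, x); iterate averaging is
    realised by averaging the coefficient vectors. kernel(X, x) is assumed to
    return the vector (k(x_1, x), ..., k(x_n, x)).
    """
    n = len(y)
    alpha = np.zeros(n)        # current iterate's coefficients
    alpha_avg = np.zeros(n)    # running average of the iterates
    t = 0
    for _ in range(n_epochs):
        for i in np.random.permutation(n):
            t += 1
            margin = y[i] * (kernel(X, X[i]) @ alpha)   # y_i * f_t(x_i)
            # stochastic gradient step for the logistic loss at example i
            alpha[i] += step * y[i] / (1.0 + np.exp(margin))
            alpha_avg += (alpha - alpha_avg) / t        # online average
    return alpha_avg

# Prediction with the averaged iterate: sign(kernel(X, x_new) @ alpha_avg)
```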
Momentum methods such as Polyak's heavy ball (HB) method, Nesterov's accelerated gradient (AG) as well as the accelerated projected gradient (APG) method have been commonly used in machine learning practice, but their performance is quite sensitive to noise
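For reference, a bare-bones sketch of the two momentum updates named above, with constant step size and momentum parameters (a simplification; the noise-robustness analysis is not reproduced here):

```python
import numpy as np

def heavy_ball(grad, w0, step, momentum, iters):
    """Polyak's heavy ball: w_{k+1} = w_k - step*grad(w_k) + momentum*(w_k - w_{k-1})."""
    w_prev, w = w0.copy(), w0.copy()
    for _ in range(iters):
        w, w_prev = w - step * grad(w) + momentum * (w - w_prev), w
    return w

def nesterov_ag(grad, w0, step, momentum, iters):
    """Nesterov's AG: the gradient is evaluated at an extrapolated point."""
    w_prev, w = w0.copy(), w0.copy()
    for _ in range(iters):
        v = w + momentum * (w - w_prev)   # extrapolation step
        w, w_prev = v - step * grad(v), w
        # APG would additionally project w onto the feasible set here.
    return w
```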