Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Global Convergence of Frank Wolfe on One Hidden Layer Networks

75 0 0.0 ( 0 )

Download Cite

Added by Alexandre d'Aspremont

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Alexandre dAspremont - Mert Pilanci

Optimization and Control Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We derive global convergence bounds for the Frank Wolfe algorithm when training one hidden layer neural networks. When using the ReLU activation function, and under tractable preconditioning assumptions on the sample data set, the linear minimization oracle used to incrementally form the solution can be solved explicitly as a second order cone program. The classical Frank Wolfe algorithm then converges with rate $O(1/T)$ where $T$ is both the number of neurons and the number of calls to the oracle.

rate research

Spectral Frank-Wolfe Algorithm: Strict Complementarity and Linear Convergence

142 - Lijun Ding , Yingjie Fei , Qiantong Xu 2020

We develop a novel variant of the classical Frank-Wolfe algorithm, which we call spectral Frank-Wolfe, for convex optimization over a spectrahedron. The spectral Frank-Wolfe algorithm has a novel ingredient: it computes a few eigenvectors of the gradient and solves a small-scale SDP in each iteration. Such procedure overcomes slow convergence of the classical Frank-Wolfe algorithm due to ignoring eigenvalue coalescence. We demonstrate that strict complementarity of the optimization problem is key to proving linear convergence of various algorithms, such as the spectral Frank-Wolfe algorithm as well as the projected gradient method and its accelerated version.

Optimization and Control

Riemannian Optimization via Frank-Wolfe Methods

167 - Melanie Weber , Suvrit Sra 2017

We study projection-free methods for constrained Riemannian optimization. In particular, we propose the Riemannian Frank-Wolfe (RFW) method. We analyze non-asymptotic convergence rates of RFW to an optimum for (geodesically) convex problems, and to a critical point for nonconvex objectives. We also present a practical setting under which RFW can attain a linear convergence rate. As a concrete example, we specialize Rfw to the manifold of positive definite matrices and apply it to two tasks: (i) computing the matrix geometric mean (Riemannian centroid); and (ii) computing the Bures-Wasserstein barycenter. Both tasks involve geodesically convex interval constraints, for which we show that the Riemannian linear oracle required by RFW admits a closed-form solution; this result may be of independent interest. We further specialize RFW to the special orthogonal group and show that here too, the Riemannian linear oracle can be solved in closed form. Here, we describe an application to the synchronization of data matrices (Procrustes problem). We complement our theoretical results with an empirical comparison of Rfw against state-of-the-art Riemannian optimization methods and observe that RFW performs competitively on the task of computing Riemannian centroids.

Optimization and Control Machine Learning Functional Analysis

How Does Momentum Help Frank Wolfe?

141 - Bingcong Li , Mario Coutino , Georgios B. Giannakis 2020

We unveil the connections between Frank Wolfe (FW) type algorithms and the momentum in Accelerated Gradient Methods (AGM). On the negative side, these connections illustrate why momentum is unlikely to be effective for FW type algorithms. The encouraging message behind this link, on the other hand, is that momentum is useful for FW on a class of problems. In particular, we prove that a momentum variant of FW, that we term accelerated Frank Wolfe (AFW), converges with a faster rate $tilde{cal O}(frac{1}{k^2})$ on certain constraint sets despite the same ${cal O}(frac{1}{k})$ rate as FW on general cases. Given the possible acceleration of AFW at almost no extra cost, it is thus a competitive alternative to FW. Numerical experiments on benchmarked machine learning tasks further validate our theoretical findings.

Optimization and Control Machine Learning Machine Learning

Self-Concordant Analysis of Frank-Wolfe Algorithms

203 - Pavel Dvurechensky , Petr Ostroukhov , Kamil Safin 2020

Projection-free optimization via different variants of the Frank-Wolfe (FW), a.k.a. Conditional Gradient method has become one of the cornerstones in optimization for machine learning since in many cases the linear minimization oracle is much cheaper to implement than projections and some sparsity needs to be preserved. In a number of applications, e.g. Poisson inverse problems or quantum state tomography, the loss is given by a self-concordant (SC) function having unbounded curvature, implying absence of theoretical guarantees for the existing FW methods. We use the theory of SC functions to provide a new adaptive step size for FW methods and prove global convergence rate O(1/k) after k iterations. If the problem admits a stronger local linear minimization oracle, we construct a novel FW method with linear convergence rate for SC functions.

Optimization and Control Machine Learning Computation

Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization

199 - Geoffrey Negiar , Gideon Dresdner , Alicia Tsai 2020

We propose a novel Stochastic Frank-Wolfe (a.k.a. conditional gradient) algorithm for constrained smooth finite-sum minimization with a generalized linear prediction/structure. This class of problems includes empirical risk minimization with sparse, low-rank, or other structured constraints. The proposed method is simple to implement, does not require step-size tuning, and has a constant per-iteration cost that is independent of the dataset size. Furthermore, as a byproduct of the method we obtain a stochastic estimator of the Frank-Wolfe gap that can be used as a stopping criterion. Depending on the setting, the proposed method matches or improves on the best computational guarantees for Stochastic Frank-Wolfe algorithms. Benchmarks on several datasets highlight different regimes in which the proposed method exhibits a faster empirical convergence than related methods. Finally, we provide an implementation of all considered methods in an open-source package.

Optimization and Control Machine Learning

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Global Convergence of Frank Wolfe on One Hidden Layer Networks

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions