The Frank-Wolfe algorithm has regained much interest for its use in structurally constrained machine learning applications. However, one major limitation of the Frank-Wolfe algorithm is its slow local convergence due to zig-zagging behavior. We observe that this zig-zagging phenomenon can be viewed as an artifact of discretization: when the method is interpreted as an Euler discretization of a continuous-time flow, that flow does not zig-zag. For this reason, we propose multistep Frank-Wolfe variants based on discretizations of the same flow whose truncation errors decay as $O(\Delta^p)$, where $p$ is the method's order. We observe speedups using these variants, but at the cost of extra gradient calls per iteration. However, because the multistep methods provide better search directions, we show that they are better able to leverage line search and momentum speedups.
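As a rough illustration of this discretization viewpoint (a minimal sketch, not the exact multistep scheme proposed above), the snippet below contrasts a standard Frank-Wolfe step with a midpoint-style variant that re-evaluates the gradient at a tentative half step before committing to a direction; the least-squares objective, simplex constraint, and 2/(t+2) step size are placeholder choices, and the function names are hypothetical.

```python
import numpy as np

# Placeholder problem: least squares over the probability simplex.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 20))
b = rng.standard_normal(60)

def grad(x):
    return A.T @ (A @ x - b)

def lmo_simplex(g):
    """Linear minimization oracle over the simplex: the vertex e_i with i = argmin_i g_i."""
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

def fw_step(x, t):
    """Standard Frank-Wolfe step (forward-Euler discretization of the FW flow)."""
    gamma = 2.0 / (t + 2.0)
    s = lmo_simplex(grad(x))
    return x + gamma * (s - x)

def fw_midpoint_step(x, t):
    """Midpoint-style correction (illustrative, not a literal Runge-Kutta step):
    recompute the direction at a tentative half step, then take the full step
    from x toward that vertex, so the update stays a convex combination."""
    gamma = 2.0 / (t + 2.0)
    s1 = lmo_simplex(grad(x))
    x_half = x + 0.5 * gamma * (s1 - x)   # tentative half step
    s2 = lmo_simplex(grad(x_half))        # corrected search direction
    return x + gamma * (s2 - x)

x_euler = np.full(20, 1.0 / 20)
x_mid = x_euler.copy()
for t in range(200):
    x_euler = fw_step(x_euler, t)
    x_mid = fw_midpoint_step(x_mid, t)
```

The corrected direction costs one extra gradient call and one extra LMO call per iteration, mirroring the trade-off described above.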
We unveil the connections between Frank-Wolfe (FW) type algorithms and momentum in Accelerated Gradient Methods (AGM). On the negative side, these connections illustrate why momentum is unlikely to be effective for FW-type algorithms. The encoura
We study projection-free methods for constrained Riemannian optimization. In particular, we propose the Riemannian Frank-Wolfe (RFW) method. We analyze non-asymptotic convergence rates of RFW to an optimum for (geodesically) convex problems, and to a
Projection-free optimization via different variants of the Frank-Wolfe (FW) method, a.k.a. the Conditional Gradient method, has become one of the cornerstones of optimization for machine learning, since in many cases the linear minimization oracle is much cheaper
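To make the cost comparison concrete with a standard example (a minimal sketch, not specific to the work summarized above): for an l1-ball constraint, the LMO returns a single signed coordinate vertex in O(n) time, whereas Euclidean projection needs a sort and a search for the soft-thresholding level.

```python
import numpy as np

def lmo_l1(g, radius=1.0):
    """LMO over the l1 ball: argmin_{||s||_1 <= radius} <g, s>.
    The minimizer is a single signed coordinate vertex -- O(n) work."""
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -radius * np.sign(g[i])
    return s

def project_l1(v, radius=1.0):
    """Euclidean projection onto the l1 ball (sort-based soft thresholding):
    requires an O(n log n) sort plus a search for the threshold theta."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    theta = (css[k] - radius) / (k + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

g = np.array([0.3, -2.0, 0.5, 1.1])
print(lmo_l1(g))       # [0., 1., 0., 0.]: a single nonzero coordinate
print(project_l1(g))   # [0., -0.95, 0., 0.05]: soft-thresholded vector
```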
We propose a novel Stochastic Frank-Wolfe (a.k.a. conditional gradient) algorithm for constrained smooth finite-sum minimization with a generalized linear prediction/structure. This class of problems includes empirical risk minimization with sparse,
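A generic minibatch sketch of the finite-sum setting (illustrative only; the batch size, l1 constraint, and step-size schedule are assumptions, and this plain variant does not exploit the linear prediction structure the proposed algorithm relies on):

```python
import numpy as np

# Placeholder finite-sum problem: least-squares ERM, f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(0)
n, d = 500, 50
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def lmo_l1(g, radius=5.0):
    """LMO over an l1 ball (promotes sparse iterates)."""
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -radius * np.sign(g[i])
    return s

x = np.zeros(d)
batch = 32
for t in range(500):
    idx = rng.choice(n, size=batch, replace=False)   # sample a minibatch of sum terms
    g = A[idx].T @ (A[idx] @ x - b[idx]) / batch     # stochastic gradient estimate
    s = lmo_l1(g)
    gamma = 2.0 / (t + 2.0)
    x = x + gamma * (s - x)                          # projection-free update
```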
We introduce a few variants of Frank-Wolfe-style algorithms suitable for large-scale optimization. We show how to modify the standard Frank-Wolfe algorithm using stochastic gradients, approximate subproblem solutions, and sketched decision variables
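As one illustration of the "approximate subproblem solutions" ingredient (a hedged sketch, not the specific variants introduced above): under a nuclear-norm-ball constraint, the exact LMO requires the top singular vector pair of the gradient, which a few power iterations can approximate far more cheaply than a full SVD; the objective, radius, and iteration counts below are placeholder choices.

```python
import numpy as np

def approx_lmo_nuclear(G, radius=1.0, iters=5, rng=None):
    """Approximate LMO over the nuclear-norm ball {X : ||X||_* <= radius}.

    The exact minimizer of <G, S> is -radius * u v^T with (u, v) the top
    singular vectors of G; a few power iterations give an approximate pair,
    trading subproblem accuracy for a much cheaper per-iteration cost.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = rng.standard_normal(G.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):                 # truncated power iteration
        u = G @ v
        u /= np.linalg.norm(u)
        v = G.T @ u
        v /= np.linalg.norm(v)
    u = G @ v
    u /= np.linalg.norm(u)
    return -radius * np.outer(u, v)

# Illustrative use inside a Frank-Wolfe loop for a matrix-valued iterate X:
rng = np.random.default_rng(0)
M = rng.standard_normal((30, 20))          # placeholder data for f(X) = 0.5 * ||X - M||_F^2
X = np.zeros((30, 20))
for t in range(100):
    G = X - M                              # gradient of the placeholder objective
    S = approx_lmo_nuclear(G, radius=10.0, iters=3, rng=rng)
    gamma = 2.0 / (t + 2.0)
    X = X + gamma * (S - X)
```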