Approximate Frank-Wolfe Algorithms over Graph-structured Support Sets

124 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Baojian Zhou

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Baojian Zhou - Yifan Sun

التحسين والتحكم التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In this paper, we propose approximate Frank-Wolfe (FW) algorithms to solve convex optimization problems over graph-structured support sets where the textit{linear minimization oracle} (LMO) cannot be efficiently obtained in general. We first demonstrate that two popular approximation assumptions (textit{additive} and textit{multiplicative gap errors)}, are not valid for our problem, in that no cheap gap-approximate LMO oracle exists in general. Instead, a new textit{approximate dual maximization oracle} (DMO) is proposed, which approximates the inner product rather than the gap. When the objective is $L$-smooth, we prove that the standard FW method using a $delta$-approximate DMO converges as $mathcal{O}(L / delta t + (1-delta)(delta^{-1} + delta^{-2}))$ in general, and as $mathcal{O}(L/(delta^2(t+2)))$ over a $delta$-relaxation of the constraint set. Additionally, when the objective is $mu$-strongly convex and the solution is unique, a variant of FW converges to $mathcal{O}(L^2log(t)/(mu delta^6 t^2))$ with the same per-iteration complexity. Our empirical results suggest that even these improved bounds are pessimistic, with significant improvement in recovering real-world images with graph-structured sparsity.

قيم البحث

203 - Pavel Dvurechensky , Petr Ostroukhov , Kamil Safin 2020

Projection-free optimization via different variants of the Frank-Wolfe (FW), a.k.a. Conditional Gradient method has become one of the cornerstones in optimization for machine learning since in many cases the linear minimization oracle is much cheaper to implement than projections and some sparsity needs to be preserved. In a number of applications, e.g. Poisson inverse problems or quantum state tomography, the loss is given by a self-concordant (SC) function having unbounded curvature, implying absence of theoretical guarantees for the existing FW methods. We use the theory of SC functions to provide a new adaptive step size for FW methods and prove global convergence rate O(1/k) after k iterations. If the problem admits a stronger local linear minimization oracle, we construct a novel FW method with linear convergence rate for SC functions.

التحسين والتحكم التعلم الآلي حساب

Frank-Wolfe Style Algorithms for Large Scale Optimization

121 - Lijun Ding , Madeleine Udell 2018

We introduce a few variants on Frank-Wolfe style algorithms suitable for large scale optimization. We show how to modify the standard Frank-Wolfe algorithm using stochastic gradients, approximate subproblem solutions, and sketched decision variables in order to scale to enormous problems while preserving (up to constants) the optimal convergence rate $mathcal{O}(frac{1}{k})$.

التحسين والتحكم التعلم الالي

Riemannian Optimization via Frank-Wolfe Methods

167 - Melanie Weber , Suvrit Sra 2017

We study projection-free methods for constrained Riemannian optimization. In particular, we propose the Riemannian Frank-Wolfe (RFW) method. We analyze non-asymptotic convergence rates of RFW to an optimum for (geodesically) convex problems, and to a critical point for nonconvex objectives. We also present a practical setting under which RFW can attain a linear convergence rate. As a concrete example, we specialize Rfw to the manifold of positive definite matrices and apply it to two tasks: (i) computing the matrix geometric mean (Riemannian centroid); and (ii) computing the Bures-Wasserstein barycenter. Both tasks involve geodesically convex interval constraints, for which we show that the Riemannian linear oracle required by RFW admits a closed-form solution; this result may be of independent interest. We further specialize RFW to the special orthogonal group and show that here too, the Riemannian linear oracle can be solved in closed form. Here, we describe an application to the synchronization of data matrices (Procrustes problem). We complement our theoretical results with an empirical comparison of Rfw against state-of-the-art Riemannian optimization methods and observe that RFW performs competitively on the task of computing Riemannian centroids.

التحسين والتحكم التعلم الآلي تحليل وظيفي

How Does Momentum Help Frank Wolfe?

141 - Bingcong Li , Mario Coutino , Georgios B. Giannakis 2020

We unveil the connections between Frank Wolfe (FW) type algorithms and the momentum in Accelerated Gradient Methods (AGM). On the negative side, these connections illustrate why momentum is unlikely to be effective for FW type algorithms. The encoura ging message behind this link, on the other hand, is that momentum is useful for FW on a class of problems. In particular, we prove that a momentum variant of FW, that we term accelerated Frank Wolfe (AFW), converges with a faster rate $tilde{cal O}(frac{1}{k^2})$ on certain constraint sets despite the same ${cal O}(frac{1}{k})$ rate as FW on general cases. Given the possible acceleration of AFW at almost no extra cost, it is thus a competitive alternative to FW. Numerical experiments on benchmarked machine learning tasks further validate our theoretical findings.

التحسين والتحكم التعلم الآلي التعلم الالي

Affine Invariant Analysis of Frank-Wolfe on Strongly Convex Sets

169 - Thomas Kerdreux , Lewis Liu , Simon Lacoste-Julien 2020

It is known that the Frank-Wolfe (FW) algorithm, which is affine-covariant, enjoys accelerated convergence rates when the constraint set is strongly convex. However, these results rely on norm-dependent assumptions, usually incurring non-affine invar iant bounds, in contradiction with FWs affine-covariant property. In this work, we introduce new structural assumptions on the problem (such as the directional smoothness) and derive an affine invariant, norm-independent analysis of Frank-Wolfe. Based on our analysis, we propose an affine invariant backtracking line-search. Interestingly, we show that typical backtracking line-searches using smoothness of the objective function surprisingly converge to an affine invariant step size, despite using affine-dependent norms in the step sizes computation. This indicates that we do not necessarily need to know the sets structure in advance to enjoy the affine-invariant accelerated rate.

التحسين والتحكم