
Affine Invariant Analysis of Frank-Wolfe on Strongly Convex Sets

Added by Damien Scieur
Publication date: 2020
Research language: English





It is known that the Frank-Wolfe (FW) algorithm, which is affine-covariant, enjoys accelerated convergence rates when the constraint set is strongly convex. However, these results rely on norm-dependent assumptions, usually incurring non-affine-invariant bounds, in contradiction with FW's affine-covariant property. In this work, we introduce new structural assumptions on the problem (such as directional smoothness) and derive an affine-invariant, norm-independent analysis of Frank-Wolfe. Based on our analysis, we propose an affine-invariant backtracking line-search. Interestingly, we show that typical backtracking line-searches that exploit smoothness of the objective function converge to an affine-invariant step size, despite using affine-dependent norms in the step-size computation. This indicates that we do not necessarily need to know the set's structure in advance to enjoy the affine-invariant accelerated rate.
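To give an idea of the kind of smoothness-based backtracking line-search the abstract refers to, here is a minimal sketch. It is not the paper's affine-invariant variant: the quadratic objective, the l2-ball linear minimization oracle, the initial smoothness estimate L0, and the halving/doubling schedule are all illustrative assumptions.

```python
import numpy as np

def lmo_l2_ball(grad, radius=1.0):
    """Linear minimization oracle over an l2 ball: argmin_{||s||<=r} <grad, s>."""
    norm = np.linalg.norm(grad)
    return -radius * grad / norm if norm > 0 else np.zeros_like(grad)

def frank_wolfe_backtracking(f, grad_f, lmo, x0, L0=1.0, max_iter=200, tol=1e-8):
    """Frank-Wolfe with a standard backtracking search on a local smoothness estimate L."""
    x, L = x0.copy(), L0
    for _ in range(max_iter):
        g = grad_f(x)
        s = lmo(g)
        d = s - x
        gap = -g @ d                      # Frank-Wolfe gap; nonnegative, stop when small
        if gap <= tol:
            break
        L = max(L / 2.0, 1e-10)           # optimistic shrink before backtracking
        while True:
            gamma = min(gap / (L * (d @ d)), 1.0)   # minimizer of the quadratic model
            # sufficient-decrease test against the smoothness upper model
            if f(x + gamma * d) <= f(x) - gamma * gap + 0.5 * L * gamma**2 * (d @ d):
                break
            L *= 2.0                      # test failed: increase the smoothness estimate
        x = x + gamma * d
    return x

if __name__ == "__main__":
    # toy quadratic over the unit l2 ball
    A = np.diag([1.0, 10.0])
    b = np.array([1.0, 1.0])
    f = lambda x: 0.5 * x @ A @ x - b @ x
    grad_f = lambda x: A @ x - b
    print(frank_wolfe_backtracking(f, grad_f, lmo_l2_ball, np.zeros(2)))
```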



Related Research

Projection-free optimization via different variants of the Frank-Wolfe (FW) method, a.k.a. the Conditional Gradient method, has become one of the cornerstones of optimization for machine learning, since in many cases the linear minimization oracle is much cheaper to implement than a projection and some sparsity needs to be preserved. In a number of applications, e.g. Poisson inverse problems or quantum state tomography, the loss is given by a self-concordant (SC) function with unbounded curvature, implying an absence of theoretical guarantees for the existing FW methods. We use the theory of SC functions to provide a new adaptive step size for FW methods and prove a global convergence rate of $O(1/k)$ after $k$ iterations. If the problem admits a stronger local linear minimization oracle, we construct a novel FW method with a linear convergence rate for SC functions.
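To make the projection-free advantage mentioned above concrete, here is a small sketch assuming the probability simplex as the feasible set (an assumption for illustration, not tied to this paper's problems): the LMO is a single argmin over coordinates and returns a sparse vertex, while Euclidean projection onto the same set needs a sort-based procedure.

```python
import numpy as np

def lmo_simplex(grad):
    """argmin_{s in simplex} <grad, s>: the vertex at the smallest gradient coordinate."""
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def project_simplex(y):
    """Euclidean projection onto the simplex, O(n log n) via sorting (Duchi et al.)."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(y) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(y - theta, 0.0)
```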
Baojian Zhou, Yifan Sun (2021)
In this paper, we propose approximate Frank-Wolfe (FW) algorithms to solve convex optimization problems over graph-structured support sets where the \textit{linear minimization oracle} (LMO) cannot be efficiently obtained in general. We first demonstrate that two popular approximation assumptions (\textit{additive} and \textit{multiplicative} gap errors) are not valid for our problem, in that no cheap gap-approximate LMO exists in general. Instead, a new \textit{approximate dual maximization oracle} (DMO) is proposed, which approximates the inner product rather than the gap. When the objective is $L$-smooth, we prove that the standard FW method using a $\delta$-approximate DMO converges as $\mathcal{O}(L/(\delta t) + (1-\delta)(\delta^{-1} + \delta^{-2}))$ in general, and as $\mathcal{O}(L/(\delta^2(t+2)))$ over a $\delta$-relaxation of the constraint set. Additionally, when the objective is $\mu$-strongly convex and the solution is unique, a variant of FW converges as $\mathcal{O}(L^2\log(t)/(\mu \delta^6 t^2))$ with the same per-iteration complexity. Our empirical results suggest that even these improved bounds are pessimistic, with significant improvement in recovering real-world images with graph-structured sparsity.
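The following toy sketch illustrates only the $\delta$-approximate DMO contract described above, i.e. returning a feasible atom $v$ with $\langle -\nabla f(x), v\rangle \ge \delta \max_u \langle -\nabla f(x), u\rangle$. A plain l1-ball atom set stands in for the paper's graph-structured support sets, and the "approximate" search is simulated by accepting any near-best coordinate; both are assumptions for illustration.

```python
import numpy as np

def exact_dmo_l1(grad, radius=1.0):
    """Exact DMO on an l1 ball: +/- radius on the largest-magnitude gradient coordinate."""
    i = np.argmax(np.abs(grad))
    v = np.zeros_like(grad)
    v[i] = -radius * np.sign(grad[i])
    return v

def approximate_dmo_l1(grad, delta=0.8, radius=1.0):
    """Toy delta-approximate DMO: picks any coordinate whose score is within a
    factor delta of the best, so <-grad, v> >= delta * max_u <-grad, u>."""
    scores = np.abs(grad)
    candidates = np.nonzero(scores >= delta * scores.max())[0]
    i = np.random.choice(candidates)
    v = np.zeros_like(grad)
    v[i] = -radius * np.sign(grad[i])
    return v
```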
We present and analyze a new generalized Frank-Wolfe method for the composite optimization problem $(P): \min_{x\in\mathbb{R}^n} f(\mathsf{A} x) + h(x)$, where $f$ is a $\theta$-logarithmically-homogeneous self-concordant barrier, $\mathsf{A}$ is a linear operator and the function $h$ has bounded domain but is possibly non-smooth. We show that our generalized Frank-Wolfe method requires $O((\delta_0 + \theta + R_h)\ln(\delta_0) + (\theta + R_h)^2/\varepsilon)$ iterations to produce an $\varepsilon$-approximate solution, where $\delta_0$ denotes the initial optimality gap and $R_h$ is the variation of $h$ on its domain. This result establishes certain intrinsic connections between $\theta$-logarithmically homogeneous barriers and the Frank-Wolfe method. When specialized to the $D$-optimal design problem, we essentially recover the complexity obtained by Khachiyan using the Frank-Wolfe method with exact line-search. We also study the (Fenchel) dual problem of $(P)$, and we show that our new method is equivalent to an adaptive-step-size mirror descent method applied to the dual problem. This enables us to provide iteration complexity bounds for the mirror descent method even though the dual objective function is non-Lipschitz and has unbounded domain. In addition, we present computational experiments that point to the potential usefulness of our generalized Frank-Wolfe method on Poisson image de-blurring problems with TV regularization, and on simulated PET problem instances.
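For context on the $D$-optimal design specialization mentioned above, here is a minimal sketch of the classical Frank-Wolfe method with exact line-search for $\max_{\lambda\in\Delta} \log\det(\sum_i \lambda_i a_i a_i^\top)$ (Khachiyan's algorithm), not the paper's generalized method. The random data and the stopping tolerance are illustrative assumptions.

```python
import numpy as np

def fw_d_optimal(A, max_iter=500, tol=1e-6):
    """A: (m, n) matrix whose rows a_i are the design points, m >= n."""
    m, n = A.shape
    lam = np.full(m, 1.0 / m)                       # uniform start on the simplex
    for _ in range(max_iter):
        V = A.T @ (lam[:, None] * A)                # V(lambda) = sum_i lambda_i a_i a_i^T
        Vinv = np.linalg.inv(V)
        kappa = np.einsum("ij,jk,ik->i", A, Vinv, A)  # leverage scores a_i^T V^{-1} a_i
        i = int(np.argmax(kappa))                   # Frank-Wolfe vertex
        if kappa[i] <= n * (1.0 + tol):             # Kiefer-Wolfowitz optimality: max_i kappa_i = n
            break
        tau = (kappa[i] - n) / (n * (kappa[i] - 1.0))   # exact line-search step
        lam *= (1.0 - tau)
        lam[i] += tau
    return lam

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(fw_d_optimal(rng.standard_normal((50, 5))).round(3))
```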
We study constrained stochastic programs where the decision vector at each time slot cannot be chosen freely but is tied to the realization of an underlying random state vector. The goal is to minimize a general objective function subject to linear constraints. A typical scenario where such programs appear is opportunistic scheduling over a network of time-varying channels, where the random state vector is the channel state observed, and the control vector is the transmission decision which depends on the current channel state. We consider a primal-dual type Frank-Wolfe algorithm that has a low complexity update during each slot and that learns to make efficient decisions without prior knowledge of the probability distribution of the random state vector. We establish convergence time guarantees for the case of both convex and non-convex objective functions. We also emphasize application of the algorithm to non-convex opportunistic scheduling and distributed non-convex stochastic optimization over a connected graph.
The stochastic Frank-Wolfe method has recently attracted much interest in the context of optimization for statistical and machine learning due to its ability to work with a more general feasible region. However, there has been a complexity gap in the guaranteed convergence rate for stochastic Frank-Wolfe compared to its deterministic counterpart. In this work, we present a new generalized stochastic Frank-Wolfe method which closes this gap for the class of structured optimization problems encountered in statistical and machine learning, characterized by empirical loss minimization with a certain type of ``linear prediction'' property (formally defined in the paper), which is typically present in loss minimization problems in practice. Our method also introduces the notion of a ``substitute gradient'' that is a not-necessarily-unbiased sample of the gradient. We show that our new method is equivalent to a particular randomized coordinate mirror descent algorithm applied to the dual problem, which in turn provides a new interpretation of randomized dual coordinate descent in the primal space. Also, in the special case of a strongly convex regularizer, our generalized stochastic Frank-Wolfe method (as well as the randomized dual coordinate descent method) exhibits linear convergence. Furthermore, we present computational experiments that indicate that our method outperforms other stochastic Frank-Wolfe methods, consistent with the theory developed herein.
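The sketch below illustrates the general flavor of a ``substitute gradient'' for empirical loss minimization with linear prediction, $f(x) = \frac{1}{n}\sum_i \ell(a_i^\top x, b_i)$: a cached, not-necessarily-unbiased gradient estimate is refreshed one sample at a time and fed to a standard Frank-Wolfe step. The squared loss, the l1-ball constraint, and the $2/(t+2)$ step size are assumptions made for illustration, not the paper's exact construction.

```python
import numpy as np

def stochastic_fw_substitute(A, b, radius=1.0, max_iter=1000, seed=0):
    """Toy stochastic Frank-Wolfe driven by a cached (biased) substitute gradient."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    residual_cache = np.zeros(n)   # cached per-sample derivatives loss'(a_i^T x)
    g_sub = np.zeros(d)            # substitute gradient = (1/n) * A^T residual_cache
    for t in range(max_iter):
        i = rng.integers(n)                      # refresh one sample's contribution
        new_r = A[i] @ x - b[i]                  # derivative of 0.5*(a_i^T x - b_i)^2
        g_sub += (new_r - residual_cache[i]) * A[i] / n
        residual_cache[i] = new_r
        # Frank-Wolfe step against the substitute gradient over the l1 ball
        j = np.argmax(np.abs(g_sub))
        s = np.zeros(d)
        s[j] = -radius * np.sign(g_sub[j])
        gamma = 2.0 / (t + 2.0)
        x = (1 - gamma) * x + gamma * s
    return x
```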