
Analysis of the Frank-Wolfe Method for Convex Composite Optimization involving a Logarithmically-Homogeneous Barrier

Added by Renbo Zhao
Publication date: 2020
Language: English





We present and analyze a new generalized Frank-Wolfe method for the composite optimization problem $(P): \min_{x \in \mathbb{R}^n} \; f(\mathsf{A} x) + h(x)$, where $f$ is a $\theta$-logarithmically-homogeneous self-concordant barrier, $\mathsf{A}$ is a linear operator, and the function $h$ has bounded domain but is possibly non-smooth. We show that our generalized Frank-Wolfe method requires $O((\delta_0 + \theta + R_h)\ln(\delta_0) + (\theta + R_h)^2/\varepsilon)$ iterations to produce an $\varepsilon$-approximate solution, where $\delta_0$ denotes the initial optimality gap and $R_h$ is the variation of $h$ on its domain. This result establishes certain intrinsic connections between $\theta$-logarithmically homogeneous barriers and the Frank-Wolfe method. When specialized to the $D$-optimal design problem, we essentially recover the complexity obtained by Khachiyan using the Frank-Wolfe method with exact line-search. We also study the (Fenchel) dual problem of $(P)$, and we show that our new method is equivalent to an adaptive-step-size mirror descent method applied to the dual problem. This enables us to provide iteration complexity bounds for the mirror descent method even though the dual objective function is non-Lipschitz and has unbounded domain. In addition, we present computational experiments that point to the potential usefulness of our generalized Frank-Wolfe method on Poisson image de-blurring problems with TV regularization, and on simulated PET problem instances.
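For intuition, the following Python sketch shows the shape of one generalized Frank-Wolfe iteration for $(P)$. The callables `grad_f` and `lmo_h` and the open-loop step size are illustrative assumptions; the paper's method uses an adaptive step-size rule tied to the barrier $f$ rather than the simple $2/(k+2)$ schedule below.

```python
import numpy as np

def generalized_frank_wolfe(x0, A, grad_f, lmo_h, num_iters=1000):
    """Sketch of a generalized Frank-Wolfe loop for min_x f(Ax) + h(x).

    `grad_f(y)` is assumed to return the gradient of f at y, and `lmo_h(c)`
    is assumed to solve the generalized linear subproblem
    min_u <c, u> + h(u) over the bounded domain of h.
    """
    x = np.asarray(x0, dtype=float).copy()
    for k in range(num_iters):
        g = A.T @ grad_f(A @ x)            # gradient of x |-> f(Ax)
        v = lmo_h(g)                       # generalized linear minimization oracle
        gamma = 2.0 / (k + 2.0)            # simple open-loop step size (illustrative)
        x = (1.0 - gamma) * x + gamma * v  # convex combination stays in dom(h)
    return x
```

For instance, in the $D$-optimal design setting $h$ is the indicator of the standard simplex, so `lmo_h` reduces to returning the simplex vertex whose gradient coordinate is smallest.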



Related research

It is known that the Frank-Wolfe (FW) algorithm, which is affine-covariant, enjoys accelerated convergence rates when the constraint set is strongly convex. However, these results rely on norm-dependent assumptions, usually incurring non-affine-invariant bounds, in contradiction with FW's affine-covariant property. In this work, we introduce new structural assumptions on the problem (such as directional smoothness) and derive an affine-invariant, norm-independent analysis of Frank-Wolfe. Based on our analysis, we propose an affine-invariant backtracking line-search. Interestingly, we show that typical backtracking line-searches using smoothness of the objective function converge to an affine-invariant step size, despite using affine-dependent norms in the step-size computation. This indicates that we do not necessarily need to know the set's structure in advance to enjoy the affine-invariant accelerated rate.
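For reference, the sketch below shows the kind of standard, smoothness-based backtracking Frank-Wolfe step that the affine-invariant analysis is contrasted with; the function names, the backtracking constants, and the stationarity guard are illustrative assumptions rather than the paper's procedure.

```python
import numpy as np

def fw_backtracking_step(x, f, grad, lmo, L_est, tau=2.0, eta=0.9):
    """One Frank-Wolfe step with a norm-dependent backtracking line-search
    driven by a smoothness estimate L_est. `lmo(g)` is assumed to minimize
    <g, v> over the constraint set.
    """
    g = grad(x)
    v = lmo(g)
    d = v - x
    gap = -(g @ d)                         # Frank-Wolfe gap along the direction d
    if gap <= 0.0:
        return x, L_est                    # already stationary for this oracle call
    L = eta * L_est                        # optimistically shrink the estimate
    gamma = min(gap / (L * (d @ d)), 1.0)
    # Grow L until the quadratic upper bound certifies sufficient decrease.
    while f(x + gamma * d) > f(x) - gamma * gap + 0.5 * L * gamma**2 * (d @ d):
        L *= tau
        gamma = min(gap / (L * (d @ d)), 1.0)
    return x + gamma * d, L
```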
The stochastic Frank-Wolfe method has recently attracted much interest in the context of optimization for statistical and machine learning due to its ability to work with a more general feasible region. However, there has been a complexity gap in the guaranteed convergence rate for stochastic Frank-Wolfe compared to its deterministic counterpart. In this work, we present a new generalized stochastic Frank-Wolfe method which closes this gap for the class of structured optimization problems encountered in statistical and machine learning, characterized by empirical loss minimization with a certain type of "linear prediction" property (formally defined in the paper), which is typically present in loss minimization problems in practice. Our method also introduces the notion of a "substitute gradient" that is a not-necessarily-unbiased sample of the gradient. We show that our new method is equivalent to a particular randomized coordinate mirror descent algorithm applied to the dual problem, which in turn provides a new interpretation of randomized dual coordinate descent in the primal space. Also, in the special case of a strongly convex regularizer, our generalized stochastic Frank-Wolfe method (as well as the randomized dual coordinate descent method) exhibits linear convergence. Furthermore, we present computational experiments that indicate that our method outperforms other stochastic Frank-Wolfe methods, consistent with the theory developed herein.
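A rough Python rendering of the substitute-gradient idea for empirical losses of the form $\frac{1}{n}\sum_i \ell_i(a_i^\top x) + h(x)$ is given below; the SAG-style table of stored per-sample derivatives and the open-loop step size are simplifying assumptions, not the paper's exact update.

```python
import numpy as np

def stochastic_generalized_fw(x0, A, loss_grads, lmo_h, num_iters=1000, seed=0):
    """Sketch of a stochastic generalized Frank-Wolfe loop that maintains a
    "substitute gradient": a running, possibly biased gradient estimate built
    from per-sample derivatives refreshed one sample at a time.
    `loss_grads[i](t)` is assumed to return the scalar derivative of the
    i-th loss at t, and `lmo_h(c)` to solve min_u <c, u> + h(u).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.asarray(x0, dtype=float).copy()
    d = np.zeros(n)                        # stored per-sample derivatives
    for k in range(num_iters):
        i = rng.integers(n)
        d[i] = loss_grads[i](A[i] @ x)     # refresh one sample's derivative
        g_sub = (A.T @ d) / n              # substitute gradient (may be biased)
        v = lmo_h(g_sub)
        gamma = 2.0 / (k + 2.0)
        x = (1.0 - gamma) * x + gamma * v
    return x
```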
Melanie Weber, Suvrit Sra (2017)
We study projection-free methods for constrained Riemannian optimization. In particular, we propose the Riemannian Frank-Wolfe (RFW) method. We analyze non-asymptotic convergence rates of RFW to an optimum for (geodesically) convex problems, and to a critical point for nonconvex objectives. We also present a practical setting under which RFW can attain a linear convergence rate. As a concrete example, we specialize RFW to the manifold of positive definite matrices and apply it to two tasks: (i) computing the matrix geometric mean (Riemannian centroid); and (ii) computing the Bures-Wasserstein barycenter. Both tasks involve geodesically convex interval constraints, for which we show that the Riemannian linear oracle required by RFW admits a closed-form solution; this result may be of independent interest. We further specialize RFW to the special orthogonal group and show that here too, the Riemannian linear oracle can be solved in closed form. Here, we describe an application to the synchronization of data matrices (Procrustes problem). We complement our theoretical results with an empirical comparison of RFW against state-of-the-art Riemannian optimization methods and observe that RFW performs competitively on the task of computing Riemannian centroids.
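The overall structure of an RFW iteration can be summarized by the Python skeleton below, with the manifold operations and the Riemannian linear oracle passed in as callables; this is an illustrative simplification, not the closed-form oracles derived in the paper.

```python
def riemannian_frank_wolfe(x0, rgrad, exp_map, log_map, r_lmo, num_iters=100):
    """Skeleton of a Riemannian Frank-Wolfe loop. `rgrad(x)` is assumed to
    return the Riemannian gradient, `log_map(x, y)` / `exp_map(x, v)` the
    logarithm / exponential maps, and `r_lmo(x, g)` to solve the Riemannian
    linear oracle min_{z in the constraint set} <g, log_map(x, z)>.
    """
    x = x0
    for k in range(num_iters):
        g = rgrad(x)
        z = r_lmo(x, g)                        # Riemannian linear minimization oracle
        gamma = 2.0 / (k + 2.0)                # standard open-loop step size
        x = exp_map(x, gamma * log_map(x, z))  # move along the geodesic toward z
    return x
```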
We introduce a few variants of Frank-Wolfe-style algorithms suitable for large-scale optimization. We show how to modify the standard Frank-Wolfe algorithm using stochastic gradients, approximate subproblem solutions, and sketched decision variables in order to scale to enormous problems while preserving (up to constants) the optimal convergence rate $\mathcal{O}(\frac{1}{k})$.
The Frank-Wolfe method and its extensions are well-suited for delivering solutions with desirable structural properties, such as sparsity or low-rank structure. We introduce a new variant of the Frank-Wolfe method that combines Frank-Wolfe steps and steepest descent steps, as well as a novel modification of the Frank-Wolfe gap to measure convergence in the non-convex case. We further extend this method to incorporate in-face directions for preserving structured solutions as well as block coordinate steps, and we demonstrate computational guarantees in terms of the modified Frank-Wolfe gap for all of these variants. We are particularly motivated by the application of this methodology to the training of neural networks with sparse properties, and we apply our block coordinate method to the problem of $\ell_1$ regularized neural network training. We present the results of several numerical experiments on both artificial and real datasets demonstrating significant improvements of our method in training sparse neural networks.
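To make the projection-free flavor concrete, here is a minimal Python sketch of plain Frank-Wolfe over an $\ell_1$-ball constraint, whose linear minimization oracle has a closed form and whose iterates stay sparse. It is a simplified stand-in for the paper's block-coordinate method for $\ell_1$-regularized training: the ball radius, the tolerance, and the use of the standard (rather than the modified) Frank-Wolfe gap are assumptions for illustration.

```python
import numpy as np

def l1_ball_lmo(g, radius):
    """Closed-form linear minimization oracle over the l1-ball: the minimizer
    of <g, v> subject to ||v||_1 <= radius is a signed, scaled coordinate vector."""
    i = np.argmax(np.abs(g))
    v = np.zeros_like(g, dtype=float)
    v[i] = -radius * np.sign(g[i])
    return v

def frank_wolfe_l1(x0, grad, radius, num_iters=500, tol=1e-6):
    """Plain Frank-Wolfe over the l1-ball, stopping on the standard FW gap."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(num_iters):
        g = grad(x)
        v = l1_ball_lmo(g, radius)
        gap = g @ (x - v)                  # standard Frank-Wolfe gap
        if gap <= tol:
            break
        gamma = 2.0 / (k + 2.0)
        x = (1.0 - gamma) * x + gamma * v
    return x
```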