
Analysis of the Frank-Wolfe Method for Convex Composite Optimization involving a Logarithmically-Homogeneous Barrier

Added by Renbo Zhao
Publication date: 2020
Language: English





We present and analyze a new generalized Frank-Wolfe method for the composite optimization problem $(P): \min_{x \in \mathbb{R}^n} \; f(\mathsf{A} x) + h(x)$, where $f$ is a $\theta$-logarithmically-homogeneous self-concordant barrier, $\mathsf{A}$ is a linear operator, and the function $h$ has bounded domain but is possibly non-smooth. We show that our generalized Frank-Wolfe method requires $O((\delta_0 + \theta + R_h)\ln(\delta_0) + (\theta + R_h)^2/\varepsilon)$ iterations to produce an $\varepsilon$-approximate solution, where $\delta_0$ denotes the initial optimality gap and $R_h$ is the variation of $h$ on its domain. This result establishes certain intrinsic connections between $\theta$-logarithmically homogeneous barriers and the Frank-Wolfe method. When specialized to the $D$-optimal design problem, we essentially recover the complexity obtained by Khachiyan using the Frank-Wolfe method with exact line-search. We also study the (Fenchel) dual problem of $(P)$, and we show that our new method is equivalent to an adaptive-step-size mirror descent method applied to the dual problem. This enables us to provide iteration complexity bounds for the mirror descent method even though the dual objective function is non-Lipschitz and has unbounded domain. In addition, we present computational experiments that point to the potential usefulness of our generalized Frank-Wolfe method on Poisson image de-blurring problems with TV regularization, and on simulated PET problem instances.
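For intuition, the following Python sketch shows the shape of one generalized Frank-Wolfe iteration for $(P)$. The callables `grad_f` and `lmo_h` and the open-loop step size are illustrative assumptions; the paper's method uses an adaptive step-size rule tied to the barrier $f$ rather than the simple $2/(k+2)$ schedule below.

```python
import numpy as np

def generalized_frank_wolfe(x0, A, grad_f, lmo_h, num_iters=1000):
    """Sketch of a generalized Frank-Wolfe loop for min_x f(Ax) + h(x).

    `grad_f(y)` is assumed to return the gradient of f at y, and `lmo_h(c)`
    is assumed to solve the generalized linear subproblem
    min_u <c, u> + h(u) over the bounded domain of h.
    """
    x = np.asarray(x0, dtype=float).copy()
    for k in range(num_iters):
        g = A.T @ grad_f(A @ x)            # gradient of x |-> f(Ax)
        v = lmo_h(g)                       # generalized linear minimization oracle
        gamma = 2.0 / (k + 2.0)            # simple open-loop step size (illustrative)
        x = (1.0 - gamma) * x + gamma * v  # convex combination stays in dom(h)
    return x
```

For instance, in the $D$-optimal design setting $h$ is the indicator of the standard simplex, so `lmo_h` reduces to returning the simplex vertex whose gradient coordinate is smallest.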



Related research

It is known that the Frank-Wolfe (FW) algorithm, which is affine-covariant, enjoys accelerated convergence rates when the constraint set is strongly convex. However, these results rely on norm-dependent assumptions, usually incurring non-affine-invariant bounds, in contradiction with FW's affine-covariant property. In this work, we introduce new structural assumptions on the problem (such as directional smoothness) and derive an affine-invariant, norm-independent analysis of Frank-Wolfe. Based on our analysis, we propose an affine-invariant backtracking line-search. Interestingly, we show that typical backtracking line-searches using smoothness of the objective function converge to an affine-invariant step size, despite using affine-dependent norms in the step-size computation. This indicates that we do not necessarily need to know the set's structure in advance to enjoy the affine-invariant accelerated rate.
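For reference, the sketch below shows the kind of standard, smoothness-based backtracking Frank-Wolfe step that the affine-invariant analysis is contrasted with; the function names, the backtracking constants, and the stationarity guard are illustrative assumptions rather than the paper's procedure.

```python
import numpy as np

def fw_backtracking_step(x, f, grad, lmo, L_est, tau=2.0, eta=0.9):
    """One Frank-Wolfe step with a norm-dependent backtracking line-search
    driven by a smoothness estimate L_est. `lmo(g)` is assumed to minimize
    <g, v> over the constraint set.
    """
    g = grad(x)
    v = lmo(g)
    d = v - x
    gap = -(g @ d)                         # Frank-Wolfe gap along the direction d
    if gap <= 0.0:
        return x, L_est                    # already stationary for this oracle call
    L = eta * L_est                        # optimistically shrink the estimate
    gamma = min(gap / (L * (d @ d)), 1.0)
    # Grow L until the quadratic upper bound certifies sufficient decrease.
    while f(x + gamma * d) > f(x) - gamma * gap + 0.5 * L * gamma**2 * (d @ d):
        L *= tau
        gamma = min(gap / (L * (d @ d)), 1.0)
    return x + gamma * d, L
```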
The stochastic Frank-Wolfe method has recently attracted much interest in the context of optimization for statistical and machine learning due to its ability to work with a more general feasible region. However, there has been a complexity gap in the guaranteed convergence rate for stochastic Frank-Wolfe compared to its deterministic counterpart. In this work, we present a new generalized stochastic Frank-Wolfe method which closes this gap for the class of structured optimization problems encountered in statistical and machine learning, characterized by empirical loss minimization with a certain type of "linear prediction" property (formally defined in the paper), which is typically present in loss minimization problems in practice. Our method also introduces the notion of a "substitute gradient" that is a not-necessarily-unbiased sample of the gradient. We show that our new method is equivalent to a particular randomized coordinate mirror descent algorithm applied to the dual problem, which in turn provides a new interpretation of randomized dual coordinate descent in the primal space. Also, in the special case of a strongly convex regularizer, our generalized stochastic Frank-Wolfe method (as well as the randomized dual coordinate descent method) exhibits linear convergence. Furthermore, we present computational experiments that indicate that our method outperforms other stochastic Frank-Wolfe methods, consistent with the theory developed herein.
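A rough Python rendering of the substitute-gradient idea for empirical losses of the form $\frac{1}{n}\sum_i \ell_i(a_i^\top x) + h(x)$ is given below; the SAG-style table of stored per-sample derivatives and the open-loop step size are simplifying assumptions, not the paper's exact update.

```python
import numpy as np

def stochastic_generalized_fw(x0, A, loss_grads, lmo_h, num_iters=1000, seed=0):
    """Sketch of a stochastic generalized Frank-Wolfe loop that maintains a
    "substitute gradient": a running, possibly biased gradient estimate built
    from per-sample derivatives refreshed one sample at a time.
    `loss_grads[i](t)` is assumed to return the scalar derivative of the
    i-th loss at t, and `lmo_h(c)` to solve min_u <c, u> + h(u).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.asarray(x0, dtype=float).copy()
    d = np.zeros(n)                        # stored per-sample derivatives
    for k in range(num_iters):
        i = rng.integers(n)
        d[i] = loss_grads[i](A[i] @ x)     # refresh one sample's derivative
        g_sub = (A.T @ d) / n              # substitute gradient (may be biased)
        v = lmo_h(g_sub)
        gamma = 2.0 / (k + 2.0)
        x = (1.0 - gamma) * x + gamma * v
    return x
```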
Melanie Weber, Suvrit Sra (2017)
We study projection-free methods for constrained Riemannian optimization. In particular, we propose the Riemannian Frank-Wolfe (RFW) method. We analyze non-asymptotic convergence rates of RFW to an optimum for (geodesically) convex problems, and to a critical point for nonconvex objectives. We also present a practical setting under which RFW can attain a linear convergence rate. As a concrete example, we specialize RFW to the manifold of positive definite matrices and apply it to two tasks: (i) computing the matrix geometric mean (Riemannian centroid); and (ii) computing the Bures-Wasserstein barycenter. Both tasks involve geodesically convex interval constraints, for which we show that the Riemannian linear oracle required by RFW admits a closed-form solution; this result may be of independent interest. We further specialize RFW to the special orthogonal group and show that here too, the Riemannian linear oracle can be solved in closed form. Here, we describe an application to the synchronization of data matrices (Procrustes problem). We complement our theoretical results with an empirical comparison of RFW against state-of-the-art Riemannian optimization methods and observe that RFW performs competitively on the task of computing Riemannian centroids.
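The overall structure of an RFW iteration can be summarized by the Python skeleton below, with the manifold operations and the Riemannian linear oracle passed in as callables; this is an illustrative simplification, not the closed-form oracles derived in the paper.

```python
def riemannian_frank_wolfe(x0, rgrad, exp_map, log_map, r_lmo, num_iters=100):
    """Skeleton of a Riemannian Frank-Wolfe loop. `rgrad(x)` is assumed to
    return the Riemannian gradient, `log_map(x, y)` / `exp_map(x, v)` the
    logarithm / exponential maps, and `r_lmo(x, g)` to solve the Riemannian
    linear oracle min_{z in the constraint set} <g, log_map(x, z)>.
    """
    x = x0
    for k in range(num_iters):
        g = rgrad(x)
        z = r_lmo(x, g)                        # Riemannian linear minimization oracle
        gamma = 2.0 / (k + 2.0)                # standard open-loop step size
        x = exp_map(x, gamma * log_map(x, z))  # move along the geodesic toward z
    return x
```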
We introduce a few variants of Frank-Wolfe-style algorithms suitable for large-scale optimization. We show how to modify the standard Frank-Wolfe algorithm using stochastic gradients, approximate subproblem solutions, and sketched decision variables in order to scale to enormous problems while preserving (up to constants) the optimal convergence rate $\mathcal{O}(\frac{1}{k})$.
The Frank-Wolfe method and its extensions are well-suited for delivering solutions with desirable structural properties, such as sparsity or low-rank structure. We introduce a new variant of the Frank-Wolfe method that combines Frank-Wolfe steps and steepest descent steps, as well as a novel modification of the Frank-Wolfe gap to measure convergence in the non-convex case. We further extend this method to incorporate in-face directions for preserving structured solutions as well as block coordinate steps, and we demonstrate computational guarantees in terms of the modified Frank-Wolfe gap for all of these variants. We are particularly motivated by the application of this methodology to the training of neural networks with sparse properties, and we apply our block coordinate method to the problem of $\ell_1$ regularized neural network training. We present the results of several numerical experiments on both artificial and real datasets demonstrating significant improvements of our method in training sparse neural networks.
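To make the projection-free flavor concrete, here is a minimal Python sketch of plain Frank-Wolfe over an $\ell_1$-ball constraint, whose linear minimization oracle has a closed form and whose iterates stay sparse. It is a simplified stand-in for the paper's block-coordinate method for $\ell_1$-regularized training: the ball radius, the tolerance, and the use of the standard (rather than the modified) Frank-Wolfe gap are assumptions for illustration.

```python
import numpy as np

def l1_ball_lmo(g, radius):
    """Closed-form linear minimization oracle over the l1-ball: the minimizer
    of <g, v> subject to ||v||_1 <= radius is a signed, scaled coordinate vector."""
    i = np.argmax(np.abs(g))
    v = np.zeros_like(g, dtype=float)
    v[i] = -radius * np.sign(g[i])
    return v

def frank_wolfe_l1(x0, grad, radius, num_iters=500, tol=1e-6):
    """Plain Frank-Wolfe over the l1-ball, stopping on the standard FW gap."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(num_iters):
        g = grad(x)
        v = l1_ball_lmo(g, radius)
        gap = g @ (x - v)                  # standard Frank-Wolfe gap
        if gap <= tol:
            break
        gamma = 2.0 / (k + 2.0)
        x = (1.0 - gamma) * x + gamma * v
    return x
```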