No Arabic abstract
The popular BFGS quasi-Newton minimization algorithm under reasonable conditions converges globally on smooth convex functions. This result was proved by Powell in 1976: we consider its implications for functions that are not smooth. In particular, an analogous convergence result holds for functions, like the Euclidean norm, that are nonsmooth at the minimizer.
Given a separation oracle $mathsf{SO}$ for a convex function $f$ that has an integral minimizer inside a box with radius $R$, we show how to find an exact minimizer of $f$ using at most (a) $O(n (n + log(R)))$ calls to $mathsf{SO}$ and $mathsf{poly}(n, log(R))$ arithmetic operations, or (b) $O(n log(nR))$ calls to $mathsf{SO}$ and $exp(n) cdot mathsf{poly}(log(R))$ arithmetic operations. When the set of minimizers of $f$ has integral extreme points, our algorithm outputs an integral minimizer of $f$. This improves upon the previously best oracle complexity of $O(n^2 (n + log(R)))$ for polynomial time algorithms obtained by [Grotschel, Lovasz and Schrijver, Prog. Comb. Opt. 1984, Springer 1988] over thirty years ago. For the Submodular Function Minimization problem, our result immediately implies a strongly polynomial algorithm that makes at most $O(n^3)$ calls to an evaluation oracle, and an exponential time algorithm that makes at most $O(n^2 log(n))$ calls to an evaluation oracle. These improve upon the previously best $O(n^3 log^2(n))$ oracle complexity for strongly polynomial algorithms given in [Lee, Sidford and Wong, FOCS 2015] and [Dadush, Vegh and Zambelli, SODA 2018], and an exponential time algorithm with oracle complexity $O(n^3 log(n))$ given in the former work. Our result is achieved via a reduction to the Shortest Vector Problem in lattices. We show how an approximately shortest vector of certain lattice can be used to effectively reduce the dimension of the problem. Our analysis of the oracle complexity is based on a potential function that captures simultaneously the size of the search set and the density of the lattice, which we analyze via technical tools from convex geometry.
In this paper, we introduce a new class of nonsmooth convex functions called SOS-convex semialgebraic functions extending the recently proposed notion of SOS-convex polynomials. This class of nonsmooth convex functions covers many common nonsmooth functions arising in the applications such as the Euclidean norm, the maximum eigenvalue function and the least squares functions with $ell_1$-regularization or elastic net regularization used in statistics and compressed sensing. We show that, under commonly used strict feasibility conditions, the optimal value and an optimal solution of SOS-convex semi-algebraic programs can be found by solving a single semi-definite programming problem (SDP). We achieve the results by using tools from semi-algebraic geometry, convex-concave minimax theorem and a recently established Jensen inequality type result for SOS-convex polynomials. As an application, we outline how the derived results can be applied to show that robust SOS-convex optimization problems under restricted spectrahedron data uncertainty enjoy exact SDP relaxations. This extends the existing exact SDP relaxation result for restricted ellipsoidal data uncertainty and answers the open questions left in [Optimization Letters 9, 1-18(2015)] on how to recover a robust solution from the semi-definite programming relaxation in this broader setting.
We study the convergence of gradient flows related to learning deep linear neural networks (where the activation function is the identity map) from data. In this case, the composition of the network layers amounts to simply multiplying the weight matrices of all layers together, resulting in an overparameterized problem. The gradient flow with respect to these factors can be re-interpreted as a Riemannian gradient flow on the manifold of rank-$r$ matrices endowed with a suitable Riemannian metric. We show that the flow always converges to a critical point of the underlying functional. Moreover, we establish that, for almost all initializations, the flow converges to a global minimum on the manifold of rank $k$ matrices for some $kleq r$.
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains examples such as ReLU neural networks and others with non-differentiable activation functions. We first show that finding an $epsilon$-stationary point with first-order methods is impossible in finite time. We then introduce the notion of $(delta, epsilon)$-stationarity, which allows for an $epsilon$-approximate gradient to be the convex combination of generalized gradients evaluated at points within distance $delta$ to the solution. We propose a series of randomized first-order methods and analyze their complexity of finding a $(delta, epsilon)$-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on $delta$. Empirically, our methods perform well for training ReLU neural networks.
We present a unified convergence analysis for first order convex optimization methods using the concept of strong Lyapunov conditions. Combining this with suitable time scaling factors, we are able to handle both convex and strong convex cases, and establish contraction properties of Lyapunov functions for many existing ordinary differential equation models. Then we derive prevailing first order optimization algorithms, such as proximal gradient methods, heavy ball methods (also known as momentum methods), Nesterov accelerated gradient methods, and accelerated proximal gradient methods from numerical discretizations of corresponding dynamical systems. We also apply strong Lyapunov conditions to the discrete level and provide a more systematical analysis framework. Another contribution is a novel second order dynamical system called Hessian-driven Nesterov accelerated gradient flow which can be used to design and analyze accelerated first order methods for smooth and non-smooth convex optimizations.