We review various characterizations of uniform convexity and smoothness on norm balls in finite-dimensional spaces and connect results stemming from the geometry of Banach spaces with \textit{scaling inequalities} used in analysing the convergence of optimization methods. In particular, we establish loca
This paper considers the analysis of continuous-time gradient-based optimization algorithms through the lens of nonlinear contraction theory. It demonstrates that in the case of a time-invariant objective, most elementary results on gradient descent based on convexity can be replaced by much more general results based on contraction. In particular, gradient descent converges to a unique equilibrium if its dynamics are contracting in any metric, with convexity of the cost corresponding to the special case of contraction in the identity metric. More broadly, contraction analysis provides new insights for the case of geodesically convex optimization, wherein non-convex problems in Euclidean space can be transformed to convex ones posed over a Riemannian manifold. In this case, natural gradient descent converges to a unique equilibrium if it is contracting in any metric, with geodesic convexity of the cost corresponding to contraction in the natural metric. New results using semi-contraction provide additional insights into the topology of the set of optimizers in the case when multiple optima exist. Furthermore, they show how semi-contraction may be combined with specific additional information to reach broad conclusions about a dynamical system. The contraction perspective also easily extends to time-varying optimization settings and allows one to recursively build large optimization structures out of simpler elements. Extensions to natural primal-dual optimization and game-theoretic contexts further illustrate the potential reach of these new perspectives.
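The special case mentioned above, contraction in the identity metric for a convex cost, can be illustrated numerically. The following is a minimal sketch, not the paper's method: it assumes a hypothetical strongly convex quadratic cost (the matrix `A`, vector `b`, and helper `gradient_flow` are illustrative names), integrates the gradient flow with forward Euler, and compares the distance between two trajectories with the exponential envelope set by the smallest eigenvalue of the Hessian.

```python
import numpy as np

def gradient_flow(grad, x0, step=1e-2, n_steps=500):
    """Forward-Euler integration of dx/dt = -grad(x); returns the whole trajectory."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        xs.append(xs[-1] - step * grad(xs[-1]))
    return np.array(xs)

# Hypothetical strongly convex quadratic f(x) = 0.5 x^T A x - b^T x (an assumption).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(x):
    return A @ x - b

# Contraction rate in the identity metric: smallest eigenvalue of the Hessian A.
mu = np.linalg.eigvalsh(A).min()

step, n_steps = 1e-2, 500
traj_a = gradient_flow(grad, [5.0, -3.0], step, n_steps)
traj_b = gradient_flow(grad, [-4.0, 6.0], step, n_steps)

# Distance between the two trajectories and the envelope exp(-mu * t).
times = step * np.arange(n_steps + 1)
dist = np.linalg.norm(traj_a - traj_b, axis=1)
bound = dist[0] * np.exp(-mu * times)

# The unique equilibrium both trajectories approach.
x_star = np.linalg.solve(A, b)
```

Because any two trajectories approach each other at this rate, they share a single limit, here `x_star`. A non-identity metric would replace the Euclidean norm in `dist` with a weighted one; that case is not covered by this sketch.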
We present a generalization of the notion of neighborliness to non-polyhedral convex cones. Although a definition of neighborliness is available in the non-polyhedral case in the literature, it is fairly restrictive as it requires all the low-dimensional faces to be polyhedral. Our approach is more flexible and includes, for example, the cone of positive semidefinite matrices as a special case (this cone is not neighborly in general). We term our generalization Terracini convexity due to its conceptual similarity with the conclusion of Terracini's lemma from algebraic geometry. Polyhedral cones are Terracini convex if and only if they are neighborly. More broadly, we derive many families of non-polyhedral Terracini convex cones based on neighborly cones, linear images of cones of positive semidefinite matrices, and derivative relaxations of Terracini convex hyperbolicity cones. As a demonstration of the utility of our framework in the non-polyhedral case, we give a characterization based on Terracini convexity of the tightness of semidefinite relaxations for certain inverse problems.
In this paper we characterise the maximal convex subsets of the (non-convex) rate region in 802.11 WLANs. In addition to being of intrinsic interest as a fundamental property of 802.11 WLANs, this characterisation can be exploited to allow the wealth of convex optimisation approaches to be applied to 802.11 WLANs.
Convergence of the gradient descent algorithm has been attracting renewed interest due to its utility in deep learning applications. Even as multiple variants of gradient descent were proposed, the assumption that the gradient of the objective is Lipschitz continuous remained an integral part of the analysis until recently. In this work, we look at convergence analysis by focusing on a property that we term concavifiability, instead of Lipschitz continuity of gradients. We show that concavifiability is a necessary and sufficient condition for the upper quadratic approximation, which is key in proving that the objective function decreases after every gradient descent update. We also show that any gradient Lipschitz function satisfies concavifiability. A constant known as the concavifier, analogous to the gradient Lipschitz constant, is derived; it is indicative of the optimal step size. As an application, we demonstrate the utility of finding the concavifier in the convergence of gradient descent through an example inspired by neural networks. We derive bounds on the concavifier to obtain a fixed step size for a single hidden layer ReLU network.
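The role of the upper quadratic approximation can be sketched in the classical gradient Lipschitz setting. This is only an illustration under stated assumptions: the least-squares objective, the data `M` and `y`, and the helper `upper_quadratic` are hypothetical, and the constant `L` below is the ordinary gradient Lipschitz constant, used as a stand-in rather than the concavifier defined in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(20, 5))   # hypothetical design matrix
y = rng.normal(size=20)        # hypothetical targets

def f(x):
    """Least-squares objective 0.5 * ||M x - y||^2."""
    r = M @ x - y
    return 0.5 * r @ r

def grad_f(x):
    return M.T @ (M @ x - y)

# Gradient Lipschitz constant of f: the largest eigenvalue of M^T M.
L = np.linalg.eigvalsh(M.T @ M).max()

def upper_quadratic(x, z):
    """Quadratic upper model f(x) + <grad f(x), z - x> + (L / 2) * ||z - x||^2."""
    d = z - x
    return f(x) + grad_f(x) @ d + 0.5 * L * (d @ d)

# Gradient descent with the fixed step 1/L, which minimizes the upper model at each step.
x = rng.normal(size=5)
values = [f(x)]
for _ in range(50):
    x = x - grad_f(x) / L
    values.append(f(x))
values = np.array(values)
```

Since `f(z)` never exceeds `upper_quadratic(x, z)`, moving to the minimizer of the upper model cannot increase the objective, so `values` is non-increasing.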
We study mean-field variational Bayesian inference using the TAP approach, for $\mathbb{Z}_2$-synchronization as a prototypical example of a high-dimensional Bayesian model. We show that for any signal strength $\lambda > 1$ (the weak-recovery threshold), there exists a unique local minimizer of the TAP free energy functional near the mean of the Bayes posterior law. Furthermore, the TAP free energy in a local neighborhood of this minimizer is strongly convex. Consequently, a natural-gradient/mirror-descent algorithm achieves linear convergence to this minimizer from a local initialization, which may be obtained by a finite number of iterates of Approximate Message Passing (AMP). This provides a rigorous foundation for variational inference in high dimensions via minimization of the TAP free energy. We also analyze the finite-sample convergence of AMP, showing that AMP is asymptotically stable at the TAP minimizer for any $\lambda > 1$, and is linearly convergent to this minimizer from a spectral initialization for sufficiently large $\lambda$. Such a guarantee is stronger than results obtainable by state evolution analyses, which only describe a fixed number of AMP iterations in the infinite-sample limit. Our proofs combine the Kac-Rice formula and Sudakov-Fernique Gaussian comparison inequality to analyze the complexity of critical points that satisfy strong convexity and stability conditions within their local neighborhoods.
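The spectral initialization mentioned above can be sketched as follows. The abstract does not state the observation model, so this sketch assumes the common rank-one spiked form $Y = (\lambda/n)\, x x^\top + W$ with $x \in \{\pm 1\}^n$ and $W$ a symmetric Gaussian matrix whose off-diagonal entries have variance $1/n$. The function names `z2_sync_instance` and `spectral_estimate` are hypothetical, and no AMP or TAP iteration is implemented here.

```python
import numpy as np

def z2_sync_instance(n, lam, rng):
    """Draw a hidden sign vector x and a noisy rank-one observation Y (assumed model)."""
    x = rng.choice([-1.0, 1.0], size=n)
    G = rng.normal(size=(n, n))
    W = (G + G.T) / np.sqrt(2 * n)   # symmetric noise, off-diagonal variance 1/n
    Y = (lam / n) * np.outer(x, x) + W
    return x, Y

def spectral_estimate(Y):
    """Top eigenvector of Y, rescaled to have squared norm n."""
    _, eigvecs = np.linalg.eigh(Y)   # eigenvalues are returned in ascending order
    v = eigvecs[:, -1]
    return np.sqrt(len(v)) * v

rng = np.random.default_rng(0)
x, Y = z2_sync_instance(n=400, lam=3.0, rng=rng)
m0 = spectral_estimate(Y)

# The sign of x is not identifiable, so the overlap is taken in absolute value.
overlap = abs(m0 @ x) / len(x)
```

For a signal strength well above 1, as in this example, `overlap` should be bounded away from zero, which is what makes the top eigenvector a usable starting point for a local method.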