No Arabic abstract
Mirror descent (MD) is a powerful first-order optimization technique that subsumes several optimization algorithms including gradient descent (GD). In this work, we study the exact convergence rate of MD in both centralized and distributed cases for strongly convex and smooth problems. We view MD with a dynamical system lens and leverage quadratic constraints (QCs) to provide convergence guarantees based on the Lyapunov stability. For centralized MD, we establish a semi-definite programming (SDP) that certifies exponentially fast convergence of MD subject to a linear matrix inequality (LMI). We prove that the SDP always has a feasible solution that recovers the optimal GD rate. Next, we analyze the exponential convergence of distributed MD and characterize the rate using two LMIs. To the best of our knowledge, the exact (exponential) rate of distributed MD has not been previously explored in the literature. We present numerical results as a verification of our theory and observe that the richness of the Lyapunov function entails better (worst-case) convergence rates compared to existing works on distributed GD.
Many large-scale and distributed optimization problems can be brought into a composite form in which the objective function is given by the sum of a smooth term and a nonsmooth regularizer. Such problems can be solved via a proximal gradient method and its variants, thereby generalizing gradient descent to a nonsmooth setup. In this paper, we view proximal algorithms as dynamical systems and leverage techniques from control theory to study their global properties. In particular, for problems with strongly convex objective functions, we utilize the theory of integral quadratic constraints to prove the global exponential stability of the equilibrium points of the differential equations that govern the evolution of proximal gradient and Douglas-Rachford splitting flows. In our analysis, we use the fact that these algorithms can be interpreted as variable-metric gradient methods on the suitable envelopes and exploit structural properties of the nonlinear terms that arise from the gradient of the smooth part of the objective function and the proximal operator associated with the nonsmooth regularizer. We also demonstrate that these envelopes can be obtained from the augmented Lagrangian associated with the original nonsmooth problem and establish conditions for global exponential convergence even in the absence of strong convexity.
This paper investigates the stochastic distributed nonconvex optimization problem of minimizing a global cost function formed by the summation of $n$ local cost functions. We solve such a problem by involving zeroth-order (ZO) information exchange. In this paper, we propose a ZO distributed primal-dual coordinate method (ZODIAC) to solve the stochastic optimization problem. Agents approximate their own local stochastic ZO oracle along with coordinates with an adaptive smoothing parameter. We show that the proposed algorithm achieves the convergence rate of $mathcal{O}(sqrt{p}/sqrt{T})$ for general nonconvex cost functions. We demonstrate the efficiency of proposed algorithms through a numerical example in comparison with the existing state-of-the-art centralized and distributed ZO algorithms.
This work addresses decentralized online optimization in non-stationary environments. A network of agents aim to track the minimizer of a global time-varying convex function. The minimizer evolves according to a known dynamics corrupted by an unknown, unstructured noise. At each time, the global function can be cast as a sum of a finite number of local functions, each of which is assigned to one agent in the network. Moreover, the local functions become available to agents sequentially, and agents do not have a prior knowledge of the future cost functions. Therefore, agents must communicate with each other to build an online approximation of the global function. We propose a decentralized variation of the celebrated Mirror Descent, developed by Nemirovksi and Yudin. Using the notion of Bregman divergence in lieu of Euclidean distance for projection, Mirror Descent has been shown to be a powerful tool in large-scale optimization. Our algorithm builds on Mirror Descent, while ensuring that agents perform a consensus step to follow the global function and take into account the dynamics of the global minimizer. To measure the performance of the proposed online algorithm, we compare it to its offline counterpart, where the global functions are available a priori. The gap between the two is called dynamic regret. We establish a regret bound that scales inversely in the spectral gap of the network, and more notably it represents the deviation of minimizer sequence with respect to the given dynamics. We then show that our results subsume a number of results in distributed optimization. We demonstrate the application of our method to decentralized tracking of dynamic parameters and verify the results via numerical experiments.
Convergence of the gradient descent algorithm has been attracting renewed interest due to its utility in deep learning applications. Even as multiple variants of gradient descent were proposed, the assumption that the gradient of the objective is Lipschitz continuous remained an integral part of the analysis until recently. In this work, we look at convergence analysis by focusing on a property that we term as concavifiability, instead of Lipschitz continuity of gradients. We show that concavifiability is a necessary and sufficient condition to satisfy the upper quadratic approximation which is key in proving that the objective function decreases after every gradient descent update. We also show that any gradient Lipschitz function satisfies concavifiability. A constant known as the concavifier analogous to the gradient Lipschitz constant is derived which is indicative of the optimal step size. As an application, we demonstrate the utility of finding the concavifier the in convergence of gradient descent through an example inspired by neural networks. We derive bounds on the concavifier to obtain a fixed step size for a single hidden layer ReLU network.
We establish a connection between the stability of mirror descent and the information ratio by Russo and Van Roy [2014]. Our analysis shows that mirror descent with suitable loss estimators and exploratory distributions enjoys the same bound on the adversarial regret as the bounds on the Bayesian regret for information-directed sampling. Along the way, we develop the theory for information-directed sampling and provide an efficient algorithm for adversarial bandits for which the regret upper bound matches exactly the best known information-theoretic upper bound.