Accelerated Decentralized Dual Averaging

78 0 0.0 ( 0 )

Download Cite

Added by Changxin Liu

Publication date 2020

fields

and research's language is English

Authors Changxin Liu - Yang Shi - Huiping Li

Optimization and Control

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

This paper studies decentralized convex optimization problems defined over networks, where the objective is to minimize a sum of local smooth convex functions while respecting a common constraint. Two new algorithms based on dual averaging and decentralized consensus-seeking are proposed. The first one accelerates the standard convergence rate $O(frac{1}{sqrt{t}})$ in existing decentralized dual averaging (DDA) algorithms to $O(frac{1}{t})$, where $t$ is the time counter. This is made possible by a second-order consensus scheme that assists each agent to locally track the global dual variable more accurately and a new analysis of the descent property for the mean variable. We remark that, in contrast to its primal counterparts, this method decouples the synchronization step from nonlinear projection, leading to a rather concise analysis and a natural extension to stochastic networks. In the second one, two local sequences of primal variables are constructed in a decentralized manner to achieve acceleration, where only one of them is exchanged between agents. In addition to this, another consensus round is performed for local dual variables. The convergence rate is proved to be $O(1)(frac{1}{t^2}+frac{1}{t})$, where the magnitude of error bound is showed to be inversely proportional to the algebraic connectivity of the graph. However, the condition for stepsize does not rely on the weight matrix associated with the graph, making it easier to satisfy in practice than other accelerated methods. Finally, comparisons between the proposed methods and several recent algorithms are performed using a large-scale LASSO problem.

rate research

Decentralized Composite Optimization in Stochastic Networks: A Dual Averaging Approach with Linear Convergence

148 - Changxin Liu , Zirui Zhou , Jian Pei 2021

Decentralized optimization, particularly the class of decentralized composite convex optimization (DCCO) problems, has found many applications. Due to ubiquitous communication congestion and random dropouts in practice, it is highly desirable to design decentralized algorithms that can handle stochastic communication networks. However, most existing algorithms for DCCO only work in time-invariant networks and cannot be extended to stochastic networks because they inherently need knowledge of network topology $textit{a priori}$. In this paper, we propose a new decentralized dual averaging (DDA) algorithm that can solve DCCO in stochastic networks. Under a rather mild condition on stochastic networks, we show that the proposed algorithm attains $textit{global linear convergence}$ if each local objective function is strongly convex. Our algorithm substantially improves the existing DDA-type algorithms as the latter were only known to converge $textit{sublinearly}$ prior to our work. The key to achieving the improved rate is the design of a novel dynamic averaging consensus protocol for DDA, which intuitively leads to more accurate local estimates of the global dual variable. To the best of our knowledge, this is the first linearly convergent DDA-type decentralized algorithm and also the first algorithm that attains global linear convergence for solving DCCO in stochastic networks. Numerical results are also presented to support our design and analysis.

Optimization and Control Distributed Parallel and Cluster Computing

Decentralized and Parallel Primal and Dual Accelerated Methods for Stochastic Convex Programming Problems

117 - Darina Dvinskikh , Alexander Gasnikov 2019

We introduce primal and dual stochastic gradient oracle methods for decentralized convex optimization problems. Both for primal and dual oracles, the proposed methods are optimal in terms of the number of communication steps. However, for all classes of the objective, the optimality in terms of the number of oracle calls per node takes place only up to a logarithmic factor and the notion of smoothness. By using mini-batching technique, we show that the proposed methods with stochastic oracle can be additionally parallelized at each node. The considered algorithms can be applied to many data science problems and inverse problems.

Optimization and Control

IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method

478 - Yossi Arjevani , Joan Bruna , Bugra Can 2020

We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex. Our approach consists of approximately solving a sequence of sub-problems induced by the accelerated augmented Lagrangian method, thereby providing a systematic way for deriving several well-known decentralized algorithms including EXTRA arXiv:1404.6264 and SSDA arXiv:1702.08704. When coupled with accelerated gradient descent, our framework yields a novel primal algorithm whose convergence rate is optimal and matched by recently derived lower bounds. We provide experimental results that demonstrate the effectiveness of the proposed algorithm on highly ill-conditioned problems.

Optimization and Control Machine Learning

Provably Accelerated Decentralized Gradient Method Over Unbalanced Directed Graphs

88 - Zhuoqing Song , Lei Shi , Shi Pu 2021

In this work, we consider the decentralized optimization problem in which a network of $n$ agents, each possessing a smooth and convex objective function, wish to collaboratively minimize the average of all the objective functions through peer-to-peer communication in a directed graph. To solve the problem, we propose two accelerated Push-DIGing methods termed APD and APD-SC for minimizing non-strongly convex objective functions and strongly convex ones, respectively. We show that APD and APD-SC respectively converge at the rates $Oleft(frac{1}{k^2}right)$ and $Oleft(left(1 - Csqrt{frac{mu}{L}}right)^kright)$ up to constant factors depending only on the mixing matrix. To the best of our knowledge, APD and APD-SC are the first decentralized methods to achieve provable acceleration over unbalanced directed graphs. Numerical experiments demonstrate the effectiveness of both methods.

Optimization and Control Distributed Parallel and Cluster Computing Machine Learning

Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization

79 - Huan Li , Zhouchen Lin 2021

Decentralized optimization over time-varying graphs has been increasingly common in modern machine learning with massive data stored on millions of mobile devices, such as in federated learning. This paper revisits the widely used accelerated gradient tracking and extends it to time-varying graphs. We prove the $O((frac{gamma}{1-sigma_{gamma}})^2sqrt{frac{L}{epsilon}})$ and $O((frac{gamma}{1-sigma_{gamma}})^{1.5}sqrt{frac{L}{mu}}logfrac{1}{epsilon})$ complexities for the practical single loop accelerated gradient tracking over time-varying graphs when the problems are nonstrongly convex and strongly convex, respectively, where $gamma$ and $sigma_{gamma}$ are two common constants charactering the network connectivity, $epsilon$ is the desired precision, and $L$ and $mu$ are the smoothness and strong convexity constants, respectively. Our complexities improve significantly over the ones of $O(frac{1}{epsilon^{5/7}})$ and $O((frac{L}{mu})^{5/7}frac{1}{(1-sigma)^{1.5}}logfrac{1}{epsilon})$, respectively, which were proved in the original literature only for static graphs, where $frac{1}{1-sigma}$ equals $frac{gamma}{1-sigma_{gamma}}$ when the network is time-invariant. When combining with a multiple consensus subroutine, the dependence on the network connectivity constants can be further improved to $O(1)$ and $O(frac{gamma}{1-sigma_{gamma}})$ for the computation and communication complexities, respectively. When the network is static, by employing the Chebyshev acceleration, our complexities exactly match the lower bounds without hiding any poly-logarithmic factor for both nonstrongly convex and strongly convex problems.

Optimization and Control Machine Learning

comments

Fetching comments

Mustansiriyah University

Additional details More universities

Accelerated Decentralized Dual Averaging

Ask ChatGPT about the research

No Arabic abstract

Read More