
Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization

Added by Huan Li
Publication date: 2021
Research language: English





Decentralized optimization over time-varying graphs has become increasingly common in modern machine learning with massive data stored on millions of mobile devices, such as in federated learning. This paper revisits the widely used accelerated gradient tracking method and extends it to time-varying graphs. We prove the $O((\frac{\gamma}{1-\sigma_{\gamma}})^2\sqrt{\frac{L}{\epsilon}})$ and $O((\frac{\gamma}{1-\sigma_{\gamma}})^{1.5}\sqrt{\frac{L}{\mu}}\log\frac{1}{\epsilon})$ complexities for the practical single-loop accelerated gradient tracking over time-varying graphs when the problems are nonstrongly convex and strongly convex, respectively, where $\gamma$ and $\sigma_{\gamma}$ are two common constants characterizing the network connectivity, $\epsilon$ is the desired precision, and $L$ and $\mu$ are the smoothness and strong convexity constants, respectively. Our complexities improve significantly over the ones of $O(\frac{1}{\epsilon^{5/7}})$ and $O((\frac{L}{\mu})^{5/7}\frac{1}{(1-\sigma)^{1.5}}\log\frac{1}{\epsilon})$, respectively, which were proved in the original literature only for static graphs, where $\frac{1}{1-\sigma}$ equals $\frac{\gamma}{1-\sigma_{\gamma}}$ when the network is time-invariant. When combined with a multiple consensus subroutine, the dependence on the network connectivity constants can be further improved to $O(1)$ and $O(\frac{\gamma}{1-\sigma_{\gamma}})$ for the computation and communication complexities, respectively. When the network is static, by employing the Chebyshev acceleration, our complexities exactly match the lower bounds without hiding any poly-logarithmic factor for both nonstrongly convex and strongly convex problems.
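
For readers unfamiliar with the base algorithm, below is a minimal sketch of (non-accelerated) gradient tracking over a time-varying graph. The least-squares objectives, alternating mixing matrices, and step size are illustrative assumptions; the single-loop accelerated method analyzed in the paper additionally applies a Nesterov-style extrapolation step on top of this update.

import numpy as np

# Minimal gradient tracking sketch over a time-varying graph (illustrative only).
# Assumed toy problem: each agent i holds f_i(x) = 0.5*||A_i x - b_i||^2 and the
# network minimizes (1/n) * sum_i f_i(x).
rng = np.random.default_rng(0)
n, d = 5, 3                                      # number of agents, problem dimension
A = [rng.standard_normal((10, d)) for _ in range(n)]
b = [rng.standard_normal(10) for _ in range(n)]

def grad(i, x):
    # Local gradient of f_i at x.
    return A[i].T @ (A[i] @ x - b[i])

def mixing_matrix(k):
    # Doubly stochastic mixing matrix of the graph active at iteration k;
    # here the topology alternates between two ring-like schemes (an assumption).
    W = 0.5 * np.eye(n)
    shift = 1 if k % 2 == 0 else 2
    for i in range(n):
        W[i, (i + shift) % n] += 0.25
        W[i, (i - shift) % n] += 0.25
    return W

eta = 0.01                                       # step size (in theory tied to L and connectivity)
x = np.zeros((n, d))                             # row i is agent i's iterate x_i
s = np.array([grad(i, x[i]) for i in range(n)])  # row i is agent i's gradient tracker s_i

for k in range(500):
    W = mixing_matrix(k)
    x_new = W @ x - eta * s                      # consensus step plus descent along tracked gradient
    s = W @ s + np.array([grad(i, x_new[i]) - grad(i, x[i]) for i in range(n)])
    x = x_new

x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
print("distance to minimizer:", np.linalg.norm(x.mean(axis=0) - x_star))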



Related research

Zhuoqing Song, Lei Shi, Shi Pu (2021)
In this work, we consider the decentralized optimization problem in which a network of $n$ agents, each possessing a smooth and convex objective function, wish to collaboratively minimize the average of all the objective functions through peer-to-peer communication over a directed graph. To solve the problem, we propose two accelerated Push-DIGing methods, termed APD and APD-SC, for minimizing non-strongly convex objective functions and strongly convex ones, respectively. We show that APD and APD-SC converge at the rates $O\left(\frac{1}{k^2}\right)$ and $O\left(\left(1 - C\sqrt{\frac{\mu}{L}}\right)^k\right)$, respectively, up to constant factors depending only on the mixing matrix. To the best of our knowledge, APD and APD-SC are the first decentralized methods to achieve provable acceleration over unbalanced directed graphs. Numerical experiments demonstrate the effectiveness of both methods.
Zhuoqing Song, Lei Shi, Shi Pu (2021)
In this paper, we propose two communication-efficient algorithms for decentralized optimization over a multi-agent network with general directed network topology. In the first part, we consider a novel communication-efficient gradient-tracking-based method, termed Compressed Push-Pull (CPP), which combines the Push-Pull method with communication compression. We show that CPP is applicable to a general class of unbiased compression operators and achieves linear convergence for strongly convex and smooth objective functions. In the second part, we propose a broadcast-like version of CPP (B-CPP), which also achieves a linear convergence rate under the same conditions on the objective functions. B-CPP can be applied in an asynchronous broadcast setting and further reduces communication costs compared to CPP. Numerical experiments complement the theoretical analysis and confirm the effectiveness of the proposed methods.
Chuanye Gu, Zhiyou Wu, Jueyou Li (2018)
We investigate a distributed optimization problem over a cooperative multi-agent time-varying network, where each agent has its own decision variables that should be set so as to minimize its individual objective subject to local constraints and global coupling constraints. Based on the push-sum protocol and dual decomposition, we design a distributed regularized dual gradient algorithm to solve this problem, which is implemented over time-varying directed graphs and only requires the column stochasticity of the communication matrices. By augmenting the corresponding Lagrangian function with a quadratic regularization term, we first obtain a bound on the Lagrangian multipliers which, unlike most primal-dual based methods, does not require constructing a compact set containing the dual optimal set. We then show that the convergence rate of the proposed method achieves the order of $\mathcal{O}(\ln T/T)$ for strongly convex objective functions, where $T$ is the number of iterations. Moreover, an explicit bound on the constraint violations is also given. Finally, numerical results on the network utility maximization problem are used to demonstrate the efficiency of the proposed algorithm.
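
The quadratic regularization mentioned above typically takes the following form (a standard construction shown here for illustration; the paper's exact regularizer may differ): the Lagrangian of the coupled problem is augmented with a concave quadratic in the multiplier, which makes the dual function strongly concave and yields the multiplier bound without a compactness assumption,
$\mathcal{L}_{\delta}(x,\lambda) = \sum_{i=1}^{n} f_i(x_i) + \lambda^{\top} g(x) - \frac{\delta}{2}\|\lambda\|^2, \qquad \delta > 0,$
where $g(x)\le 0$ collects the global coupling constraints, $\lambda\ge 0$ is the dual variable, and $\delta$ trades off the boundedness of the multipliers against the approximation error introduced by the regularization.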
Marie Maros, Joakim Jalden (2018)
In this paper we consider a distributed convex optimization problem over time-varying undirected networks. We propose a dual method, primarily averaged network dual ascent (PANDA), which is proven to converge R-linearly to the optimal point given that the agents' objective functions are strongly convex and have Lipschitz continuous gradients. Like dual decomposition, PANDA requires half the number of variable exchanges per iteration of methods based on DIGing, and can provide improved practical performance, as empirically demonstrated.
Communication compression techniques are of growing interest for solving the decentralized optimization problem under limited communication, where the global objective is to minimize the average of local cost functions over a multi-agent network using only local computation and peer-to-peer communication. In this paper, we first propose a novel compressed gradient tracking algorithm (C-GT) that combines the gradient tracking technique with communication compression. In particular, C-GT is compatible with a general class of compression operators that unifies both unbiased and biased compressors. We show that C-GT inherits the advantages of gradient tracking-based algorithms and achieves a linear convergence rate for strongly convex and smooth objective functions. In the second part of this paper, we propose an error feedback based compressed gradient tracking algorithm (EF-C-GT) to further improve the algorithm's efficiency for biased compression operators. Numerical examples complement the theoretical findings and demonstrate the efficiency and flexibility of the proposed algorithms.
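
As an aside, the class of unbiased compression operators such methods rely on includes, for example, QSGD-style stochastic quantization. A minimal, illustrative sketch (not the paper's implementation) is shown below.

import numpy as np

def unbiased_quantize(v, levels=4, rng=np.random.default_rng()):
    # Stochastic quantization: each coordinate of |v|/||v|| is randomly rounded
    # to one of `levels` uniform levels so that E[Q(v)] = v (an unbiased compressor).
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = np.abs(v) * levels / norm          # values in [0, levels]
    lower = np.floor(scaled)
    prob_up = scaled - lower                    # probability of rounding up
    rounded = lower + (rng.random(v.shape) < prob_up)
    return np.sign(v) * rounded * norm / levels

# Quick unbiasedness check: the average of many compressed copies approaches v.
rng = np.random.default_rng(1)
v = rng.standard_normal(8)
estimate = np.mean([unbiased_quantize(v, rng=rng) for _ in range(20000)], axis=0)
print("max deviation from v:", np.max(np.abs(estimate - v)))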
