
Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization

Posted by: Huan Li
Publication date: 2021
Research field: Information Engineering
Paper language: English





Decentralized optimization over time-varying graphs has become increasingly common in modern machine learning, with massive data stored on millions of mobile devices, as in federated learning. This paper revisits the widely used accelerated gradient tracking method and extends it to time-varying graphs. We prove $O((\frac{\gamma}{1-\sigma_{\gamma}})^2\sqrt{\frac{L}{\epsilon}})$ and $O((\frac{\gamma}{1-\sigma_{\gamma}})^{1.5}\sqrt{\frac{L}{\mu}}\log\frac{1}{\epsilon})$ complexities for the practical single-loop accelerated gradient tracking over time-varying graphs when the problems are nonstrongly convex and strongly convex, respectively, where $\gamma$ and $\sigma_{\gamma}$ are two common constants characterizing the network connectivity, $\epsilon$ is the desired precision, and $L$ and $\mu$ are the smoothness and strong convexity constants, respectively. Our complexities improve significantly over the $O(\frac{1}{\epsilon^{5/7}})$ and $O((\frac{L}{\mu})^{5/7}\frac{1}{(1-\sigma)^{1.5}}\log\frac{1}{\epsilon})$ complexities, respectively, which were proved in the original literature only for static graphs, where $\frac{1}{1-\sigma}$ equals $\frac{\gamma}{1-\sigma_{\gamma}}$ when the network is time-invariant. When combined with a multiple consensus subroutine, the dependence on the network connectivity constants can be further improved to $O(1)$ and $O(\frac{\gamma}{1-\sigma_{\gamma}})$ for the computation and communication complexities, respectively. When the network is static, by employing Chebyshev acceleration, our complexities exactly match the lower bounds without hiding any poly-logarithmic factor, for both nonstrongly convex and strongly convex problems.
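To make the recursion concrete, here is a minimal sketch of single-loop accelerated gradient tracking over a time-varying graph. The function name, the choice of doubly stochastic mixing matrices, and the tuning of the step size eta and momentum theta are illustrative assumptions, not the paper's exact pseudocode or parameter schedule.

    import numpy as np

    def accelerated_gradient_tracking(grads, x0, W_of_k, eta, theta, iters):
        # grads: list of per-agent gradient oracles, grads[i](x) = grad f_i(x)
        # x0: (n, d) array of initial local iterates, one row per agent
        # W_of_k: callable k -> (n, n) doubly stochastic mixing matrix W(k)
        # eta, theta: step size and momentum parameter (assumed pre-tuned)
        n = x0.shape[0]
        x_prev, x = x0.copy(), x0.copy()
        # Each tracker s[i] estimates the network-wide average gradient.
        g_old = np.stack([grads[i](x[i]) for i in range(n)])
        s = g_old.copy()
        for k in range(iters):
            W = W_of_k(k)                   # the graph may change every round
            v = x + theta * (x - x_prev)    # Nesterov-style extrapolation
            x_prev = x
            x = W @ v - eta * s             # consensus + tracked-gradient step
            g_new = np.stack([grads[i](x[i]) for i in range(n)])
            s = W @ s + g_new - g_old       # gradient tracking update
            g_old = g_new
        return x.mean(axis=0)

In the static-graph case, the abstract's lower-bound-matching results come from replacing the plain mixing step with Chebyshev-accelerated consensus; the single-loop structure above is otherwise unchanged.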




Read also

Zhuoqing Song, Lei Shi, Shi Pu (2021)
In this work, we consider the decentralized optimization problem in which a network of $n$ agents, each possessing a smooth and convex objective function, wishes to collaboratively minimize the average of all the objective functions through peer-to-peer communication in a directed graph. To solve the problem, we propose two accelerated Push-DIGing methods, termed APD and APD-SC, for minimizing nonstrongly convex objective functions and strongly convex ones, respectively. We show that APD and APD-SC converge at the rates $O\left(\frac{1}{k^2}\right)$ and $O\left(\left(1 - C\sqrt{\frac{\mu}{L}}\right)^k\right)$, respectively, up to constant factors depending only on the mixing matrix. To the best of our knowledge, APD and APD-SC are the first decentralized methods to achieve provable acceleration over unbalanced directed graphs. Numerical experiments demonstrate the effectiveness of both methods.
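For intuition, the following sketches the (non-accelerated) Push-DIGing recursion that APD builds on, where push-sum weights de-bias iterates mixed with a column-stochastic matrix. The function name and this simplified form are assumptions for illustration, not the authors' APD pseudocode.

    import numpy as np

    def push_diging(grads, u0, C_of_k, eta, iters):
        # grads: per-agent gradient oracles; u0: (n, d) initial iterates
        # C_of_k: callable k -> (n, n) column-stochastic matrix of the digraph
        n = u0.shape[0]
        u = u0.copy()
        w = np.ones(n)                       # push-sum weights
        x = u / w[:, None]                   # de-biased local estimates
        g_old = np.stack([grads[i](x[i]) for i in range(n)])
        y = g_old.copy()                     # gradient trackers
        for k in range(iters):
            C = C_of_k(k)
            u = C @ (u - eta * y)            # push-sum mixing of the iterate
            w = C @ w                        # push-sum mixing of the weights
            x = u / w[:, None]               # correct the column-stochastic bias
            g_new = np.stack([grads[i](x[i]) for i in range(n)])
            y = C @ y + g_new - g_old        # track the average gradient
            g_old = g_new
        return x.mean(axis=0)

APD adds Nesterov-style momentum on top of a recursion of this kind to obtain the accelerated rates stated above.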
Zhuoqing Song, Lei Shi, Shi Pu (2021)
In this paper, we propose two communication-efficient algorithms for decentralized optimization over a multi-agent network with general directed network topology. In the first part, we consider a novel communication-efficient gradient tracking based method, termed Compressed Push-Pull (CPP), which combines the Push-Pull method with communication compression. We show that CPP is applicable to a general class of unbiased compression operators and achieves linear convergence for strongly convex and smooth objective functions. In the second part, we propose a broadcast-like version of CPP (B-CPP), which also achieves a linear convergence rate under the same conditions on the objective functions. B-CPP can be applied in an asynchronous broadcast setting and further reduces communication costs compared to CPP. Numerical experiments complement the theoretical analysis and confirm the effectiveness of the proposed methods.
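As an example of the kind of unbiased compression operator such methods admit, here is a QSGD-style stochastic quantizer. This is a sketch of one admissible choice, not CPP's prescribed compressor; CPP is compressor-agnostic within its unbiasedness assumptions.

    import numpy as np

    def stochastic_quantize(v, levels=4, rng=None):
        # Unbiased random quantizer: E[output] = v, with coarser messages
        # (hence cheaper communication) for smaller `levels`.
        if rng is None:
            rng = np.random.default_rng()
        norm = np.linalg.norm(v)
        if norm == 0.0:
            return v
        scaled = np.abs(v) * levels / norm
        lower = np.floor(scaled)
        # Round up with probability equal to the fractional part, so the
        # quantized magnitude is correct in expectation.
        q = lower + (rng.random(v.shape) < (scaled - lower))
        return np.sign(v) * q * norm / levels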
Chuanye Gu, Zhiyou Wu, Jueyou Li (2018)
We investigate a distributed optimization problem over a cooperative multi-agent time-varying network, where each agent has its own decision variables that should be set so as to minimize its individual objective subject to local constraints and global coupling constraints. Based on the push-sum protocol and dual decomposition, we design a distributed regularized dual gradient algorithm to solve this problem, in which the algorithm is implemented over time-varying directed graphs requiring only column stochasticity of the communication matrices. By augmenting the corresponding Lagrangian function with a quadratic regularization term, we first obtain a bound on the Lagrange multipliers that, unlike most primal-dual based methods, does not require constructing a compact set containing the dual optimal set. We then show that the convergence rate of the proposed method achieves the order of $\mathcal{O}(\ln T/T)$ for strongly convex objective functions, where $T$ is the number of iterations. Moreover, an explicit bound on the constraint violations is also given. Finally, numerical results on the network utility maximization problem demonstrate the efficiency of the proposed algorithm.
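The regularization step can be sketched as follows: augmenting the Lagrangian with $-\frac{\delta}{2}\|\lambda\|^2$ makes every dual ascent step shrink the multipliers, which is what keeps them bounded without a compactness assumption. This is a centralized one-step sketch under assumed notation; the paper runs the update distributedly via push-sum over time-varying digraphs.

    import numpy as np

    def regularized_dual_step(lam, g_val, alpha, delta):
        # One projected dual ascent step on the regularized Lagrangian
        #   L_delta(x, lam) = f(x) + lam . g(x) - (delta / 2) * ||lam||^2,
        # where g_val = g(x^k) is the current coupling-constraint value.
        # The -delta * lam term pulls the multipliers toward zero, which
        # bounds them along the iterations.
        return np.maximum(0.0, lam + alpha * (g_val - delta * lam))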
Marie Maros, Joakim Jalden (2018)
In this paper we consider a distributed convex optimization problem over time-varying undirected networks. We propose a dual method, primarily averaged network dual ascent (PANDA), that is proven to converge R-linearly to the optimal point given that the agents' objective functions are strongly convex and have Lipschitz continuous gradients. Like dual decomposition, PANDA requires half the number of variable exchanges per iteration compared with methods based on DIGing, and can provide improved practical performance, as demonstrated empirically.
Communication compression techniques are of growing interest for solving the decentralized optimization problem under limited communication, where the global objective is to minimize the average of local cost functions over a multi-agent network using only local computation and peer-to-peer communication. In this paper, we first propose a novel compressed gradient tracking algorithm (C-GT) that combines the gradient tracking technique with communication compression. In particular, C-GT is compatible with a general class of compression operators that unifies both unbiased and biased compressors. We show that C-GT inherits the advantages of gradient tracking-based algorithms and achieves a linear convergence rate for strongly convex and smooth objective functions. In the second part of this paper, we propose an error feedback based compressed gradient tracking algorithm (EF-C-GT) to further improve the algorithm's efficiency for biased compression operators. Numerical examples complement the theoretical findings and demonstrate the efficiency and flexibility of the proposed algorithms.
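A minimal sketch of the error-feedback mechanism that EF-C-GT adds for biased compressors is shown below. The function names and the top-k compressor are illustrative assumptions; C-GT/EF-C-GT apply these ideas inside the gradient tracking recursion rather than in isolation.

    import numpy as np

    def top_k(v, k=2):
        # A simple biased compressor: keep the k largest-magnitude entries.
        out = np.zeros_like(v)
        idx = np.argsort(np.abs(v))[-k:]
        out[idx] = v[idx]
        return out

    def ef_compress(msg, err, compressor=top_k):
        # Error feedback: compress the error-corrected message and carry
        # the residual into the next communication round, so information
        # dropped now is (approximately) re-sent later.
        corrected = msg + err
        q = compressor(corrected)
        return q, corrected - q     # transmitted message, new residual

Each agent transmits q and locally retains the residual; keeping the accumulated compression error in the loop is the mechanism the abstract credits for improved efficiency with biased operators.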
