No Arabic abstract
Decentralized optimization, particularly the class of decentralized composite convex optimization (DCCO) problems, has found many applications. Due to ubiquitous communication congestion and random dropouts in practice, it is highly desirable to design decentralized algorithms that can handle stochastic communication networks. However, most existing algorithms for DCCO only work in time-invariant networks and cannot be extended to stochastic networks because they inherently need knowledge of network topology $textit{a priori}$. In this paper, we propose a new decentralized dual averaging (DDA) algorithm that can solve DCCO in stochastic networks. Under a rather mild condition on stochastic networks, we show that the proposed algorithm attains $textit{global linear convergence}$ if each local objective function is strongly convex. Our algorithm substantially improves the existing DDA-type algorithms as the latter were only known to converge $textit{sublinearly}$ prior to our work. The key to achieving the improved rate is the design of a novel dynamic averaging consensus protocol for DDA, which intuitively leads to more accurate local estimates of the global dual variable. To the best of our knowledge, this is the first linearly convergent DDA-type decentralized algorithm and also the first algorithm that attains global linear convergence for solving DCCO in stochastic networks. Numerical results are also presented to support our design and analysis.
Communication compression techniques are of growing interests for solving the decentralized optimization problem under limited communication, where the global objective is to minimize the average of local cost functions over a multi-agent network using only local computation and peer-to-peer communication. In this paper, we first propose a novel compressed gradient tracking algorithm (C-GT) that combines gradient tracking technique with communication compression. In particular, C-GT is compatible with a general class of compression operators that unifies both unbiased and biased compressors. We show that C-GT inherits the advantages of gradient tracking-based algorithms and achieves linear convergence rate for strongly convex and smooth objective functions. In the second part of this paper, we propose an error feedback based compressed gradient tracking algorithm (EF-C-GT) to further improve the algorithm efficiency for biased compression operators. Numerical examples complement the theoretical findings and demonstrate the efficiency and flexibility of the proposed algorithms.
The present paper considers leveraging network topology information to improve the convergence rate of ADMM for decentralized optimization, where networked nodes work collaboratively to minimize the objective. Such problems can be solved efficiently using ADMM via decomposing the objective into easier subproblems. Properly exploiting network topology can significantly improve the algorithm performance. Hybrid ADMM explores the direction of exploiting node information by taking into account node centrality but fails to utilize edge information. This paper fills the gap by incorporating both node and edge information and provides a novel convergence rate bound for decentralized ADMM that explicitly depends on network topology. Such a novel bound is attainable for certain class of problems, thus tight. The explicit dependence further suggests possible directions to optimal design of edge weights to achieve the best performance. Numerical experiments show that simple heuristic methods could achieve better performance, and also exhibits robustness to topology changes.
This paper studies decentralized convex optimization problems defined over networks, where the objective is to minimize a sum of local smooth convex functions while respecting a common constraint. Two new algorithms based on dual averaging and decentralized consensus-seeking are proposed. The first one accelerates the standard convergence rate $O(frac{1}{sqrt{t}})$ in existing decentralized dual averaging (DDA) algorithms to $O(frac{1}{t})$, where $t$ is the time counter. This is made possible by a second-order consensus scheme that assists each agent to locally track the global dual variable more accurately and a new analysis of the descent property for the mean variable. We remark that, in contrast to its primal counterparts, this method decouples the synchronization step from nonlinear projection, leading to a rather concise analysis and a natural extension to stochastic networks. In the second one, two local sequences of primal variables are constructed in a decentralized manner to achieve acceleration, where only one of them is exchanged between agents. In addition to this, another consensus round is performed for local dual variables. The convergence rate is proved to be $O(1)(frac{1}{t^2}+frac{1}{t})$, where the magnitude of error bound is showed to be inversely proportional to the algebraic connectivity of the graph. However, the condition for stepsize does not rely on the weight matrix associated with the graph, making it easier to satisfy in practice than other accelerated methods. Finally, comparisons between the proposed methods and several recent algorithms are performed using a large-scale LASSO problem.
Decentralized optimization and communication compression have exhibited their great potential in accelerating distributed machine learning by mitigating the communication bottleneck in practice. While existing decentralized algorithms with communication compression mostly focus on the problems with only smooth components, we study the decentralized stochastic composite optimization problem with a potentially non-smooth component. A underline{Prox}imal gradient underline{L}inunderline{EA}r convergent underline{D}ecentralized algorithm with compression, Prox-LEAD, is proposed with rigorous theoretical analyses in the general stochastic setting and the finite-sum setting. Our theorems indicate that Prox-LEAD works with arbitrary compression precision, and it tremendously reduces the communication cost almost for free. The superiorities of the proposed algorithms are demonstrated through the comparison with state-of-the-art algorithms in terms of convergence complexities and numerical experiments. Our algorithmic framework also generally enlightens the compressed communication on other primal-dual algorithms by reducing the impact of inexact iterations, which might be of independent interest.
The present work introduces the hybrid consensus alternating direction method of multipliers (H-CADMM), a novel framework for optimization over networks which unifies existing distributed optimization approaches, including the centralized and the decentralized consensus ADMM. H-CADMM provides a flexible tool that leverages the underlying graph topology in order to achieve a desirable sweet-spot between node-to-node communication overhead and rate of convergence -- thereby alleviating known limitations of both C-CADMM and D-CADMM. A rigorous analysis of the novel method establishes linear convergence rate, and also guides the choice of parameters to optimize this rate. The novel hybrid update rules of H-CADMM lend themselves to in-network acceleration that is shown to effect considerable -- and essentially free-of-charge -- performance boost over the fully decentralized ADMM. Comprehensive numerical tests validate the analysis and showcase the potential of the method in tackling efficiently, widely useful learning tasks.