We investigate a distributed optimization problem over a cooperative multi-agent time-varying network, where each agent has its own decision variables that should be set so as to minimize its individual objective subject to local constraints and global coupling constraints. Based on the push-sum protocol and dual decomposition, we design a distributed regularized dual gradient algorithm to solve this problem. The algorithm runs over time-varying directed graphs and requires only column stochasticity of the communication matrices. By augmenting the corresponding Lagrangian function with a quadratic regularization term, we first obtain a bound on the Lagrange multipliers that, unlike most primal-dual methods, does not require constructing a compact set containing the dual optimal set. Then, we show that the proposed method achieves a convergence rate of order $\mathcal{O}(\ln T/T)$ for strongly convex objective functions, where $T$ is the number of iterations. Moreover, an explicit bound on the constraint violations is given. Finally, numerical results on the network utility maximization problem demonstrate the efficiency of the proposed algorithm.
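The push-sum protocol mentioned above can be illustrated with a minimal sketch. It shows only the averaging step over time-varying directed graphs with column-stochastic weights, not the regularized dual gradient algorithm itself. The function name `push_sum_average`, the three-agent graphs, and the initial values are assumptions made for illustration.

```python
def push_sum_average(values, weight_matrices):
    """Estimate the network-wide average with push-sum.

    weight_matrices[t][i][j] is the weight agent j sends to agent i at step t.
    Every matrix is assumed column stochastic (each column sums to 1); row
    stochasticity is not required.
    """
    n = len(values)
    x = list(values)   # running numerators
    w = [1.0] * n      # push-sum weights, initialised to 1
    for A in weight_matrices:
        x = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        w = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    # The ratio x_i / w_i corrects the bias caused by the directed graph.
    return [x[i] / w[i] for i in range(n)]


# Hypothetical 3-agent example: two directed graphs that alternate in time.
# Graph 1 has edges 0->1, 0->2, 1->0; graph 2 has the single edge 2->0.
# Each agent splits its mass equally among itself and its out-neighbours.
A1 = [[1 / 3, 1 / 2, 0.0],
      [1 / 3, 1 / 2, 0.0],
      [1 / 3, 0.0, 1.0]]
A2 = [[1.0, 0.0, 1 / 2],
      [0.0, 1.0, 0.0],
      [0.0, 0.0, 1 / 2]]

initial_values = [1.0, 2.0, 6.0]
estimates = push_sum_average(initial_values, [A1, A2] * 100)
true_average = sum(initial_values) / len(initial_values)
```

Neither graph is strongly connected on its own, but their union is, which is the kind of joint connectivity that push-sum analyses typically assume. Each agent's estimate in `estimates` should approach `true_average`.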
This paper considers a distributed convex optimization problem over a time-varying multi-agent network, where each agent has its own decision variables that should be set so as to minimize its individual objective subject to local constraints and global coupling equality constraints. For directed graphs, a distributed algorithm is proposed that incorporates the push-sum protocol into dual subgradient methods. Under the convexity assumption, we first establish results on the optimality of the primal and dual variables and on the constraint violations. Then explicit convergence rates of the proposed algorithm are obtained. Finally, numerical experiments on the economic dispatch problem are provided to demonstrate the efficacy of the proposed algorithm.
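As a rough illustration of the dual approach to a coupling equality constraint, the sketch below runs plain dual ascent on a toy economic dispatch problem. It is a simplified, centralized version: the multiplier update uses the exact total mismatch instead of a push-sum estimate, and local constraints are omitted. The quadratic costs $f_i(x_i) = \frac{a_i}{2}x_i^2 + b_i x_i$, the function name, and all numbers are assumptions chosen for illustration.

```python
def dual_ascent_dispatch(a, b, demand, step, iterations):
    """Dual ascent for: minimize sum_i (a_i/2) x_i^2 + b_i x_i
    subject to sum_i x_i = demand.

    Each agent minimizes its own Lagrangian term f_i(x_i) - lam * x_i in
    closed form; the multiplier lam then moves along the constraint residual.
    """
    lam = 0.0
    for _ in range(iterations):
        # Local step: x_i = argmin f_i(x_i) - lam * x_i = (lam - b_i) / a_i
        x = [(lam - bi) / ai for ai, bi in zip(a, b)]
        # Dual step: gradient ascent on the dual function
        lam += step * (demand - sum(x))
    x = [(lam - bi) / ai for ai, bi in zip(a, b)]
    return x, lam


# Hypothetical cost coefficients for three generators.
a = [1.0, 2.0, 4.0]
b = [1.0, 2.0, 3.0]
x_opt, lam_opt = dual_ascent_dispatch(a, b, demand=10.0, step=0.5, iterations=100)
```

For this strongly convex toy problem the dual function is smooth, so a constant step below $2/\sum_i (1/a_i)$ suffices. At the returned point the outputs should sum to the demand and every marginal cost $a_i x_i + b_i$ should equal the multiplier.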
In this paper, we consider a distributed convex optimization problem over time-varying undirected networks. We propose a dual method, primarily averaged network dual ascent (PANDA), that is proven to converge R-linearly to the optimal point provided that the agents' objective functions are strongly convex and have Lipschitz continuous gradients. Like dual decomposition, PANDA requires half as many variable exchanges per iterate as methods based on DIGing, and it can provide improved practical performance, as demonstrated empirically.
In this paper, we consider a distributed convex optimization problem over time-varying networks. We propose a dual method that converges R-linearly to the optimal point provided that the agents' objective functions are strongly convex and have Lipschitz continuous gradients. The proposed method requires half as many variable exchanges per iterate as methods based on DIGing, and it yields improved practical performance, as demonstrated empirically.
Decentralized optimization over time-varying graphs has become increasingly common in modern machine learning with massive data stored on millions of mobile devices, such as in federated learning. This paper revisits the widely used accelerated gradient tracking and extends it to time-varying graphs. We prove the $O((\frac{\gamma}{1-\sigma_{\gamma}})^2\sqrt{\frac{L}{\epsilon}})$ and $O((\frac{\gamma}{1-\sigma_{\gamma}})^{1.5}\sqrt{\frac{L}{\mu}}\log\frac{1}{\epsilon})$ complexities for the practical single-loop accelerated gradient tracking over time-varying graphs when the problems are nonstrongly convex and strongly convex, respectively, where $\gamma$ and $\sigma_{\gamma}$ are two common constants characterizing the network connectivity, $\epsilon$ is the desired precision, and $L$ and $\mu$ are the smoothness and strong convexity constants, respectively. Our complexities improve significantly over those of $O(\frac{1}{\epsilon^{5/7}})$ and $O((\frac{L}{\mu})^{5/7}\frac{1}{(1-\sigma)^{1.5}}\log\frac{1}{\epsilon})$, respectively, which were proved in the original literature only for static graphs, where $\frac{1}{1-\sigma}$ equals $\frac{\gamma}{1-\sigma_{\gamma}}$ when the network is time-invariant. When combined with a multiple consensus subroutine, the dependence on the network connectivity constants can be further improved to $O(1)$ and $O(\frac{\gamma}{1-\sigma_{\gamma}})$ for the computation and communication complexities, respectively. When the network is static, by employing Chebyshev acceleration, our complexities exactly match the lower bounds without hiding any poly-logarithmic factor for both nonstrongly convex and strongly convex problems.
This paper investigates accelerating the convergence of distributed optimization algorithms on non-convex problems. We propose a distributed primal-dual stochastic gradient descent (SGD) algorithm equipped with the Powerball method for acceleration. We show that the proposed algorithm achieves the linear speedup convergence rate $\mathcal{O}(1/\sqrt{nT})$ for general smooth (possibly non-convex) cost functions. We demonstrate the efficiency of the algorithm through numerical experiments, training two-layer fully connected neural networks and convolutional neural networks on the MNIST dataset and comparing with state-of-the-art distributed SGD algorithms and centralized SGD algorithms.
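The Powerball idea referred to above is commonly described as applying the elementwise map $\sigma_\gamma(z) = \mathrm{sign}(z)\,|z|^{\gamma}$ with $\gamma \in (0,1)$ to the gradient before taking a step. The sketch below shows that transform in a single-agent, deterministic gradient descent loop. It is not the distributed primal-dual SGD algorithm of the abstract, and the function names, test problem, and parameter values are assumptions for illustration.

```python
import math


def powerball(grad, gamma):
    """Elementwise Powerball transform: sign(g) * |g|**gamma."""
    return [math.copysign(abs(g) ** gamma, g) for g in grad]


def powerball_gradient_descent(grad_fn, x0, step, gamma, iterations):
    """Gradient descent in which each gradient is passed through powerball()."""
    x = list(x0)
    for _ in range(iterations):
        direction = powerball(grad_fn(x), gamma)
        x = [xi - step * di for xi, di in zip(x, direction)]
    return x


# Hypothetical test problem: f(x) = 0.5 * ||x - target||^2.
target = [1.0, -2.0, 3.0]


def quadratic_grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


x_final = powerball_gradient_descent(
    quadratic_grad, [0.0, 0.0, 0.0], step=0.1, gamma=0.5, iterations=200
)
```

Setting `gamma=1.0` recovers ordinary gradient descent. With `gamma < 1` and a constant step, small gradients are amplified, so the iterates may hover in a small neighborhood of the minimizer instead of converging exactly; a decaying step size is the usual remedy.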