In this paper, we aim to solve a distributed optimization problem with coupling constraints in a multi-agent network based on the proximal gradient method, where each agent's cost function is composed of a smooth part and a possibly non-smooth part. To solve this problem, we resort to the dual problem by deriving the Fenchel conjugate, resulting in a consensus-based constrained optimization problem. We then propose a fully distributed dual proximal gradient algorithm, in which the agents make decisions using only local parameters and information from their immediate neighbours. Moreover, provided that the non-smooth parts of the primal cost functions have simple structures, the dual variables can be updated by simple operations, reducing the overall computational complexity. An analytical convergence rate of the proposed algorithm is derived, and its efficacy is numerically verified on a social welfare optimization problem in the electricity market.
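To make the dual step concrete, here is a minimal Python sketch of a consensus-based dual gradient update for a coupling constraint $\sum_i A_i x_i = b$. This is an illustration under simplifying assumptions, not the paper's exact algorithm: the quadratic local costs, the weight matrix `W`, and the step size `mu` are hypothetical, and the non-smooth parts are omitted so the dual proximal step reduces to a plain gradient step.

```python
import numpy as np

# Illustrative problem: minimize sum_i 0.5*||x_i - c_i||^2
# subject to the coupling constraint sum_i A_i x_i = b.
# A, c, b, W, mu are hypothetical placeholders, not from the paper.
def dual_consensus_gradient_sketch(A, c, b, W, T, mu=0.1):
    n = len(A)                             # number of agents
    m = A[0].shape[0]                      # number of coupling constraints
    lam = np.zeros((n, m))                 # each agent's local dual copy
    for _ in range(T):
        lam = W @ lam                      # consensus with immediate neighbours
        for i in range(n):
            x_i = c[i] - A[i].T @ lam[i]   # minimizer of the local Lagrangian
            lam[i] += mu * (A[i] @ x_i - b / n)  # local dual gradient step
    return lam.mean(axis=0)
```

With a genuinely non-smooth local term, the dual gradient step above would be replaced by a proximal step on the conjugate of that term, which is cheap when the non-smooth part has a simple structure.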
In this work, we first consider distributed convex constrained optimization problems where the objective function is encoded by multiple local and possibly nonsmooth objectives privately held by a group of agents, and propose a distributed subgradient method with double averaging (abbreviated as $\mathrm{DSA_2}$) that only requires peer-to-peer communication and local computation to solve the global problem. The algorithmic framework builds on dual methods and dynamic average consensus; the sequence of test points is formed by iteratively minimizing a local dual model of the overall objective where the coefficients, i.e., approximated subgradients of the objective, are supplied by the dynamic average consensus scheme. We theoretically show that $\mathrm{DSA_2}$ enjoys non-ergodic convergence properties, i.e., the local minimizing sequence itself is convergent, a distinct feature that cannot be found in existing results. Specifically, we establish a convergence rate of $O(\frac{1}{\sqrt{t}})$ in terms of objective function error. Then, extensions are made to tackle distributed optimization problems with coupled functional constraints by combining $\mathrm{DSA_2}$ and dual decomposition. This is made possible by Lagrangian relaxation that transforms the coupling in constraints of the primal problem into that in cost functions of the dual, thus allowing us to solve the dual problem via $\mathrm{DSA_2}$. Both the dual objective error and the quadratic penalty for the coupled constraint are proved to converge at a rate of $O(\frac{1}{\sqrt{t}})$, and the primal objective error asymptotically vanishes. Numerical experiments and comparisons are conducted to illustrate the advantage of the proposed algorithms and validate our theoretical findings.
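The two averaging ingredients of $\mathrm{DSA_2}$ can be sketched as below. This is not the authors' exact recursion but a hedged illustration combining dual averaging with dynamic average consensus of subgradients; `subgrads`, `project`, the doubly stochastic matrix `W`, and the step schedule are assumed placeholders.

```python
import numpy as np

def dsa2_sketch(subgrads, project, W, T, n, d,
                step=lambda t: 1.0 / np.sqrt(t + 1)):
    """Sketch: dual averaging + dynamic average consensus of subgradients."""
    x = np.zeros((n, d))          # local test points
    z = np.zeros((n, d))          # accumulated dual models
    g = np.array([subgrads[i](x[i]) for i in range(n)])
    s = g.copy()                  # tracked average subgradient
    for t in range(T):
        z = W @ z + s             # average dual models, add tracked subgradient
        x = np.array([project(-step(t) * z[i]) for i in range(n)])
        g_new = np.array([subgrads[i](x[i]) for i in range(n)])
        s = W @ s + g_new - g     # dynamic average consensus update
        g = g_new
    return x
```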
This work develops a proximal primal-dual decentralized strategy for multi-agent optimization problems that involve multiple coupled affine constraints, where each constraint may involve only a subset of the agents. Such constraints are generally sparse, meaning that only a small subset of the agents is involved in each of them. This scenario arises in many applications, including decentralized control formulations, resource allocation problems, and smart grids. Traditional decentralized solutions tend to ignore the structure of the constraints and lead to degraded performance. We instead develop a decentralized solution that exploits this sparsity structure. Under constant step-size learning, the asymptotic convergence of the proposed algorithm is established in the presence of non-smooth terms, and it occurs at a linear rate in the smooth case. We also examine how the performance of the algorithm is influenced by the sparsity of the constraints. Simulations illustrate the superior performance of the proposed strategy.
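A rough Python sketch of a proximal primal-dual recursion for affinely coupled problems follows. It is shown in centralized form for brevity; in a decentralized implementation each agent would handle only the rows of `A` in which it appears. All names (`grad_f`, `prox_r`, the step size `mu`) are illustrative assumptions, not the paper's method.

```python
import numpy as np

# Hypothetical setup: minimize sum_i f_i(x_i) + r_i(x_i)  s.t.  A x = b,
# where each row of A touches only a few agents (sparse coupling).
def prox_primal_dual_sketch(grad_f, prox_r, A, b, x0, mu=0.01, T=1000):
    x, lam = x0.copy(), np.zeros(A.shape[0])
    for _ in range(T):
        # primal: gradient step on smooth part plus dual coupling, then prox
        x = prox_r(x - mu * (grad_f(x) + A.T @ lam), mu)
        # dual: ascent step on the affine constraint residual
        lam = lam + mu * (A @ x - b)
    return x, lam
```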
Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method -- the Orthant Based Proximal Stochastic Gradient Method (OBProx-SG) -- to solve perhaps the most popular instance, i.e., the $\ell_1$-regularized problem. The OBProx-SG method contains two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the sparsity level via orthant face projection. Compared to the state-of-the-art methods, e.g., Prox-SG, RDA and Prox-SVRG, OBProx-SG not only converges to global optimal solutions (in the convex scenario) or stationary points (in the non-convex scenario), but also promotes the sparsity of the solutions substantially. In particular, on a large number of convex problems, OBProx-SG comprehensively outperforms the existing methods in terms of both sparsity exploration and objective values. Moreover, experiments on non-convex deep neural networks, e.g., MobileNetV1 and ResNet18, further demonstrate its superiority by achieving solutions of much higher sparsity without sacrificing generalization accuracy.
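The two steps of OBProx-SG can be sketched as follows. The switching schedule, step sizes, and oracle names are illustrative choices rather than the paper's tuned values; the block only shows the prox-step/orthant-step structure described above.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def obprox_sg_sketch(stoch_grad, x0, lam=0.01, lr=0.1, T_prox=100, T_orth=100):
    x = x0.copy()
    # Step (i): proximal stochastic gradient steps predict a support cover
    for _ in range(T_prox):
        x = soft_threshold(x - lr * stoch_grad(x), lr * lam)
    # Step (ii): orthant steps enhance sparsity within the selected orthant face
    sign = np.sign(x)                      # orthant fixed by the current support
    for _ in range(T_orth):
        g = stoch_grad(x) + lam * sign     # (sub)gradient restricted to the orthant
        x_new = x - lr * g
        x = np.where(sign * x_new > 0, x_new, 0.0)  # orthant face projection
    return x
```

The orthant face projection in the last line zeroes any coordinate that attempts to cross into a different orthant, which is what aggressively drives up the sparsity level.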
We study the distributed stochastic gradient (D-SG) method and its accelerated variant (D-ASG) for solving decentralized strongly convex stochastic optimization problems, where the objective function is distributed over several computational units lying on a fixed but arbitrary connected communication graph, subject to local communication constraints, and only noisy estimates of the gradients are available. We develop a framework that allows one to choose the stepsize and the momentum parameters of these algorithms so as to optimize performance by systematically trading off the bias, variance, robustness to gradient noise, and dependence on network effects. When gradients do not contain noise, we also prove that distributed accelerated methods can \emph{achieve acceleration}, requiring $\mathcal{O}(\sqrt{\kappa}\log(1/\varepsilon))$ gradient evaluations and $\mathcal{O}(\sqrt{\kappa}\log(1/\varepsilon))$ communications to converge to the same fixed point as the non-accelerated variant, where $\kappa$ is the condition number and $\varepsilon$ is the target accuracy. To our knowledge, this is the first acceleration result where the iteration complexity scales with the square root of the condition number in the context of \emph{primal} distributed inexact first-order methods. For quadratic functions, we also provide finer performance bounds that are tight with respect to bias and variance terms. Finally, we study a multistage version of D-ASG with parameters carefully varied over stages to ensure an exact $\mathcal{O}(\exp(-k/\sqrt{\kappa}))$ linear decay in the bias term as well as an optimal $\mathcal{O}(\sigma^2/k)$ rate in the variance term. We illustrate through numerical experiments that our approach results in practical algorithms that are robust to gradient noise and can outperform existing methods.
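For intuition, a minimal Python sketch of an accelerated decentralized recursion of this kind (neighbor mixing plus a Nesterov-style extrapolation with noisy gradients) is given below; the mixing matrix `W`, step size `alpha`, momentum `beta`, and noise model are hypothetical inputs, and the stage-wise parameter schedule of the multistage variant is not reproduced.

```python
import numpy as np

def d_asg_sketch(grad, W, x0, alpha, beta, T, noise=0.0, rng=None):
    """Illustrative D-ASG-style recursion: each agent extrapolates with
    momentum, mixes with its neighbors via W, and takes a gradient step."""
    rng = rng or np.random.default_rng(0)
    n, d = x0.shape
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(T):
        y = (1 + beta) * x - beta * x_prev           # momentum extrapolation
        g = np.array([grad(i, y[i]) for i in range(n)])
        g += noise * rng.standard_normal((n, d))     # noisy gradient estimates
        x_prev, x = x, W @ y - alpha * g             # mix with neighbors, step
    return x.mean(axis=0)
```

Setting `beta = 0` recovers a plain D-SG-style update, which makes the bias/variance trade-off between the two variants easy to probe numerically.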
This paper considers a distributed convex optimization problem over a time-varying multi-agent network, where each agent has its own decision variables that should be set so as to minimize its individual objective subject to local constraints and global coupling equality constraints. Over directed graphs, a distributed algorithm is proposed that incorporates the push-sum protocol into dual subgradient methods. Under the convexity assumption, the optimality of the primal and dual variables and the vanishing of the constraint violations are first established. Explicit convergence rates of the proposed algorithm are then obtained. Finally, numerical experiments on the economic dispatch problem are provided to demonstrate the efficacy of the proposed algorithm.
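The push-sum ingredient can be sketched as follows. The column-stochastic matrix `P`, the dual subgradient oracle, and the diminishing step size are illustrative assumptions rather than the paper's exact construction; the sketch only shows how push-sum de-biases dual averages over a directed graph while subgradient steps are applied.

```python
import numpy as np

def push_sum_dual_subgradient_sketch(subgrad_dual, P, lam0, T,
                                     step=lambda t: 1.0 / np.sqrt(t + 1)):
    """Push-sum consensus on dual variables over a directed graph.
    P is column-stochastic: P[i, j] is the share agent j sends to agent i."""
    n, d = lam0.shape
    z = lam0.copy()               # push-sum numerators
    w = np.ones(n)                # push-sum weights (denominators)
    for t in range(T):
        z = P @ z                 # agents push shares of their values
        w = P @ w
        lam = z / w[:, None]      # de-biased local dual estimates
        # dual subgradient ascent using local constraint residuals
        z += step(t) * np.array([subgrad_dual(i, lam[i]) for i in range(n)])
    return z / w[:, None]
```

Dividing by the push-sum weights `w` corrects the imbalance caused by directed (non-doubly-stochastic) communication, which is what makes the scheme viable over directed graphs.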