
Distributed Coupled Multi-Agent Stochastic Optimization

Publication date: 2017
Language: English





This work develops effective distributed strategies for the solution of constrained multi-agent stochastic optimization problems with coupled parameters across the agents. In this formulation, each agent is influenced by only a subset of the entries of a global parameter vector or model, and is subject to convex constraints that are only known locally. Problems of this type arise in several applications, most notably in disease propagation models, minimum-cost flow problems, distributed control formulations, and distributed power system monitoring. This work focuses on stochastic settings, where a stochastic risk function is associated with each agent and the objective is to seek the minimizer of the aggregate sum of all risks subject to a set of constraints. Agents are not aware of the statistical distribution of the data and, therefore, can only rely on stochastic approximations in their learning strategies. We derive an effective distributed learning strategy that is able to track drifts in the underlying parameter model. A detailed performance and stability analysis is carried out showing that the resulting coupled diffusion strategy converges at a linear rate to an $O(\mu)$-neighborhood of the true penalized optimizer.
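As a rough illustration of the coupled setting described above, the Python sketch below lets each agent take a stochastic-gradient (adapt) step only on the entries of the global model it actually observes and then average (combine) its copy with those of its neighbors. The quadratic risks, the three-agent graph, and the unconstrained adapt-then-combine form are assumptions made for the demo; they are not the exact recursion or constraint handling analyzed in the paper.

import numpy as np

rng = np.random.default_rng(0)
M = 5                                                # global model dimension
blocks = {0: [0, 1], 1: [1, 2], 2: [2, 3, 4]}        # entries each agent observes (coupled model)
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}     # combination neighborhoods (include self)
w_true = rng.normal(size=M)                          # underlying model (kept static here)
mu = 0.05                                            # small constant step size

w = {k: np.zeros(M) for k in blocks}                 # each agent keeps a copy of the full model

for _ in range(3000):
    psi = {}
    for k, idx in blocks.items():                    # adapt: stochastic gradient on local entries
        x = rng.normal(size=len(idx))                # streaming regressor
        meas = x @ w_true[idx] + 0.1 * rng.normal()  # noisy scalar measurement
        grad = -(meas - x @ w[k][idx]) * x           # gradient of 0.5*(meas - x^T w)^2
        psi[k] = w[k].copy()
        psi[k][idx] -= mu * grad
    for k in blocks:                                 # combine: average with neighbors
        w[k] = np.mean([psi[j] for j in neighbors[k]], axis=0)

print("true model:         ", np.round(w_true, 2))
print("agent 1's estimate: ", np.round(w[1], 2))

With a small step size, the local copies settle in a neighborhood of the true model whose size shrinks with $\mu$, which is the flavor of the $O(\mu)$ statement in the abstract.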

Related research

We study the distributed stochastic gradient (D-SG) method and its accelerated variant (D-ASG) for solving decentralized strongly convex stochastic optimization problems, where the objective function is distributed over several computational units lying on a fixed but arbitrary connected communication graph, subject to local communication constraints, and where only noisy estimates of the gradients are available. We develop a framework that allows the stepsize and momentum parameters of these algorithms to be chosen so as to optimize performance by systematically trading off bias, variance, robustness to gradient noise, and dependence on network effects. When gradients do not contain noise, we also prove that distributed accelerated methods can \emph{achieve acceleration}, requiring $\mathcal{O}(\sqrt{\kappa}\log(1/\varepsilon))$ gradient evaluations and $\mathcal{O}(\sqrt{\kappa}\log(1/\varepsilon))$ communications to converge to the same fixed point as the non-accelerated variant, where $\kappa$ is the condition number and $\varepsilon$ is the target accuracy. To our knowledge, this is the first acceleration result where the iteration complexity scales with the square root of the condition number in the context of \emph{primal} distributed inexact first-order methods. For quadratic functions, we also provide finer performance bounds that are tight with respect to the bias and variance terms. Finally, we study a multistage version of D-ASG with parameters carefully varied over stages to ensure exact $\mathcal{O}(\exp(-k/\sqrt{\kappa}))$ linear decay in the bias term as well as the optimal $\mathcal{O}(\sigma^2/k)$ rate in the variance term. We illustrate through numerical experiments that our approach results in practical algorithms that are robust to gradient noise and that can outperform existing methods.
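To picture the D-ASG recursion discussed above, the sketch below runs a Nesterov-style extrapolation, one round of gossip averaging with a doubly stochastic matrix, and a noisy local gradient step at each node. The four-node ring, the quadratic local objectives, and the hand-picked stepsize and momentum values are illustrative assumptions, not the tuned choices the bias-variance framework above is designed to produce.

import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 3
A = [rng.normal(size=(6, d)) for _ in range(n)]      # local quadratics 0.5*||A_i x - b_i||^2
b = [rng.normal(size=6) for _ in range(n)]

# doubly stochastic mixing matrix for a 4-node ring
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

alpha, beta, sigma = 0.01, 0.8, 0.1                  # stepsize, momentum, gradient-noise level (assumed)
x = np.zeros((n, d))
x_prev = np.zeros((n, d))

for _ in range(3000):
    y = x + beta * (x - x_prev)                      # Nesterov-style extrapolation
    mixed = W @ y                                    # one round of communication with neighbors
    grads = np.stack([A[i].T @ (A[i] @ y[i] - b[i]) + sigma * rng.normal(size=d)
                      for i in range(n)])            # noisy local gradients
    x_prev, x = x, mixed - alpha * grads             # accelerated distributed update

x_star = np.linalg.solve(sum(Ai.T @ Ai for Ai in A),
                         sum(Ai.T @ bi for Ai, bi in zip(A, b)))
print("distance of the node-averaged iterate to the optimum:",
      round(float(np.linalg.norm(x.mean(axis=0) - x_star)), 3))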
This work studies multi-agent sharing optimization problems with the objective function being the sum of smooth local functions plus a convex (possibly non-smooth) function coupling all agents. This scenario arises in many machine learning and engineering applications, such as regression over distributed features and resource allocation. We reformulate this problem into an equivalent saddle-point problem, which is amenable to decentralized solutions. We then propose a proximal primal-dual algorithm and establish its linear convergence to the optimal solution when the local functions are strongly-convex. To our knowledge, this is the first linearly convergent decentralized algorithm for multi-agent sharing problems with a general convex (possibly non-smooth) coupling function.
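The sketch below makes the saddle-point reformulation concrete on a toy sharing problem, $\min_x \sum_i \frac{1}{2}\|x_i - c_i\|^2 + \lambda\|\sum_i x_i\|_1$, using a generic proximal primal-dual (Chambolle-Pock / Condat-Vu style) loop with a dual variable attached to the coupling term. It is centralized and only illustrates the reformulation; the decentralized algorithm proposed in the paper and its linear-rate analysis are not reproduced here.

import numpy as np

rng = np.random.default_rng(2)
n, d = 5, 4
c = rng.normal(size=(n, d))                          # data defining the smooth local terms
lam = 0.5                                            # weight of the nonsmooth coupling term
tau, sig = 0.2, 0.2                                  # primal / dual step sizes

x = np.zeros((n, d))
x_bar = x.copy()
y = np.zeros(d)                                      # dual variable for the coupling term

for _ in range(500):
    # dual step: prox of (lam*||.||_1)^* is projection onto the box [-lam, lam]
    y = np.clip(y + sig * x_bar.sum(axis=0), -lam, lam)
    # primal step on the smooth local terms; grad of 0.5*||x_i - c_i||^2 is x_i - c_i
    x_new = x - tau * ((x - c) + y)
    x_bar = 2 * x_new - x                            # extrapolation used by the splitting
    x = x_new

print("coupled quantity sum_i x_i:", np.round(x.sum(axis=0), 3))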
In this work, we first consider distributed convex constrained optimization problems where the objective function is encoded by multiple local and possibly nonsmooth objectives privately held by a group of agents, and propose a distributed subgradient method with double averaging (abbreviated as $\mathrm{DSA_2}$) that only requires peer-to-peer communication and local computation to solve the global problem. The algorithmic framework builds on dual methods and dynamic average consensus; the sequence of test points is formed by iteratively minimizing a local dual model of the overall objective where the coefficients, i.e., approximated subgradients of the objective, are supplied by the dynamic average consensus scheme. We theoretically show that $\mathrm{DSA_2}$ enjoys non-ergodic convergence properties, i.e., the local minimizing sequence itself is convergent, a distinct feature that cannot be found in existing results. Specifically, we establish a convergence rate of $O(\frac{1}{\sqrt{t}})$ in terms of objective function error. Then, extensions are made to tackle distributed optimization problems with coupled functional constraints by combining $\mathrm{DSA_2}$ and dual decomposition. This is made possible by Lagrangian relaxation that transforms the coupling in constraints of the primal problem into that in cost functions of the dual, thus allowing us to solve the dual problem via $\mathrm{DSA_2}$. Both the dual objective error and the quadratic penalty for the coupled constraint are proved to converge at a rate of $O(\frac{1}{\sqrt{t}})$, and the primal objective error asymptotically vanishes. Numerical experiments and comparisons are conducted to illustrate the advantage of the proposed algorithms and validate our theoretical findings.
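For intuition about the "test point as minimizer of a local dual model" idea, here is a rough sketch in the spirit of distributed dual averaging on a scalar nonsmooth problem. The mixing matrix, the quadratic dual model, and the $1/\sqrt{t}$ step are assumptions for the demo; the actual $\mathrm{DSA_2}$ scheme uses double averaging with dynamic average consensus and differs in its details.

import numpy as np

rng = np.random.default_rng(3)
n = 4
c = np.array([-0.5, 0.2, 0.4, 0.9])                  # local data; global objective is sum_i |x - c_i|
W = np.full((n, n), 0.1) + 0.6 * np.eye(n)           # symmetric doubly stochastic mixing matrix

z = np.zeros(n)                                      # accumulated, network-averaged subgradients
x = np.zeros(n)                                      # local test points, constrained to [-1, 1]

for t in range(1, 2001):
    g = np.sign(x - c)                               # subgradient of |x_i - c_i| at the test point
    z = W @ z + g                                    # average with neighbors, then accumulate
    step = 1.0 / np.sqrt(t)
    # test point = minimizer of the local dual model  z_i*x + x^2/(2*step)  over [-1, 1]
    x = np.clip(-step * z, -1.0, 1.0)

print("local test points (minimizers of the global objective lie in [0.2, 0.4]):",
      np.round(x, 3))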
One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent on distributed computing architectures, possibly asynchronously. However, a key obstacle in the efficient implementation of DASGD is the issue of delays: when a computing node contributes a gradient update, the global model parameter may have already been updated by other nodes several times over, thereby rendering this gradient information stale. These delays can quickly add up if the computational throughput of a node is saturated, so the convergence of DASGD may be compromised in the presence of large delays. Our first contribution is that, by carefully tuning the algorithm's step-size, convergence to the critical set is still achieved in mean square, even if the delays grow unbounded at a polynomial rate. We also establish finer results in a broad class of structured optimization problems (called variationally coherent), where we show that DASGD converges to a global optimum with probability $1$ under the same delay assumptions. Together, these results contribute to the broad landscape of large-scale non-convex stochastic optimization by offering state-of-the-art theoretical guarantees and providing insights for algorithm design.
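The toy single-process simulation below mimics the delay issue: each "worker" gradient is computed at the parameter value current when the worker starts, but is only applied after a random number of later server updates, with a decaying step size keeping the iteration stable despite the staleness. The quadratic objective, the bounded random delays, and the $1/\sqrt{t}$ schedule are illustrative assumptions, not the polynomially growing delays or the precise tuning covered by the result above.

import numpy as np

rng = np.random.default_rng(4)
d = 3
x_star = np.array([1.0, -2.0, 0.5])

def noisy_grad(x):                                   # noisy gradient of 0.5*||x - x_star||^2
    return x - x_star + 0.1 * rng.normal(size=d)

x = np.zeros(d)
pending = []                                         # (stale gradient, arrival time) pairs

for t in range(1, 5001):
    # a worker reads the current parameter now; its gradient arrives 1-19 updates later
    pending.append((noisy_grad(x), t + int(rng.integers(1, 20))))
    step = 0.5 / np.sqrt(t)                          # decaying step size to absorb the staleness
    arrived = [g for g, arrive in pending if arrive <= t]
    pending = [(g, a) for g, a in pending if a > t]
    for g in arrived:                                # server applies stale gradients as they arrive
        x = x - step * g

print("distance to the optimum despite stale gradients:",
      round(float(np.linalg.norm(x - x_star)), 3))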
We study a distributed consensus-based stochastic gradient descent (SGD) algorithm and show that the rate of convergence involves the spectral properties of two matrices: the standard spectral gap of a weight matrix from the network topology and a new term depending on the spectral norm of the sample covariance matrix of the data. This data-dependent convergence rate shows that distributed SGD algorithms perform better on datasets with small spectral norm. Our analysis method also allows us to find data-dependent convergence rates as we limit the amount of communication. Spreading a fixed amount of data across more nodes slows convergence; for asymptotically growing data sets we show that adding more machines can help when minimizing twice-differentiable losses.
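A minimal consensus-SGD sketch: each node mixes its iterate with its neighbors' through a doubly stochastic weight matrix W and then takes a stochastic gradient step on its local data. The path topology with Metropolis weights, the least-squares losses, and the constant step size are assumptions for illustration; the data-dependent rates above tie performance to the spectral gap of W and the spectral norm of the sample covariance.

import numpy as np

rng = np.random.default_rng(5)
n, d, m = 4, 5, 50
w_true = rng.normal(size=d)
X = [rng.normal(size=(m, d)) for _ in range(n)]             # local datasets
y = [Xi @ w_true + 0.05 * rng.normal(size=m) for Xi in X]   # noisy local labels

# Metropolis weights on a 4-node path graph (symmetric, doubly stochastic)
W = np.array([[2, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 2]]) / 3.0

eta = 0.05                                                  # constant step size (assumed)
w = np.zeros((n, d))
for _ in range(3000):
    mixed = W @ w                                           # consensus step over the network
    idx = rng.integers(0, m, size=n)                        # one random sample per node
    grads = np.stack([(X[i][idx[i]] @ mixed[i] - y[i][idx[i]]) * X[i][idx[i]]
                      for i in range(n)])
    w = mixed - eta * grads                                 # local stochastic gradient step

print("error of the network-averaged model:",
      round(float(np.linalg.norm(w.mean(axis=0) - w_true)), 3))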
