This paper considers the problem of asynchronous distributed multi-agent optimization on a server-based system architecture. In this problem, each agent has a local cost, and the goal for the agents is to collectively find a minimum of their aggregate cost. A standard algorithm to solve this problem is the iterative distributed gradient-descent (DGD) method, implemented collaboratively by the server and the agents. In the synchronous setting, the algorithm proceeds from one iteration to the next only after all the agents complete their expected communication with the server. However, such synchrony can be expensive and even infeasible in real-world applications. We show that waiting for all the agents is unnecessary in many applications of distributed optimization, including distributed machine learning, due to redundancy in the cost functions (or \emph{data}). Specifically, we consider a generic notion of redundancy named $(r,\epsilon)$-redundancy, implying solvability of the original multi-agent optimization problem with $\epsilon$ accuracy despite the removal of up to $r$ (out of $n$ total) agents from the system. We present an asynchronous DGD algorithm where in each iteration the server waits for only (any) $n-r$ agents, instead of all $n$ agents. Assuming $(r,\epsilon)$-redundancy, we show that our asynchronous algorithm converges to an approximate solution with error that is linear in $\epsilon$ and $r$. Moreover, we present a generalization of our algorithm that tolerates some Byzantine faulty agents in the system. Finally, we demonstrate the improved communication efficiency of our algorithm through experiments on MNIST and Fashion-MNIST using the benchmark LeNet neural network.
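To illustrate the server-side update described above, the following is a minimal simulation sketch (not the authors' implementation): in each iteration the server aggregates gradients from only the first $n-r$ responding agents and performs a gradient-descent step. The quadratic local costs, `local_gradient` helper, step size, and random response order are illustrative assumptions.

```python
# Sketch of asynchronous DGD where the server waits for only n - r agents per iteration.
# Local costs, latencies, and hyperparameters below are hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(0)

n, r = 10, 3          # total agents; up to r stragglers ignored each iteration
dim = 5               # dimension of the optimization variable
step_size = 0.1
x = np.zeros(dim)     # current estimate held by the server

# Illustrative quadratic local costs: f_i(x) = 0.5 * ||x - a_i||^2
targets = rng.normal(size=(n, dim))

def local_gradient(i, point):
    """Gradient of agent i's (assumed quadratic) local cost at `point`."""
    return point - targets[i]

for t in range(200):
    # Simulate asynchrony: agents respond in a random order due to heterogeneous
    # compute/network delays; the server uses only the first n - r responses.
    order = rng.permutation(n)
    fast_agents = order[: n - r]

    # Aggregate the gradients that arrived; the remaining r agents are skipped this round.
    g = sum(local_gradient(i, x) for i in fast_agents) / (n - r)
    x = x - step_size * g

print("final estimate:", x)
```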