Gradient coding allows a master node to recover the aggregate of the partial gradients, computed by a set of worker nodes over their local data sets, with minimum communication cost and in the presence of stragglers. In this paper, for gradient coding with linear encoding, we characterize the optimum communication cost for heterogeneous distributed systems with \emph{arbitrary} data placement, with $s \in \mathbb{N}$ stragglers and $a \in \mathbb{N}$ adversarial nodes. In particular, we show that the optimum communication cost, normalized by the size of the gradient vectors, is equal to $(r-s-2a)^{-1}$, where $r \in \mathbb{N}$ is the minimum number of times a data partition is replicated. In other words, the communication cost is determined by the data partition with the minimum replication, irrespective of the structure of the placement. The proposed achievable scheme also allows us to target the computation of a polynomial function of the aggregated gradient matrix. In addition, it allows us to borrow ideas from approximate computing and propose an approximate gradient coding scheme for cases in which the replication in the data placement is smaller than what is needed to meet the restriction imposed on the communication cost, or when the number of stragglers turns out to be larger than the value presumed in the system design.
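To make the normalized cost $(r-s-2a)^{-1}$ concrete, the following is a minimal numerical sketch for a single data partition replicated at $r$ workers, restricted to the case $a = 0$ (no adversaries). It uses a real Vandermonde generator as a stand-in for an MDS-style code so that any $r-s$ survivors suffice, and each worker sends only a $1/(r-s)$ fraction of its partial gradient. The function names and the specific construction are illustrative assumptions, not the paper's scheme.

```python
# A minimal sketch (not the paper's exact construction), assuming a = 0
# adversarial nodes: one partial gradient g is held by r replica workers,
# split into m = r - s chunks, and MDS-encoded so that any m survivors
# let the master recover g from responses of size |g|/(r - s) each.
import numpy as np

def encode_chunks(g, r, s):
    """Split g into m = r - s chunks and produce r coded chunks (one per
    replica worker) via a real Vandermonde generator."""
    m = r - s
    d = len(g)
    assert d % m == 0, "for simplicity, assume m divides the gradient length"
    chunks = g.reshape(m, d // m)                # m message chunks of length d/m
    V = np.vander(np.arange(1, r + 1), m, increasing=True).astype(float)  # r x m
    return V @ chunks, V                          # r coded chunks, each of length d/m

def decode_from_survivors(coded, V, survivors):
    """Recover g from any m = r - s surviving workers by solving the
    corresponding Vandermonde system."""
    m = V.shape[1]
    idx = survivors[:m]                           # any m responders suffice
    chunks = np.linalg.solve(V[idx, :], coded[idx, :])
    return chunks.reshape(-1)

# Example: r = 4 replicas of the partition, s = 2 stragglers tolerated.
rng = np.random.default_rng(0)
g = rng.standard_normal(8)                        # the partial gradient (d = 8)
coded, V = encode_chunks(g, r=4, s=2)             # each worker sends d/(r-s) = 4 entries
recovered = decode_from_survivors(coded, V, survivors=np.array([1, 3]))  # workers 0 and 2 straggle
assert np.allclose(recovered, g)
print("recovered g; per-worker communication = 1/(r-s) of the gradient size")
```

In this sketch the per-worker message length is $d/(r-s)$, matching the claimed normalized cost $(r-s)^{-1}$ when $a = 0$; handling $a > 0$ adversarial workers would additionally require error correction (e.g., Reed-Solomon decoding with errors), which is why the redundancy budget grows to $s + 2a$.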