This paper focuses on mitigating the impact of stragglers in distributed learning systems. Unlike existing schemes designed for a fixed number of stragglers, we develop a new scheme called Adaptive Gradient Coding (AGC) that flexibly tolerates a varying number of stragglers. Our scheme achieves an optimal tradeoff between computation load, straggler tolerance, and communication cost. In particular, it minimizes the communication cost according to the real-time number of stragglers in practical environments. Implementations on Amazon EC2 clusters using Python with the mpi4py package verify this flexibility in several situations.
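To make the coding idea concrete, here is a minimal sketch of straggler-tolerant gradient coding using the classic fractional-repetition construction. This is a generic illustration, not the AGC scheme itself; the group layout, toy least-squares loss, and all parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_workers, s = 4, 1              # tolerate any s = 1 straggler
k = n_workers                    # one data partition per worker slot
X = rng.normal(size=(40, 3))     # toy regression data: 40 samples, 3 features
w = rng.normal(size=3)           # current model iterate
y = X @ rng.normal(size=3)       # synthetic targets

parts = np.array_split(np.arange(40), k)

def partial_grad(idx):
    """Least-squares gradient restricted to one data partition."""
    Xi, yi = X[idx], y[idx]
    return 2 * Xi.T @ (Xi @ w - yi)

# Fractional repetition: workers form groups of s+1; every worker in a
# group holds the same s+1 partitions and sends the sum of their gradients.
held = {wid: [(wid // (s + 1)) * (s + 1) + j for j in range(s + 1)]
        for wid in range(n_workers)}

stragglers = {2}                 # suppose worker 2 never responds
alive = [wid for wid in range(n_workers) if wid not in stragglers]
messages = {wid: sum(partial_grad(parts[p]) for p in held[wid])
            for wid in alive}

# Master: take one surviving worker per group; any s stragglers leave at
# least one member of each (s+1)-sized group alive.
recovered = np.zeros(3)
for g in range(n_workers // (s + 1)):
    rep = next(wid for wid in alive if wid // (s + 1) == g)
    recovered += messages[rep]

exact = sum(partial_grad(idx) for idx in parts)
assert np.allclose(recovered, exact)   # full gradient despite the straggler
```

This sketch only shows a fixed-s baseline; the adaptive element described in the abstract would additionally shrink each worker's message when fewer stragglers actually occur.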
In distributed machine learning (DML), the training data is distributed across multiple worker nodes to perform the underlying training in parallel. One major problem affecting the performance of DML algorithms is the presence of stragglers. These are nodes that run unexpectedly slower than their peers, delaying the entire computation.
Batched network coding is a low-complexity network coding solution for feedbackless transmission over multi-hop wireless packet networks with packet loss. The data to be transmitted is encoded into batches, each of which consists of a few coded packets.
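As a rough illustration of the batch structure (a generic GF(2) sketch under assumed parameters, not the specific code construction from this line of work): each batch is generated from a handful of source packets, coded packets are random XOR combinations of them, and a receiver can recover the batch once it has collected enough linearly independent coded packets.

```python
import numpy as np

rng = np.random.default_rng(1)

M, PKT = 4, 8                    # assumed batch size and payload bytes
src = rng.integers(0, 256, size=(M, PKT), dtype=np.uint8)  # one batch

def encode():
    """One coded packet: XOR of a random nonzero subset of the batch."""
    c = np.zeros(M, dtype=np.uint8)
    while not c.any():
        c = rng.integers(0, 2, size=M, dtype=np.uint8)
    payload = np.zeros(PKT, dtype=np.uint8)
    for i in np.flatnonzero(c):
        payload ^= src[i]
    return c, payload

def gf2_eliminate(C, P):
    """Gaussian elimination over GF(2) on coefficients C and payloads P."""
    C, P, row = C.copy(), P.copy(), 0
    for col in range(C.shape[1]):
        piv = next((r for r in range(row, len(C)) if C[r, col]), None)
        if piv is None:
            continue
        C[[row, piv]], P[[row, piv]] = C[[piv, row]], P[[piv, row]]
        for r in range(len(C)):
            if r != row and C[r, col]:
                C[r] ^= C[row]
                P[r] ^= P[row]
        row += 1
    return C, P, row             # row = rank over GF(2)

# Receiver: keep collecting coded packets over a lossy link until the
# batch's coefficient matrix reaches full rank, then read off the sources.
coeffs, pays = [], []
while True:
    c, p = encode()
    if rng.random() < 0.3:       # assumed 30% packet loss in the network
        continue
    coeffs.append(c)
    pays.append(p)
    C, P, rank = gf2_eliminate(np.array(coeffs), np.array(pays))
    if rank == M:
        break

assert np.array_equal(P[:M], src)  # batch recovered in order
```

Because decoding is confined to one small batch at a time, the elimination stays cheap, which is the low-complexity appeal the abstract refers to.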
We consider a wireless communication network with an adaptive scheme to select the number of packets to be admitted and encoded for each transmission, and characterize the information timeliness for a network of erasure channels in discrete time.
A major hurdle in machine learning is scalability to massive datasets. One approach to overcoming this is to distribute the computational tasks among several workers. Gradient coding has recently been proposed in distributed optimization to mitigate the effect of stragglers.
We propose a novel adaptive and causal random linear network coding (AC-RLNC) algorithm with forward error correction (FEC) for a point-to-point communication channel with delayed feedback. AC-RLNC adapts to the channel condition.
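A toy sketch of the flavor of such a scheme (not the actual AC-RLNC algorithm; the erasure probability, feedback delay, and real-valued coefficients are all illustrative assumptions): the sender keeps emitting random linear combinations of the message packets, and because the ACK arrives late, the extra combinations sent in the meantime act as forward error correction against further erasures.

```python
import numpy as np

rng = np.random.default_rng(2)

k, PKT = 4, 6                    # assumed message: 4 packets, 6 symbols each
msg = rng.normal(size=(k, PKT))
eps, delay = 0.3, 2              # assumed erasure rate and feedback delay (slots)

coeffs, payloads = [], []
ack_seen_at, t = None, 0
while ack_seen_at is None or t < ack_seen_at:
    c = rng.normal(size=k)                   # random coding coefficients
    if rng.random() > eps:                   # packet survives the erasure channel
        coeffs.append(c)
        payloads.append(c @ msg)
        if ack_seen_at is None and len(coeffs) == k:
            ack_seen_at = t + delay          # ACK reaches the sender later;
                                             # until then it keeps sending FEC
    t += 1

# With continuous random coefficients, any k received packets are linearly
# independent with probability 1, so the receiver can decode directly.
decoded = np.linalg.solve(np.array(coeffs[:k]), np.array(payloads[:k]))
assert np.allclose(decoded, msg)
print(f"decoded after {t} slots ({len(coeffs)} packets received)")
```

The redundant packets sent while waiting out the feedback delay are what a causal scheme can tune adaptively: a higher estimated erasure rate or a longer delay calls for more of them.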