ترغب بنشر مسار تعليمي؟ اضغط هنا

Adaptive Private Distributed Matrix Multiplication

306   0   0.0 ( 0 )
 نشر من قبل Marvin Xhemrishi
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We consider the problem of designing codes with flexible rate (referred to as rateless codes), for private distributed matrix-matrix multiplication. A master server owns two private matrices $mathbf{A}$ and $mathbf{B}$ and hires worker nodes to help computing their multiplication. The matrices should remain information-theoretically private from the workers. Codes with fixed rate require the master to assign tasks to the workers and then wait for a predetermined number of workers to finish their assigned tasks. The size of the tasks, hence the rate of the scheme, depends on the number of workers that the master waits for. We design a rateless private matrix-matrix multiplication scheme, called RPM3. In contrast to fixed-rate schemes, our scheme fixes the size of the tasks and allows the master to send multiple tasks to the workers. The master keeps sending tasks and receiving results until it can decode the multiplication; rendering the scheme flexible and adaptive to heterogeneous environments. Despite resulting in a smaller rate than known straggler-tolerant schemes, RPM3 provides a smaller mean waiting time of the master by leveraging the heterogeneity of the workers. The waiting time is studied under two different models for the workers service time. We provide upper bounds for the mean waiting time under both models. In addition, we provide lower bounds on the mean waiting time under the worker-dependent fixed service time model.



قيم البحث

اقرأ أيضاً

Consider a multi-cell mobile edge computing network, in which each user wishes to compute the product of a user-generated data matrix with a network-stored matrix. This is done through task offloading by means of input uploading, distributed computin g at edge nodes (ENs), and output downloading. Task offloading may suffer long delay since servers at some ENs may be straggling due to random computation time, and wireless channels may experience severe fading and interference. This paper aims to investigate the interplay among upload, computation, and download latencies during the offloading process in the high signal-to-noise ratio regime from an information-theoretic perspective. A policy based on cascaded coded computing and on coordinated and cooperative interference management in uplink and downlink is proposed and proved to be approximately optimal for a sufficiently large upload time. By investing more time in uplink transmission, the policy creates data redundancy at the ENs, which can reduce the computation time, by enabling the use of coded computing, as well as the download time via transmitter cooperation. Moreover, the policy allows computation time to be traded for download time. Numerical examples demonstrate that the proposed policy can improve over existing schemes by significantly reducing the end-to-end execution time.
107 - Lev Tauz , Lara Dolecek 2021
In this paper, we introduce the Variable Coded Distributed Batch Matrix Multiplication (VCDBMM) problem which tasks a distributed system to perform batch matrix multiplication where matrices are not necessarily distinct among batch jobs. Most coded m atrix-matrix computation work has broadly focused in two directions: matrix partitioning for computing a single computation task and batch processing of multiple distinct computation tasks. While these works provide codes with good straggler resilience and fast decoding for their problem spaces, these codes would not be able to take advantage of the natural redundancy of re-using matrices across batch jobs. Inspired by Cross-Subspace Alignment codes, we develop Flexible Cross-Subspace Alignments (FCSA) codes that are flexible enough to utilize this redundancy. We provide a full characterization of FCSA codes which allow for a wide variety of system complexities including good straggler resilience and fast decoding. We theoretically demonstrate that, under certain practical conditions, FCSA codes are within a factor of two of the optimal solution when it comes to straggler resilience; our simulations demonstrate that our codes achieve even better optimality gaps in practice.
We consider the problem of secure distributed matrix multiplication. Coded computation has been shown to be an effective solution in distributed matrix multiplication, both providing privacy against workers and boosting the computation speed by effic iently mitigating stragglers. In this work, we present a non-direct secure extension of the recently introduced bivariate polynomial codes. Bivariate polynomial codes have been shown to be able to further speed up distributed matrix multiplication by exploiting the partial work done by the stragglers rather than completely ignoring them while reducing the upload communication cost and/or the workers storages capacity needs. We show that, especially for upload communication or storage constrained settings, the proposed approach reduces the average computation time of secure distributed matrix multiplication compared to its competitors in the literature.
We study coded distributed matrix multiplication from an approximate recovery viewpoint. We consider a system of $P$ computation nodes where each node stores $1/m$ of each multiplicand via linear encoding. Our main result shows that the matrix produc t can be recovered with $epsilon$ relative error from any $m$ of the $P$ nodes for any $epsilon > 0$. We obtain this result through a careful specialization of MatDot codes -- a class of matrix multiplication codes previously developed in the context of exact recovery ($epsilon=0$). Since prior results showed that MatDot codes achieve the best exact recovery threshold for a class of linear coding schemes, our result shows that allowing for mild approximations leads to a system that is nearly twice as efficient as exact reconstruction. As an additional contribution, we develop an optimization framework based on alternating minimization that enables the discovery of new codes for approximate matrix multiplication.
Large-scale machine learning and data mining methods routinely distribute computations across multiple agents to parallelize processing. The time required for computation at the agents is affected by the availability of local resources giving rise to the straggler problem in which the computation results are held back by unresponsive agents. For this problem, linear coding of the matrix sub-blocks can be used to introduce resilience toward straggling. The Parameter Server (PS) utilizes a channel code and distributes the matrices to the workers for multiplication. It then produces an approximation to the desired matrix multiplication using the results of the computations received at a given deadline. In this paper, we propose to employ Unequal Error Protection (UEP) codes to alleviate the straggler problem. The resiliency level of each sub-block is chosen according to its norm as blocks with larger norms have higher effects on the result of the matrix multiplication. We validate the effectiveness of our scheme both theoretically and through numerical evaluations. We derive a theoretical characterization of the performance of UEP using random linear codes, and compare it the case of equal error protection. We also apply the proposed coding strategy to the computation of the back-propagation step in the training of a Deep Neural Network (DNN), for which we investigate the fundamental trade-off between precision and the time required for the computations.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا