ترغب بنشر مسار تعليمي؟ اضغط هنا

Variable Coded Batch Matrix Multiplication

108   0   0.0 ( 0 )
 نشر من قبل Lev Tauz
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

In this paper, we introduce the Variable Coded Distributed Batch Matrix Multiplication (VCDBMM) problem which tasks a distributed system to perform batch matrix multiplication where matrices are not necessarily distinct among batch jobs. Most coded matrix-matrix computation work has broadly focused in two directions: matrix partitioning for computing a single computation task and batch processing of multiple distinct computation tasks. While these works provide codes with good straggler resilience and fast decoding for their problem spaces, these codes would not be able to take advantage of the natural redundancy of re-using matrices across batch jobs. Inspired by Cross-Subspace Alignment codes, we develop Flexible Cross-Subspace Alignments (FCSA) codes that are flexible enough to utilize this redundancy. We provide a full characterization of FCSA codes which allow for a wide variety of system complexities including good straggler resilience and fast decoding. We theoretically demonstrate that, under certain practical conditions, FCSA codes are within a factor of two of the optimal solution when it comes to straggler resilience; our simulations demonstrate that our codes achieve even better optimality gaps in practice.



قيم البحث

اقرأ أيضاً

148 - Jinbao Zhu , Qifa Yan , 2021
This paper investigates the problem of Secure Multi-party Batch Matrix Multiplication (SMBMM), where a user aims to compute the pairwise products $mathbf{A}divideontimesmathbf{B}triangleq(mathbf{A}^{(1)}mathbf{B}^{(1)},ldots,mathbf{A}^{(M)}mathbf{B}^ {(M)})$ of two batch of massive matrices $mathbf{A}$ and $mathbf{B}$ that are generated from two sources, through $N$ honest but curious servers which share some common randomness. The matrices $mathbf{A}$ (resp. $mathbf{B}$) must be kept secure from any subset of up to $X_{mathbf{A}}$ (resp. $X_mathbf{B}$) servers even if they collude, and the user must not obtain any information about $(mathbf{A},mathbf{B})$ beyond the products $mathbf{A}divideontimesmathbf{B}$. A novel computation strategy for single secure matrix multiplication problem (i.e., the case $M=1$) is first proposed, and then is generalized to the strategy for SMBMM by means of cross subspace alignment. The SMBMM strategy focuses on the tradeoff between recovery threshold (the number of successful computing servers that the user needs to wait for), system cost (upload cost, the amount of common randomness, and download cost) and system complexity (encoding, computing, and decoding complexities). Notably, compared with the known result by Chen et al., the strategy for the degraded case $X= X_{mathbf{A}}=X_{mathbf{B}}$ achieves better recovery threshold, amount of common randomness, download cost and decoding complexity when $X$ is less than some parameter threshold, while the performance with respect to other measures remain identical.
We study coded distributed matrix multiplication from an approximate recovery viewpoint. We consider a system of $P$ computation nodes where each node stores $1/m$ of each multiplicand via linear encoding. Our main result shows that the matrix produc t can be recovered with $epsilon$ relative error from any $m$ of the $P$ nodes for any $epsilon > 0$. We obtain this result through a careful specialization of MatDot codes -- a class of matrix multiplication codes previously developed in the context of exact recovery ($epsilon=0$). Since prior results showed that MatDot codes achieve the best exact recovery threshold for a class of linear coding schemes, our result shows that allowing for mild approximations leads to a system that is nearly twice as efficient as exact reconstruction. As an additional contribution, we develop an optimization framework based on alternating minimization that enables the discovery of new codes for approximate matrix multiplication.
Consider a multi-cell mobile edge computing network, in which each user wishes to compute the product of a user-generated data matrix with a network-stored matrix. This is done through task offloading by means of input uploading, distributed computin g at edge nodes (ENs), and output downloading. Task offloading may suffer long delay since servers at some ENs may be straggling due to random computation time, and wireless channels may experience severe fading and interference. This paper aims to investigate the interplay among upload, computation, and download latencies during the offloading process in the high signal-to-noise ratio regime from an information-theoretic perspective. A policy based on cascaded coded computing and on coordinated and cooperative interference management in uplink and downlink is proposed and proved to be approximately optimal for a sufficiently large upload time. By investing more time in uplink transmission, the policy creates data redundancy at the ENs, which can reduce the computation time, by enabling the use of coded computing, as well as the download time via transmitter cooperation. Moreover, the policy allows computation time to be traded for download time. Numerical examples demonstrate that the proposed policy can improve over existing schemes by significantly reducing the end-to-end execution time.
We consider the problem of designing codes with flexible rate (referred to as rateless codes), for private distributed matrix-matrix multiplication. A master server owns two private matrices $mathbf{A}$ and $mathbf{B}$ and hires worker nodes to help computing their multiplication. The matrices should remain information-theoretically private from the workers. Codes with fixed rate require the master to assign tasks to the workers and then wait for a predetermined number of workers to finish their assigned tasks. The size of the tasks, hence the rate of the scheme, depends on the number of workers that the master waits for. We design a rateless private matrix-matrix multiplication scheme, called RPM3. In contrast to fixed-rate schemes, our scheme fixes the size of the tasks and allows the master to send multiple tasks to the workers. The master keeps sending tasks and receiving results until it can decode the multiplication; rendering the scheme flexible and adaptive to heterogeneous environments. Despite resulting in a smaller rate than known straggler-tolerant schemes, RPM3 provides a smaller mean waiting time of the master by leveraging the heterogeneity of the workers. The waiting time is studied under two different models for the workers service time. We provide upper bounds for the mean waiting time under both models. In addition, we provide lower bounds on the mean waiting time under the worker-dependent fixed service time model.
We propose a novel application of coded computing to the problem of the nearest neighbor estimation using MatDot Codes [Fahim. et.al. 2017], that are known to be optimal for matrix multiplication in terms of recovery threshold under storage constrain ts. In approximate nearest neighbor algorithms, it is common to construct efficient in-memory indexes to improve query response time. One such strategy is Multiple Random Projection Trees (MRPT), which reduces the set of candidate points over which Euclidean distance calculations are performed. However, this may result in a high memory footprint and possibly paging penalties for large or high-dimensional data. Here we propose two techniques to parallelize MRPT, that exploit data and model parallelism respectively, by dividing both the data storage and the computation efforts among different nodes in a distributed computing cluster. This is especially critical when a single compute node cannot hold the complete dataset in memory. We also propose a novel coded computation strategy based on MatDot codes for the model-parallel architecture that, in a straggler-prone environment, achieves the storage-optimal recovery threshold, i.e., the number of nodes that are required to serve a query. We experimentally demonstrate that, in the absence of straggling, our distributed approaches require less query time than execution on a single processing node, providing near-linear speedups with respect to the number of worker nodes. Through our experiments on real systems with simulated straggling, we also show that our strategy achieves a faster query execution than the uncoded strategy in a straggler-prone environment.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا