$\epsilon$-Approximate Coded Matrix Multiplication is Nearly Twice as Efficient as Exact Multiplication

Published by Viveck Cadambe

Publication date: 2021

Field: Informatics Engineering

Language: English

Authors: Haewon Jeong, Ateet Devulapalli, Viveck R. Cadambe

Information Theory

We study coded distributed matrix multiplication from an approximate recovery viewpoint. We consider a system of $P$ computation nodes where each node stores $1/m$ of each multiplicand via linear encoding. Our main result shows that the matrix product can be recovered with $\epsilon$ relative error from any $m$ of the $P$ nodes for any $\epsilon > 0$. We obtain this result through a careful specialization of MatDot codes, a class of matrix multiplication codes previously developed in the context of exact recovery ($\epsilon=0$). Since prior results showed that MatDot codes achieve the best exact recovery threshold for a class of linear coding schemes, our result shows that allowing for mild approximations leads to a system that is nearly twice as efficient as exact reconstruction. As an additional contribution, we develop an optimization framework based on alternating minimization that enables the discovery of new codes for approximate matrix multiplication.
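The abstract does not spell out the construction, so the following is only a minimal sketch. It assumes the standard MatDot encoding from prior work: $\mathbf{A}$ is split into $m$ column blocks and $\mathbf{B}$ into $m$ row blocks, each node multiplies one evaluation of two matrix polynomials, and the product is the coefficient of $x^{m-1}$, which is recovered exactly from $2m-1$ nodes. The approximate step (small evaluation points, interpolating from only $m$ nodes and ignoring the higher-order terms) is an illustrative assumption about how such a specialization could work, not the authors' confirmed scheme. The function names, evaluation points, and matrix sizes are all hypothetical.

```python
import numpy as np

def encode_and_compute(A, B, m, points):
    """Worker k multiplies the encoded blocks p_A(x_k) and p_B(x_k)."""
    A_blocks = np.split(A, m, axis=1)  # m column blocks of A
    B_blocks = np.split(B, m, axis=0)  # m row blocks of B
    outputs = []
    for x in points:
        pA = sum(A_blocks[i] * x**i for i in range(m))
        pB = sum(B_blocks[j] * x**(m - 1 - j) for j in range(m))
        outputs.append(pA @ pB)  # each worker stores 1/m of A and of B
    return outputs

def decode(outputs, points, m):
    """Interpolate through the given outputs; return the x^(m-1) coefficient."""
    k = len(points)
    V = np.vander(np.asarray(points, dtype=float), k, increasing=True)
    Y = np.stack([o.ravel() for o in outputs])  # one row per worker
    coeffs = np.linalg.solve(V, Y)              # polynomial coefficients
    return coeffs[m - 1].reshape(outputs[0].shape)

rng = np.random.default_rng(0)
m, n, P = 3, 6, 7                       # hypothetical sizes
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
C = A @ B

# Exact recovery: any 2m - 1 = 5 of the P nodes suffice.
exact_pts = np.linspace(-1.0, 1.0, P)
outs = encode_and_compute(A, B, m, exact_pts)
chosen = [0, 2, 3, 5, 6]
C_exact = decode([outs[k] for k in chosen], exact_pts[chosen], m)

# Approximate recovery (assumed variant): small evaluation points make the
# terms of degree >= m negligible, so only m = 3 nodes are interpolated.
small_pts = 1e-3 * np.arange(1, P + 1)
outs = encode_and_compute(A, B, m, small_pts)
chosen = [1, 4, 6]
C_approx = decode([outs[k] for k in chosen], small_pts[chosen], m)

rel_exact = np.linalg.norm(C_exact - C) / np.linalg.norm(C)
rel_approx = np.linalg.norm(C_approx - C) / np.linalg.norm(C)
print(rel_exact, rel_approx)
```

In this sketch the approximation error shrinks as the evaluation points move toward zero, while the interpolation becomes more sensitive to floating-point rounding, so the scale `1e-3` is a compromise chosen for illustration only.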

Lev Tauz, Lara Dolecek (2021)

In this paper, we introduce the Variable Coded Distributed Batch Matrix Multiplication (VCDBMM) problem which tasks a distributed system to perform batch matrix multiplication where matrices are not necessarily distinct among batch jobs. Most coded matrix-matrix computation work has broadly focused on two directions: matrix partitioning for computing a single computation task and batch processing of multiple distinct computation tasks. While these works provide codes with good straggler resilience and fast decoding for their problem spaces, these codes would not be able to take advantage of the natural redundancy of re-using matrices across batch jobs. Inspired by Cross-Subspace Alignment codes, we develop Flexible Cross-Subspace Alignments (FCSA) codes that are flexible enough to utilize this redundancy. We provide a full characterization of FCSA codes which allow for a wide variety of system complexities including good straggler resilience and fast decoding. We theoretically demonstrate that, under certain practical conditions, FCSA codes are within a factor of two of the optimal solution when it comes to straggler resilience; our simulations demonstrate that our codes achieve even better optimality gaps in practice.

Information Theory

Coded Computing and Cooperative Transmission for Wireless Distributed Matrix Multiplication

Kuikui Li, Meixia Tao, Jingjing Zhang (2020)

Consider a multi-cell mobile edge computing network, in which each user wishes to compute the product of a user-generated data matrix with a network-stored matrix. This is done through task offloading by means of input uploading, distributed computing at edge nodes (ENs), and output downloading. Task offloading may suffer long delay since servers at some ENs may be straggling due to random computation time, and wireless channels may experience severe fading and interference. This paper aims to investigate the interplay among upload, computation, and download latencies during the offloading process in the high signal-to-noise ratio regime from an information-theoretic perspective. A policy based on cascaded coded computing and on coordinated and cooperative interference management in uplink and downlink is proposed and proved to be approximately optimal for a sufficiently large upload time. By investing more time in uplink transmission, the policy creates data redundancy at the ENs, which can reduce the computation time, by enabling the use of coded computing, as well as the download time via transmitter cooperation. Moreover, the policy allows computation time to be traded for download time. Numerical examples demonstrate that the proposed policy can improve over existing schemes by significantly reducing the end-to-end execution time.

Information Theory

Adaptive Private Distributed Matrix Multiplication

Rawad Bitar, Marvin Xhemrishi, Antonia Wachter-Zeh (2021)

We consider the problem of designing codes with flexible rate (referred to as rateless codes), for private distributed matrix-matrix multiplication. A master server owns two private matrices $\mathbf{A}$ and $\mathbf{B}$ and hires worker nodes to help compute their product. The matrices should remain information-theoretically private from the workers. Codes with fixed rate require the master to assign tasks to the workers and then wait for a predetermined number of workers to finish their assigned tasks. The size of the tasks, hence the rate of the scheme, depends on the number of workers that the master waits for. We design a rateless private matrix-matrix multiplication scheme, called RPM3. In contrast to fixed-rate schemes, our scheme fixes the size of the tasks and allows the master to send multiple tasks to the workers. The master keeps sending tasks and receiving results until it can decode the multiplication, rendering the scheme flexible and adaptive to heterogeneous environments. Despite resulting in a smaller rate than known straggler-tolerant schemes, RPM3 provides a smaller mean waiting time of the master by leveraging the heterogeneity of the workers. The waiting time is studied under two different models for the workers' service time. We provide upper bounds for the mean waiting time under both models. In addition, we provide lower bounds on the mean waiting time under the worker-dependent fixed service time model.

Information Theory

Improved Constructions for Secure Multi-Party Batch Matrix Multiplication

Jinbao Zhu, Qifa Yan (2021)

This paper investigates the problem of Secure Multi-party Batch Matrix Multiplication (SMBMM), where a user aims to compute the pairwise products $\mathbf{A}\divideontimes\mathbf{B}\triangleq(\mathbf{A}^{(1)}\mathbf{B}^{(1)},\ldots,\mathbf{A}^{(M)}\mathbf{B}^{(M)})$ of two batches of massive matrices $\mathbf{A}$ and $\mathbf{B}$ that are generated from two sources, through $N$ honest but curious servers which share some common randomness. The matrices $\mathbf{A}$ (resp. $\mathbf{B}$) must be kept secure from any subset of up to $X_{\mathbf{A}}$ (resp. $X_{\mathbf{B}}$) servers even if they collude, and the user must not obtain any information about $(\mathbf{A},\mathbf{B})$ beyond the products $\mathbf{A}\divideontimes\mathbf{B}$. A novel computation strategy for the single secure matrix multiplication problem (i.e., the case $M=1$) is first proposed, and then is generalized to the strategy for SMBMM by means of cross subspace alignment. The SMBMM strategy focuses on the tradeoff between recovery threshold (the number of successful computing servers that the user needs to wait for), system cost (upload cost, the amount of common randomness, and download cost) and system complexity (encoding, computing, and decoding complexities). Notably, compared with the known result by Chen et al., the strategy for the degraded case $X=X_{\mathbf{A}}=X_{\mathbf{B}}$ achieves better recovery threshold, amount of common randomness, download cost and decoding complexity when $X$ is less than some parameter threshold, while the performance with respect to other measures remains identical.

Information Theory

Accelerating Sparse Approximate Matrix Multiplication on GPUs

Xiaoyan Liu, Yi Liu, Ming Dun (2021)

Although matrix multiplication plays a vital role in computational linear algebra, there are few efficient solutions for matrix multiplication of near-sparse matrices. The Sparse Approximate Matrix Multiply (SpAMM) is one of the algorithms to fill the performance gap neglected by traditional optimizations for dense/sparse matrix multiplication. However, existing SpAMM algorithms fail to exploit the performance potential of GPUs for acceleration. In this paper, we present cuSpAMM, the first parallel SpAMM algorithm optimized for multiple GPUs. Several performance optimizations have been proposed, including algorithm re-design to adapt to the thread parallelism, blocking strategies for memory access optimization, and the acceleration with the tensor core. In addition, we scale cuSpAMM to run on multiple GPUs with an effective load balance scheme. We evaluate cuSpAMM on both synthesized and real-world datasets on multiple GPUs. The experimental results show that cuSpAMM achieves significant performance speedup compared to the vendor-optimized cuBLAS and cuSPARSE libraries.
