Network-accelerated Distributed Machine Learning Using MLFabric



Published by: Raajay Viswanathan

Publication date: 2019

Research field: Information Engineering

Paper language: English

Authors: Raajay Viswanathan, Aditya Akella

Distributed, Parallel, and Cluster Computing


Abstract

Existing distributed machine learning (DML) systems focus on improving the computational efficiency of distributed learning, whereas communication aspects have received less attention. Many DML systems treat the network as a black box. As a result, the performance of DML algorithms is impeded by network bottlenecks, and DML systems end up sacrificing important algorithmic and system-level benefits. We present MLfabric, a communication library that manages all network transfers in a DML system and holistically determines the communication pattern of a DML algorithm at any point in time. This allows MLfabric to carefully order transfers (i.e., gradient updates) to improve convergence, opportunistically aggregate updates in-network to improve efficiency, and proactively replicate some of them to support new notions of fault tolerance. We empirically find that MLfabric achieves up to a 3X speed-up in training large deep learning models in realistic dynamic cluster settings.
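To illustrate why in-network aggregation of gradient updates improves efficiency, here is a minimal sketch assuming a hypothetical two-rack topology in which a rack-level aggregator sums its workers' updates before forwarding them to a parameter server. The server then receives one transfer per rack instead of one per worker, and because summation is associative the applied update is unchanged. The names `Update`, `aggregate_by_rack`, and `apply_update` are illustrative only and are not MLfabric's API; the sketch does not model transfer ordering or replication.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Update:
    worker: str        # sending worker (hypothetical id)
    rack: str          # hypothetical rack the worker sits in
    grad: List[float]  # gradient vector computed by the worker


def aggregate_by_rack(updates: List[Update]) -> Dict[str, List[float]]:
    """Sum all gradients that share a rack, as an in-network aggregator would.

    Each rack then sends one vector upstream instead of one per worker."""
    sums: Dict[str, List[float]] = {}
    for u in updates:
        acc = sums.setdefault(u.rack, [0.0] * len(u.grad))
        for i, g in enumerate(u.grad):
            acc[i] += g
    return sums


def apply_update(params: List[float], rack_sums: Dict[str, List[float]],
                 lr: float) -> List[float]:
    """Parameter-server step: add up the per-rack sums and take one SGD step."""
    total = [0.0] * len(params)
    for vec in rack_sums.values():
        for i, g in enumerate(vec):
            total[i] += g
    return [p - lr * t for p, t in zip(params, total)]


updates = [
    Update("w1", "rackA", [1.0, 2.0]),
    Update("w2", "rackA", [3.0, 4.0]),
    Update("w3", "rackB", [0.5, 0.5]),
    Update("w4", "rackB", [1.5, -0.5]),
]
rack_sums = aggregate_by_rack(updates)
print(len(updates), "worker updates ->", len(rack_sums), "upstream transfers")
print(apply_update([1.0, 1.0], rack_sums, lr=0.5))  # [-2.0, -2.0]
```

The saving grows with the number of workers per aggregation point; a real system also has to decide where aggregation can happen and in which order transfers should complete, which this toy does not attempt.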


Kangwook Lee, Maximilian Lam, Ramtin Pedarsani (2015)

Codes are widely used in many engineering applications to offer robustness against noise. In large-scale systems, several types of noise can affect the performance of distributed machine learning algorithms (straggler nodes, system failures, or communication bottlenecks), but there has been little interaction cutting across codes, machine learning, and distributed systems. In this work, we provide theoretical insights on how coded solutions can achieve significant gains compared to uncoded ones. We focus on two of the most basic building blocks of distributed learning algorithms: matrix multiplication and data shuffling. For matrix multiplication, we use codes to alleviate the effect of stragglers, and show that if the number of homogeneous workers is $n$ and the runtime of each subtask has an exponential tail, coded computation can speed up distributed matrix multiplication by a factor of $\log n$. For data shuffling, we use codes to reduce communication bottlenecks by exploiting excess storage. We show that when a constant fraction $\alpha$ of the data matrix can be cached at each worker, and $n$ is the number of workers, coded shuffling reduces the communication cost by a factor of $(\alpha + \frac{1}{n})\gamma(n)$ compared to uncoded shuffling, where $\gamma(n)$ is the ratio of the cost of unicasting $n$ messages to $n$ users to the cost of multicasting a common message (of the same size) to $n$ users. For instance, $\gamma(n) \simeq n$ if multicasting a message to $n$ users is as cheap as unicasting a message to one user. We also provide experimental results corroborating the theoretical gains of the coded algorithms.

Distributed, Parallel, and Cluster Computing | Information Theory | Machine Learning
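The straggler-tolerance idea in the coded-computation abstract above can be seen in its smallest instance, a (3, 2) MDS code for a matrix-vector product: split $A$ into halves $A_1, A_2$, give a third worker the parity block $A_1 + A_2$, and recover $Ax$ from whichever two workers finish first. This is a sketch under that simplifying assumption, not the paper's general $(n, k)$ construction or its $\log n$ analysis. The helper `coded_shuffling_gain` simply evaluates the abstract's factor $(\alpha + \frac{1}{n})\gamma(n)$, with $\gamma(n)$ supplied by the caller; all function names are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product: the subtask each worker runs."""
    return [sum(r * v for r, v in zip(row, x)) for row in a]


def encode(a: Matrix) -> List[Matrix]:
    """(3, 2) MDS encoding: halves A1, A2 plus the parity block A1 + A2.

    Assumes A has an even number of rows."""
    half = len(a) // 2
    a1, a2 = a[:half], a[half:]
    parity = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    return [a1, a2, parity]


def decode(results: Dict[int, Vector]) -> Vector:
    """Recover A x from any two of the three worker results (keys 0, 1, 2)."""
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results and 2 in results:
        y1 = results[0]
        y2 = [s - p for s, p in zip(results[2], y1)]  # A2 x = (A1 + A2) x - A1 x
    elif 1 in results and 2 in results:
        y2 = results[1]
        y1 = [s - p for s, p in zip(results[2], y2)]  # A1 x = (A1 + A2) x - A2 x
    else:
        raise ValueError("need results from at least two of the three workers")
    return y1 + y2


def coded_shuffling_gain(alpha: float, n: int, gamma_n: float) -> float:
    """Communication-cost reduction factor (alpha + 1/n) * gamma(n)."""
    return (alpha + 1 / n) * gamma_n


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
blocks = encode(A)
# Worker 1 straggles: decode from workers 0 and 2 only.
partial = {0: matvec(blocks[0], x), 2: matvec(blocks[2], x)}
print(decode(partial))                    # [3, 7, 11, 15]
print(coded_shuffling_gain(0.5, 10, 10))  # 6.0
```

Passing `gamma_n = n` corresponds to the abstract's example where multicasting to $n$ users costs the same as unicasting to one; with $\alpha = 0.5$ and $n = 10$ the factor is $6$.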

Promoting Distributed Trust in Machine Learning and Computational Simulation via a Blockchain Network

Nelson Kibichii Bore, Ravi Kiran Raman, Isaac M. Markus (2018)

Policy decisions increasingly depend on the outcomes of simulations and/or machine learning models. The ability to share and interact with these outcomes is relevant across multiple fields and is especially critical in the disease-modeling community, where models are often accessible to, and usable by, only the researchers who generate them. This work presents a blockchain-enabled system that establishes decentralized trust between the parties involved in a modeling process. Using the OpenMalaria framework, we demonstrate the ability to store, share, and maintain auditable logs and records of each step in the simulation process, showing how to validate results generated by computing workers. We also show how the system monitors worker outputs to rank and identify faulty workers via comparison to nearest neighbors or historical reward spaces, as a means of ensuring model quality.

Distributed, Parallel, and Cluster Computing
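The "auditable logs and records of each step" in the abstract above rest on a tamper-evident data structure. A minimal sketch of that core idea is a hash chain, in which each entry stores the SHA-256 of its predecessor's hash plus its own payload, so editing any recorded step invalidates verification. This is not the paper's system: it has no consensus, no network, and no OpenMalaria integration, and the payload fields (`step`, `worker`, `result`) are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    """SHA-256 over the previous hash and a canonical JSON encoding of the payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only, hash-chained log of simulation steps."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, payload: Dict[str, Any]) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        record = dict(payload)  # copy top-level fields so the caller's dict is not shared
        h = entry_hash(prev, record)
        self.entries.append({"prev": prev, "payload": record, "hash": h})
        return h

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev = GENESIS
        for e in self.entries:
            if e["prev"] != prev or e["hash"] != entry_hash(prev, e["payload"]):
                return False
            prev = e["hash"]
        return True


log = AuditLog()
log.append({"step": "configure", "worker": "w1", "params": {"seed": 7}})
log.append({"step": "simulate", "worker": "w1", "result": 0.42})
print(log.verify())                         # True
log.entries[1]["payload"]["result"] = 0.99  # tamper with a recorded result
print(log.verify())                         # False
```

A blockchain adds replication of this chain across mutually distrusting parties and an agreement protocol for appending; the hash chain alone only makes tampering detectable by whoever re-verifies it.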

From Distributed Machine Learning to Federated Learning: A Survey

Ji Liu, Jizhou Huang, Yang Zhou (2021)

In recent years, data and computing resources have typically been distributed across end-user devices, regions, or organizations. Because of laws or regulations, these distributed data and computing resources cannot be directly shared among different regions or organizations for machine learning tasks. Federated learning (FL) has emerged as an efficient approach to exploit distributed data and computing resources to collaboratively train machine learning models while obeying laws and regulations and ensuring data security and privacy. In this paper, we provide a comprehensive survey of existing work on federated learning. We propose a functional architecture of federated learning systems and a taxonomy of related techniques. Furthermore, we present the distributed training, data communication, and security of FL systems. Finally, we analyze their limitations and propose future research directions.

Distributed, Parallel, and Cluster Computing | Artificial Intelligence | Machine Learning
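The survey abstract above describes training a shared model without directly sharing raw data. One widely used server-side step for this is FedAvg-style aggregation: each client trains locally and uploads only its model weights and local sample count, and the server averages the weights in proportion to those counts. The sketch below shows only that aggregation step. It is a standard technique named here for illustration, not a method taken from the survey, and `federated_average` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def federated_average(
    client_updates: Sequence[Tuple[List[float], int]],
) -> List[float]:
    """FedAvg-style aggregation: average client weights, weighted by local sample count.

    Each client sends only (model_weights, num_local_samples); raw data stays local."""
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("need at least one client with local samples")
    dim = len(client_updates[0][0])
    avg = [0.0] * dim
    for weights, n in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must send weights of the same shape")
        for i, w in enumerate(weights):
            avg[i] += w * n / total
    return avg


# Two hypothetical organizations holding 100 and 300 local samples.
print(federated_average([([1.0, 2.0], 100), ([3.0, 4.0], 300)]))  # [2.5, 3.5]
```

Weighting by sample count makes the result match what a single model trained on the pooled data would see on average; the privacy guarantees discussed in the survey require additional mechanisms beyond this step.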

Distributed Double Machine Learning with a Serverless Architecture

Malte S. Kurz (2021)

This paper explores serverless cloud computing for double machine learning. Because it is based on repeated cross-fitting, double machine learning is particularly well suited to exploiting the high level of parallelism achievable with serverless computing. This allows fast, on-demand estimation without additional cloud maintenance effort. We provide a prototype Python implementation, DoubleML-Serverless, for estimating double machine learning models with the serverless computing platform AWS Lambda, and demonstrate its utility with a case study analyzing estimation times and costs.

Distributed, Parallel, and Cluster Computing | Machine Learning
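The abstract above argues that cross-fitting makes double machine learning easy to parallelize, because each fold's nuisance fit is an independent task. A minimal sketch of that structure, assuming the partially linear model $y = \theta d + g(x) + \varepsilon$ with one covariate and linear nuisance functions fitted by ordinary least squares, uses the partialling-out estimator: regress out $x$ from $d$ and from $y$ on held-out folds, then regress residual on residual. A `ThreadPoolExecutor` stands in for concurrent AWS Lambda invocations. This is not the DoubleML-Serverless API; every function name here is hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def ols_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Univariate least squares; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_fold(task) -> List[Tuple[float, float]]:
    """One independent cross-fitting task (the unit a serverless function could run).

    Fits nuisance models on all other folds, returns (d, y) residuals on this fold."""
    x, d, y, test_idx = task
    held_out = set(test_idx)
    train = [i for i in range(len(x)) if i not in held_out]
    a_m, b_m = ols_fit([x[i] for i in train], [d[i] for i in train])  # E[d | x]
    a_l, b_l = ols_fit([x[i] for i in train], [y[i] for i in train])  # E[y | x]
    return [(d[i] - (a_m + b_m * x[i]), y[i] - (a_l + b_l * x[i])) for i in test_idx]


def dml_plr(x, d, y, n_folds: int = 5, seed: int = 0) -> float:
    """Partialling-out DML estimate of theta in y = theta * d + g(x) + e."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    # Threads stand in for parallel serverless invocations, one per fold.
    with ThreadPoolExecutor(max_workers=n_folds) as pool:
        parts = list(pool.map(fit_fold, [(x, d, y, f) for f in folds]))
    pairs = [p for part in parts for p in part]
    return sum(v * u for v, u in pairs) / sum(v * v for v, _ in pairs)


def simulate(n: int, theta: float, seed: int):
    """Synthetic data with linear nuisance functions, so OLS nuisances are well specified."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    d = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    y = [theta * di + 3.0 * xi + rng.gauss(0.0, 1.0) for xi, di in zip(x, d)]
    return x, d, y


x, d, y = simulate(n=2000, theta=0.5, seed=42)
print(round(dml_plr(x, d, y), 2))  # estimate of theta (true value 0.5)
```

In a real deployment the nuisance learners would typically be flexible ML models, and repeating the whole procedure over several random fold splits multiplies the number of independent tasks, which is where serverless parallelism pays off.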

Accelerated Distributed Laplacian Solvers via Shortcuts

Ioannis Anagnostides, Themis Gouleakis, Christoph Lenzen (2021)

In this work we refine the analysis of the distributed Laplacian solver recently established by Forster, Goranci, Liu, Peng, Sun, and Ye (FOCS 21), via the Ghaffari-Haeupler framework (SODA 16) of low-congestion shortcuts. Specifically, if $\epsilon > 0$ represents the error of the solver, we derive two main results. First, for any $n$-node graph $G$ with hop-diameter $D$ and treewidth bounded by $k$, we show the existence of a Laplacian solver with round complexity $O(n^{o(1)} k D \log(1/\epsilon))$ in the CONGEST model. For graphs with bounded treewidth this circumvents the notorious $\Omega(\sqrt{n})$ lower bound for global problems in general graphs. Moreover, following a recent line of work in distributed algorithms, we consider a hybrid communication model which enhances CONGEST with very limited global power in the form of the recently introduced node-capacitated clique. In this model, we show the existence of a Laplacian solver with round complexity $O(n^{o(1)} \log(1/\epsilon))$. The unifying thread of these results is an application of accelerated distributed algorithms for a congested variant of the standard part-wise aggregation problem that we introduce. This primitive constitutes the primary building block for simulating local operations on low-congestion minors, and we believe that this framework could be more generally applicable.

Distributed, Parallel, and Cluster Computing
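To make concrete what the Laplacian solver in the abstract above computes, here is a centralized sketch: given a connected graph with Laplacian $L = D - A$ and a demand vector $b$ whose entries sum to zero, find $x$ with $Lx \approx b$. It uses plain conjugate gradient with a relative-residual stopping rule $\lVert b - Lx \rVert \le \epsilon \lVert b \rVert$, which is one of several ways to define the solver error $\epsilon$. It does not model CONGEST rounds, low-congestion shortcuts, or part-wise aggregation, which are the subject of the paper; function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def laplacian_matvec(n: int, edges: Sequence[Edge], x: Sequence[float]) -> List[float]:
    """Compute L x for the unweighted graph Laplacian L = D - A, edge by edge."""
    out = [0.0] * n
    for u, v in edges:
        diff = x[u] - x[v]
        out[u] += diff
        out[v] -= diff
    return out


def solve_laplacian(n, edges, b, eps=1e-10, max_iter=1000) -> List[float]:
    """Conjugate gradient for L x = b on a connected graph.

    b must sum to zero (the range of L); stops once ||b - L x|| <= eps * ||b||."""
    x = [0.0] * n
    r = list(b)
    p = list(r)
    rs = dot(r, r)
    b_norm = math.sqrt(dot(b, b))
    for _ in range(max_iter):
        if math.sqrt(rs) <= eps * b_norm:
            break
        lp = laplacian_matvec(n, edges, p)
        alpha = rs / dot(p, lp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * li for ri, li in zip(r, lp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    mean = sum(x) / n  # pick the solution orthogonal to the all-ones null space
    return [xi - mean for xi in x]


# Path graph 0 - 1 - 2: one unit injected at node 0 and extracted at node 2.
print(solve_laplacian(3, [(0, 1), (1, 2)], [1.0, 0.0, -1.0]))  # [1.0, 0.0, -1.0]
```

Each CG iteration needs one Laplacian product (purely local, neighbor-to-neighbor work) and a few global inner products; in a distributed setting those global aggregations are what make round complexity depend on the graph's diameter and structure.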
