
A Topological Perspective on Distributed Network Algorithms

Added by Ami Paz
Publication date: 2019
Language: English





More than two decades ago, combinatorial topology was shown to be useful for analyzing distributed fault-tolerant algorithms in shared memory systems and in message passing systems. In this work, we show that combinatorial topology can also be useful for analyzing distributed algorithms in failure-free networks of arbitrary structure. To illustrate this, we analyze consensus, set-agreement, and approximate agreement in networks, and derive lower bounds for these problems under classical computational settings, such as the LOCAL model and dynamic networks.
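
As a concrete illustration of one of the tasks mentioned above, the sketch below simulates approximate agreement in a failure-free synchronous network by repeated averaging with neighbors. This is only a toy Python simulation under assumed inputs (the 4-cycle, the initial values, and the round budget are all illustrative), not the topological analysis or any algorithm from the paper.

# Toy simulation of approximate agreement on a failure-free synchronous network.
# Each node repeatedly replaces its value with the average of its own value and
# its neighbors' values; values always remain inside the range of the inputs
# (validity), and on a connected graph the spread shrinks toward zero.

def approximate_agreement(adjacency, inputs, rounds):
    values = dict(inputs)
    for _ in range(rounds):
        new_values = {}
        for node, neighbors in adjacency.items():
            received = [values[v] for v in neighbors] + [values[node]]
            new_values[node] = sum(received) / len(received)
        values = new_values
    return values

# Example: a 4-cycle with arbitrary real inputs (illustrative assumption).
adjacency = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
inputs = {0: 0.0, 1: 1.0, 2: 0.25, 3: 0.75}
final = approximate_agreement(adjacency, inputs, rounds=20)
print(final, "spread:", max(final.values()) - min(final.values()))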



Related research

Distributed deep learning systems (DDLS) train deep neural network models by utilizing the distributed resources of a cluster. Developers of DDLS are required to make many decisions to process their particular workloads in their chosen environment efficiently. The advent of GPU-based deep learning, the ever-increasing size of datasets and deep neural network models, in combination with the bandwidth constraints that exist in cluster environments require developers of DDLS to be innovative in order to train high quality models quickly. Comparing DDLS side-by-side is difficult due to their extensive feature lists and architectural deviations. We aim to shine some light on the fundamental principles that are at work when training deep neural networks in a cluster of independent machines by analyzing the general properties associated with training deep learning models and how such workloads can be distributed in a cluster to achieve collaborative model training. Thereby we provide an overview of the different techniques that are used by contemporary DDLS and discuss their influence and implications on the training process. To conceptualize and compare DDLS, we group different techniques into categories, thus establishing a taxonomy of distributed deep learning systems.
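
One of the techniques covered by such a taxonomy is synchronous data-parallel training. The following Python sketch shows the idea under simplifying assumptions: a linear model with squared loss stands in for a deep network, four in-process "workers" stand in for cluster machines, and averaging the per-shard gradients stands in for an all-reduce. None of these names or constants come from the survey itself.

import numpy as np

# Minimal sketch of synchronous data-parallel training: each worker computes a
# gradient on its own data shard, gradients are averaged (an all-reduce in a
# real cluster), and every worker applies the same update.

def worker_gradient(w, X_shard, y_shard):
    # Gradient of mean squared error for a linear model on one shard.
    residual = X_shard @ w - y_shard
    return 2.0 * X_shard.T @ residual / len(y_shard)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)
shards = np.array_split(np.arange(1000), 4)   # 4 simulated workers
w, lr = np.zeros(5), 0.05

for step in range(100):
    grads = [worker_gradient(w, X[idx], y[idx]) for idx in shards]
    w -= lr * np.mean(grads, axis=0)          # simulated all-reduce + update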
Distributed optimization algorithms are widely used in many industrial machine learning applications. However, choosing the appropriate algorithm and cluster size is often difficult for users, as the performance and convergence rate of optimization algorithms vary with the size of the cluster. In this paper we make the case for an ML-optimizer that can select the appropriate algorithm and cluster size to use for a given problem. To do this, we propose building two models: one that captures how the system-level costs of computation and communication change as we increase cluster sizes, and another that captures how convergence rates change with cluster sizes. We present preliminary results from our prototype implementation, called Hemingway, and discuss some of the challenges involved in developing such a system.
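
To make the two-model idea concrete, here is a hypothetical Python sketch of how a system model (per-iteration time versus cluster size) and a convergence model (iterations needed versus cluster size) could be combined to pick a cluster size. The functional forms and constants are placeholders chosen for illustration and are not taken from the Hemingway paper.

# Hypothetical cost model: total training time = time per iteration x number
# of iterations, where compute time shrinks with cluster size but
# communication and convergence penalties grow with it.

def time_per_iteration(n, compute=10.0, comm_per_node=0.5):
    return compute / n + comm_per_node * n

def iterations_to_converge(n, base=1000, penalty=25):
    return base + penalty * n

def best_cluster_size(candidates):
    return min(candidates,
               key=lambda n: time_per_iteration(n) * iterations_to_converge(n))

print(best_cluster_size(range(1, 65)))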
In this paper we study fractional coloring from the angle of distributed computing. Fractional coloring is the linear relaxation of the classical notion of coloring, and has many applications, in particular in scheduling. It was proved by Hasemann, Hirvonen, Rybicki and Suomela (2016) that for every real $\alpha>1$ and integer $\Delta$, a fractional coloring of total weight at most $\alpha(\Delta+1)$ can be obtained deterministically in a single round in graphs of maximum degree $\Delta$, in the LOCAL model of computation. However, a major issue of this result is that the output of each vertex has unbounded size. Here we prove that even if we impose the more realistic assumption that the output of each vertex has constant size, we can find fractional colorings of total weight arbitrarily close to known tight bounds for the fractional chromatic number in several cases of interest. More precisely, we show that for any fixed $\epsilon > 0$ and $\Delta$, a fractional coloring of total weight at most $\Delta+\epsilon$ can be found in $O(\log^* n)$ rounds in graphs of maximum degree $\Delta$ with no $K_{\Delta+1}$, while finding a fractional coloring of total weight at most $\Delta$ in this case requires $\Omega(\log \log n)$ rounds for randomized algorithms and $\Omega(\log n)$ rounds for deterministic algorithms. We also show how to obtain fractional colorings of total weight at most $2+\epsilon$ in grids of any fixed dimension, for any $\epsilon>0$, in $O(\log^* n)$ rounds. Finally, we prove that in sparse graphs of large girth from any proper minor-closed family we can find a fractional coloring of total weight at most $2+\epsilon$, for any $\epsilon>0$, in $O(\log n)$ rounds.
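
For readers unfamiliar with fractional coloring, the following worked example (not the distributed algorithm from the paper) checks a fractional coloring of the 5-cycle in Python: assigning weight 1/2 to each of its five maximum independent sets covers every vertex with total weight 1 while the weights sum to 5/2, the fractional chromatic number of C_5.

from fractions import Fraction

# Fractional coloring of the 5-cycle: weight 1/2 on each independent set
# {i, i+2 mod 5}; every vertex lies in exactly two sets, so it is covered
# with total weight 1, and the total weight of the coloring is 5/2.

edges = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)}
independent_sets = [frozenset({i, (i + 2) % 5}) for i in range(5)]
weight = {s: Fraction(1, 2) for s in independent_sets}

assert all(not ({u, v} <= s) for s in independent_sets for (u, v) in edges)
coverage = {v: sum(w for s, w in weight.items() if v in s) for v in range(5)}
assert all(c >= 1 for c in coverage.values())
print("total weight:", sum(weight.values()))   # 5/2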
Combinatorial algorithms such as those that arise in graph analysis, modeling of discrete systems, bioinformatics, and chemistry, are often hard to parallelize. The Combinatorial BLAS library implements key computational primitives for rapid development of combinatorial algorithms in distributed-memory systems. During the decade since its first introduction, the Combinatorial BLAS library has evolved and expanded significantly. This paper details many of the key technical features of Combinatorial BLAS version 2.0, such as communication avoidance, hierarchical parallelism via in-node multithreading, accelerator support via GPU kernels, generalized semiring support, implementations of key data structures and functions, and scalable distributed I/O operations for human-readable files. Our paper also presents several rules of thumb for choosing the right data structures and functions in Combinatorial BLAS 2.0, under various common application scenarios.
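
The generalized semiring support mentioned above is the idea that the same sparse matrix-vector product expresses different graph computations depending on the (add, multiply) pair supplied. The plain-Python sketch below illustrates that idea only; it is not the Combinatorial BLAS API, and the tiny graph is an assumption made for the example.

# Semiring-style sparse matrix-vector product: y[i] = add over j of
# mul(A[i][j], x[j]), computed only over stored nonzeros and present entries.

def spmv(rows, x, add, mul, zero):
    # rows: dict mapping row index -> list of (column, value) nonzeros.
    y = {}
    for i, entries in rows.items():
        acc = zero
        for j, a in entries:
            if j in x:
                acc = add(acc, mul(a, x[j]))
        if acc != zero:
            y[i] = acc
    return y

# (min, +) semiring: one relaxation step of single-source shortest paths.
graph = {0: [(1, 2.0)], 1: [(0, 2.0), (2, 1.0)], 2: [(1, 1.0)]}
dist = {0: 0.0}
print(spmv(graph, dist, add=min, mul=lambda a, b: a + b, zero=float("inf")))
# {1: 2.0} -- the distance to vertex 1 via the edge (0, 1)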
Classical machine learning algorithms often face scalability bottlenecks when they are applied to large-scale data. Such algorithms were designed to work with small data that is assumed to fit in the memory of one machine. In this report, we analyze different methods for computing an important machine learning algorithm, namely Principal Component Analysis (PCA), and we comment on their limitations in supporting large datasets. The methods are analyzed and compared across two important metrics: time complexity and communication complexity. We consider the worst-case scenarios for both metrics, and we identify the software libraries that implement each method. The analysis in this report helps researchers and engineers in (i) understanding the main bottlenecks for scalability in different PCA algorithms, (ii) choosing the most appropriate method and software library for a given application and data set characteristics, and (iii) designing new scalable PCA algorithms.
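
As a baseline for the distributed methods compared in the report, the short Python sketch below computes PCA on a single machine via the singular value decomposition of the centered data matrix; the data and the number of components are arbitrary choices for illustration.

import numpy as np

# Single-machine PCA via SVD: center the data, take the top right singular
# vectors as principal directions, and project the data onto them.

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))
Xc = X - X.mean(axis=0)                    # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 3
components = Vt[:k]                        # top-k principal directions
projected = Xc @ components.T              # data in the reduced space
explained = (S[:k] ** 2) / (S ** 2).sum()  # fraction of variance explained
print(explained)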
