On Dynamic Distributed Computing

354 0 0.0 ( 0 )

Download Cite

Added by Florian Huc

Publication date 2012

fields Informatics Engineering

and research's language is English

Authors Rachid Guerraoui - Florian Huc - Anne-Marie Kermarrec

Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

This paper shows for the first time that distributed computing can be both reliable and efficient in an environment that is both highly dynamic and hostile. More specifically, we show how to maintain clusters of size $O(log N)$, each containing more than two thirds of honest nodes with high probability, within a system whose size can vary textit{polynomially} with respect to its initial size. Furthermore, the communication cost induced by each node arrival or departure is polylogarithmic with respect to $N$, the maximal size of the system. Our clustering can be achieved despite the presence of a Byzantine adversary controlling a fraction $bad leq {1}{3}-epsilon$ of the nodes, for some fixed constant $epsilon > 0$, independent of $N$. So far, such a clustering could only be performed for systems who size can vary constantly and it was not clear whether that was at all possible for polynomial variances.

rate research

A Survey of Coded Distributed Computing

165 - Jer Shyuan Ng , Wei Yang Bryan Lim , Nguyen Cong Luong 2020

Distributed computing has become a common approach for large-scale computation of tasks due to benefits such as high reliability, scalability, computation speed, and costeffectiveness. However, distributed computing faces critical issues related to communication load and straggler effects. In particular, computing nodes need to exchange intermediate results with each other in order to calculate the final result, and this significantly increases communication overheads. Furthermore, a distributed computing network may include straggling nodes that run intermittently slower. This results in a longer overall time needed to execute the computation tasks, thereby limiting the performance of distributed computing. To address these issues, coded distributed computing (CDC), i.e., a combination of coding theoretic techniques and distributed computing, has been recently proposed as a promising solution. Coding theoretic techniques have proved effective in WiFi and cellular systems to deal with channel noise. Therefore, CDC may significantly reduce communication load, alleviate the effects of stragglers, provide fault-tolerance, privacy and security. In this survey, we first introduce the fundamentals of CDC, followed by basic CDC schemes. Then, we review and analyze a number of CDC approaches proposed to reduce the communication costs, mitigate the straggler effects, and guarantee privacy and security. Furthermore, we present and discuss applications of CDC in modern computer networks. Finally, we highlight important challenges and promising research directions related to CDC

Distributed Parallel and Cluster Computing

Distributed Computing in the Asynchronous LOCAL model

154 - Carole Delporte-Gallet , Hugues Fauconnier , Pierre Fraigniaud 2019

The LOCAL model is among the main models for studying locality in the framework of distributed network computing. This model is however subject to pertinent criticisms, including the facts that all nodes wake up simultaneously, perform in lock steps, and are failure-free. We show that relaxing these hypotheses to some extent does not hurt local computing. In particular, we show that, for any construction task $T$ associated to a locally checkable labeling (LCL), if $T$ is solvable in $t$ rounds in the LOCAL model, then $T$ remains solvable in $O(t)$ rounds in the asynchronous LOCAL model. This improves the result by Casta~neda et al. [SSS 2016], which was restricted to 3-coloring the rings. More generally, the main contribution of this paper is to show that, perhaps surprisingly, asynchrony and failures in the computations do not restrict the power of the LOCAL model, as long as the communications remain synchronous and failure-free.

Distributed Parallel and Cluster Computing

Efficient Replication for Straggler Mitigation in Distributed Computing

158 - Amir Behrouzi-Far , Emina Soljanin 2020

Master-worker distributed computing systems use task replication in order to mitigate the effect of slow workers, known as stragglers. Tasks are grouped into batches and assigned to one or more workers for execution. We first consider the case when the batches do not overlap and, using the results from majorization theory, show that, for a general class of workers service time distributions, a balanced assignment of batches to workers minimizes the average job compute time. We next show that this balanced assignment of non-overlapping batches achieves lower average job compute time compared to the overlapping schemes proposed in the literature. Furthermore, we derive the optimum redundancy level as a function of the service time distribution at workers. We show that the redundancy level that minimizes average job compute time is not necessarily the same as the redundancy level that maximizes the predictability of job compute time, and thus there exists a trade-off between optimizing the two metrics. Finally, by running experiments on Google cluster traces, we observe that redundancy can reduce the compute time of the jobs in Google clusters by an order of magnitude, and that the optimum level of redundancy depends on the distribution of tasks service time.

Distributed Parallel and Cluster Computing Information Theory Performance

Hardness of minimal symmetry breaking in distributed computing

89 - Alkida Balliu , Juho Hirvonen , Dennis Olivetti 2018

A graph is weakly $2$-colored if the nodes are labeled with colors black and white such that each black node is adjacent to at least one white node and vice versa. In this work we study the distributed computational complexity of weak $2$-coloring in the standard LOCAL model of distributed computing, and how it is related to the distributed computational complexity of other graph problems. First, we show that weak $2$-coloring is a minimal distributed symmetry-breaking problem for regular even-degree trees and high-girth graphs: if there is any non-trivial locally checkable labeling problem that is solvable in $o(log^* n)$ rounds with a distributed graph algorithm in the middle of a regular even-degree tree, then weak $2$-coloring is also solvable in $o(log^* n)$ rounds there. Second, we prove a tight lower bound of $Omega(log^* n)$ for the distributed computational complexity of weak $2$-coloring in regular trees; previously only a lower bound of $Omega(log log^* n)$ was known. By minimality, the same lower bound holds for any non-trivial locally checkable problem inside regular even-degree trees.

Distributed Parallel and Cluster Computing Computational Complexity

The Topology of Randomized Symmetry-Breaking Distributed Computing

77 - Pierre Fraigniaud , Ran Gelles , Zvi Lotker 2021

Studying distributed computing through the lens of algebraic topology has been the source of many significant breakthroughs during the last two decades, especially in the design of lower bounds or impossibility results for deterministic algorithms. This paper aims at studying randomized synchronous distributed computing through the lens of algebraic topology. We do so by studying the wide class of (input-free) symmetry-breaking tasks, e.g., leader election, in synchronous fault-free anonymous systems. We show that it is possible to redefine solvability of a task locally, i.e., for each simplex of the protocol complex individually, without requiring any global consistency. However, this approach has a drawback: it eliminates the topological aspect of the computation, since a single facet has a trivial topological structure. To overcome this issue, we introduce a projection $pi$ of both protocol and output complexes, where every simplex $sigma$ is mapped to a complex $pi(sigma)$; the later has a rich structure that replaces the structure we lost by considering one single facet at a time. To show the significance and applicability of our topological approach, we derive necessary and sufficient conditions for solving leader election in synchronous fault-free anonymous shared-memory and message-passing models. In both models, we consider scenarios in which there might be correlations between the random values provided to the nodes. In particular, different parties might have access to the same randomness source so their randomness is not independent but equal. Interestingly, we find that solvability of leader election relates to the number of parties that possess correlated randomness, either directly or via their greatest common divisor, depending on the specific communication model.

Distributed Parallel and Cluster Computing Data Structures and Algorithms