No Arabic abstract
The LOCAL model is among the main models for studying locality in the framework of distributed network computing. This model is however subject to pertinent criticisms, including the facts that all nodes wake up simultaneously, perform in lock steps, and are failure-free. We show that relaxing these hypotheses to some extent does not hurt local computing. In particular, we show that, for any construction task $T$ associated to a locally checkable labeling (LCL), if $T$ is solvable in $t$ rounds in the LOCAL model, then $T$ remains solvable in $O(t)$ rounds in the asynchronous LOCAL model. This improves the result by Casta~neda et al. [SSS 2016], which was restricted to 3-coloring the rings. More generally, the main contribution of this paper is to show that, perhaps surprisingly, asynchrony and failures in the computations do not restrict the power of the LOCAL model, as long as the communications remain synchronous and failure-free.
This paper considers the problem of asynchronous distributed multi-agent optimization on server-based system architecture. In this problem, each agent has a local cost, and the goal for the agents is to collectively find a minimum of their aggregate cost. A standard algorithm to solve this problem is the iterative distributed gradient-descent (DGD) method being implemented collaboratively by the server and the agents. In the synchronous setting, the algorithm proceeds from one iteration to the next only after all the agents complete their expected communication with the server. However, such synchrony can be expensive and even infeasible in real-world applications. We show that waiting for all the agents is unnecessary in many applications of distributed optimization, including distributed machine learning, due to redundancy in the cost functions (or {em data}). Specifically, we consider a generic notion of redundancy named $(r,epsilon)$-redundancy implying solvability of the original multi-agent optimization problem with $epsilon$ accuracy, despite the removal of up to $r$ (out of total $n$) agents from the system. We present an asynchronous DGD algorithm where in each iteration the server only waits for (any) $n-r$ agents, instead of all the $n$ agents. Assuming $(r,epsilon)$-redundancy, we show that our asynchronous algorithm converges to an approximate solution with error that is linear in $epsilon$ and $r$. Moreover, we also present a generalization of our algorithm to tolerate some Byzantine faulty agents in the system. Finally, we demonstrate the improved communication efficiency of our algorithm through experiments on MNIST and Fashion-MNIST using the benchmark neural network LeNet.
We give a protocol for Asynchronous Distributed Key Generation (A-DKG) that is optimally resilient (can withstand $f<frac{n}{3}$ faulty parties), has a constant expected number of rounds, has $tilde{O}(n^3)$ expected communication complexity, and assumes only the existence of a PKI. Prior to our work, the best A-DKG protocols required $Omega(n)$ expected number of rounds, and $Omega(n^4)$ expected communication. Our A-DKG protocol relies on several building blocks that are of independent interest. We define and design a Proposal Election (PE) protocol that allows parties to retrospectively agree on a valid proposal after enough proposals have been sent from different parties. With constant probability the elected proposal was proposed by a non-faulty party. In building our PE protocol, we design a Verifiable Gather protocol which allows parties to communicate which proposals they have and have not seen in a verifiable manner. The final building block to our A-DKG is a Validated Asynchronous Byzantine Agreement (VABA) protocol. We use our PE protocol to construct a VABA protocol that does not require leaders or an asynchronous DKG setup. Our VABA protocol can be used more generally when it is not possible to use threshold signatures.
This paper shows for the first time that distributed computing can be both reliable and efficient in an environment that is both highly dynamic and hostile. More specifically, we show how to maintain clusters of size $O(log N)$, each containing more than two thirds of honest nodes with high probability, within a system whose size can vary textit{polynomially} with respect to its initial size. Furthermore, the communication cost induced by each node arrival or departure is polylogarithmic with respect to $N$, the maximal size of the system. Our clustering can be achieved despite the presence of a Byzantine adversary controlling a fraction $bad leq {1}{3}-epsilon$ of the nodes, for some fixed constant $epsilon > 0$, independent of $N$. So far, such a clustering could only be performed for systems who size can vary constantly and it was not clear whether that was at all possible for polynomial variances.
Modeling distributed computing in a way enabling the use of formal methods is a challenge that has been approached from different angles, among which two techniques emerged at the turn of the century: protocol complexes, and directed algebraic topology. In both cases, the considered computational model generally assumes communication via shared objects, typically a shared memory consisting of a collection of read-write registers. Our paper is concerned with network computing, where the processes are located at the nodes of a network, and communicate by exchanging messages along the edges of that network. Applying the topological approach for verification in network computing is a considerable challenge, mainly because the presence of identifiers assigned to the nodes yields protocol complexes whose size grows exponentially with the size of the underlying network. However, many of the problems studied in this context are of local nature, and their definitions do not depend on the identifiers or on the size of the network. We leverage this independence in order to meet the above challenge, and present $textit{local}$ protocol complexes, whose sizes do not depend on the size of the network. As an application of the design of compact protocol complexes, we reformulate the celebrated lower bound of $Omega(log^*n)$ rounds for 3-coloring the $n$-node ring, in the algebraic topology framework.
Consensus protocols for asynchronous networks are usually complex and inefficient, leading practical systems to rely on synchronous protocols. This paper attempts to simplify asynchronous consensus by building atop a novel threshold logical clock abstraction, which enables upper layers to operate as if on a synchronous network. This approach yields an asynchronous consensus protocol for fail-stop nodes that may be simpler and more robust than Paxos and its leader-based variants, requiring no common coins and achieving consensus in a constant expected number of rounds. The same approach can be strengthened against Byzantine failures by building on well-established techniques such as tamper-evident logging and gossip, accountable state machines, threshold signatures and witness cosigning, and verifiable secret sharing. This combination of existing abstractions and threshold logical clocks yields a modular, cleanly-layered approach to building practical and efficient Byzantine consensus, distributed key generation, time, timestamping, and randomness beacons, and other critical services.