No Arabic abstract
We present a new version of Peregrine, the tool for the analysis and parameterized verification of population protocols introduced in [Blondin et al., CAV2018]. Population protocols are a model of computation, intensely studied by the distributed computing community, in which mobile anonymous agents interact stochastically to perform a task. Peregrine 2.0 features a novel verification engine based on the construction of stage graphs. Stage graphs are proof certificates, introduced in [Blondin et al., CAV2020], that are typically succinct and can be independently checked. Moreover, unlike the techniques of Peregrine 1.0, the stage graph methodology can verify protocols whose executions never terminate, a class including recent fast majority protocols. Peregrine 2.0 also features a novel proof visualization component that allows the user to interactively explore the stage graph generated for a given protocol.
Let $G$ be a graph on $n$ nodes. In the stochastic population protocol model, a collection of $n$ indistinguishable, resource-limited nodes collectively solve tasks via pairwise interactions. In each interaction, two randomly chosen neighbors first read each others states, and then update their local states. A rich line of research has established tight upper and lower bounds on the complexity of fundamental tasks, such as majority and leader election, in this model, when $G$ is a clique. Specifically, in the clique, these tasks can be solved fast, i.e., in $n operatorname{polylog} n$ pairwise interactions, with high probability, using at most $operatorname{polylog} n$ states per node. In this work, we consider the more general setting where $G$ is an arbitrary graph, and present a technique for simulating protocols designed for fully-connected networks in any connected regular graph. Our main result is a simulation that is efficient on many interesting graph families: roughly, the simulation overhead is polylogarithmic in the number of nodes, and quadratic in the conductance of the graph. As a sample application, we show that, in any regular graph with conductance $phi$, both leader election and exact majority can be solved in $phi^{-2} cdot n operatorname{polylog} n$ pairwise interactions, with high probability, using at most $phi^{-2} cdot operatorname{polylog} n$ states per node. This shows that there are fast and space-efficient population protocols for leader election and exact majority on graphs with good expansion properties. We believe our results will prove generally useful, as they allow efficient technology transfer between the well-mixed (clique) case, and the under-explored spatial setting.
Over the years, population protocols with the goal of reaching consensus have been studied in great depth. However, many systems in the real-world do not result in all agents eventually reaching consensus, but rather in the opposite: they converge to a state of rich diversity. Consider for example task allocation in ants. If eventually all ants perform the same task, then the colony will perish (lack of food, no brood care, etc.). Then, it is vital for the survival of the colony to have a diverse set of tasks and enough ants working on each task. What complicates matters is that ants need to switch tasks periodically to adjust the needs of the colony; e.g., when too many foragers fell victim to other ant colonies. Moreover, all tasks are equally important and maybe they need to keep certain proportions in the distribution of the task. How can ants keep a healthy and balanced allocation of tasks? To answer this question, we propose a simple population protocol for $n$ agents on a complete graph and an arbitrary initial distribution of $k$ colours (tasks). We assume that each colour $i$ has an associated weight (importance) $w_i geq 1$. By denoting $w$ as the sum of the weights of different colours, we show that the protocol converges in $O(w^2 n log n)$ rounds to a configuration where the number of agents supporting each colour $i$ is concentrated on the fair share $w_in/w$ and will stay concentrated for a large number of rounds, w.h.p. Our protocol has many interesting properties: agents do not need to know other colours and weights in the system, and our protocol requires very little memory per agent. Furthermore, the protocol guarantees fairness meaning that over a long period each agent has each colour roughly a number of times proportional to the weight of the colour. Finally, our protocol also fulfils sustainability meaning that no colour ever vanishes.
Maintaining leadership in HPC requires the ability to support simulations at large scales and fidelity. In this study, we detail one of the most significant productivity challenges in achieving this goal, namely the increasing proclivity to bugs, especially in the face of growing hardware and software heterogeneity and sheer system scale. We identify key areas where timely new research must be proactively begun to address these challenges, and create new correctness tools that must ideally play a significant role even while ramping up toward exacale. We close with the proposal for a two-day workshop in which the problems identified in this report can be more broadly discussed, and specific plans to launch these new research thrusts identified.
In this paper, we develop RCC, the first unified and comprehensive RDMA-enabled distributed transaction processing framework supporting six serializable concurrency control protocols: not only the classical protocols NOWAIT, WAITDIE, and OCC, but also more advanced MVCC and SUNDIAL, and even CALVIN, the deterministic concurrency control protocol. Our goal is to unbiasedly compare the protocols in a common execution environment with the concurrency control protocol being the only changeable component. We focus on the correct and efficient implementation using key techniques, such as co-routines, outstanding requests, and doorbell batching, with two-sided and one-sided communication primitives. Based on RCC, we get the deep insights that cannot be obtained by any existing systems. Most importantly, we obtain the execution stage latency breakdowns with one-sided and two-sided primitive for each protocol, which are analyzed to develop more efficient hybrid implementations. Our results show that three hybrid designs are indeed better than both one-sided and two-sided implementations by up to 17.8%. We believe that RCC is a significant advance over the state-of-the-art; it can both provide performance insights and be used as the common infrastructure for fast prototyping new implementations.
We introduce FnF-BFT, a parallel-leader byzantine fault-tolerant state-machine replication protocol for the partially synchronous model with theoretical performance bounds during synchrony. By allowing all replicas to act as leaders and propose requests independently, FnF-BFT parallelizes the execution of requests. Leader parallelization distributes the load over the entire network -- increasing throughput by overcoming the single-leader bottleneck. We further use historical data to ensure that well-performing replicas are in command. FnF-BFTs communication complexity is linear in the number of replicas during synchrony and thus competitive with state-of-the-art protocols. Finally, with FnF-BFT, we introduce a BFT protocol with performance guarantees in stable network conditions under truly byzantine attacks. A prototype implementation of prot outperforms (state-of-the-art) HotStuffs throughput, especially as replicas increase, showcasing prots significantly improved scaling capabilities.