No Arabic abstract
Approximate agreement is one of the few variants of consensus that can be solved in a wait-free manner in asynchronous systems where processes communicate by reading and writing to shared memory. In this work, we consider a natural generalisation of approximate agreement on arbitrary undirected connected graphs. Each process is given a vertex of the graph as input and, if non-faulty, must output a vertex such that - all the outputs are within distance 1 of one another, and - each output value lies on a shortest path between two input values. From prior work, it is known that there is no wait-free algorithm among $n ge 3$ processes for this problem on any cycle of length $c ge 4$, by reduction from 2-set agreement (Casta~neda et al., 2018). In this work, we investigate the solvability and complexity of this task on general graphs. We give a new, direct proof of the impossibility of approximate agreement on cycles of length $c ge 4$, via a generalisation of Sperners Lemma to convex polygons. We also extend the reduction from 2-set agreement to a larger class of graphs, showing that approximate agreement on on these graphs is unsolvable. Furthermore, we show that combinatorial arguments, used by both existing proofs, are necessary, by showing that the impossibility of a wait-free algorithm in the nonuniform iterated snapshot model cannot be proved via an extension-based proof. On the positive side, we present a wait-free algorithm for a class of graphs that properly contains the class of chordal graphs.
Consider a distributed system with $n$ processors out of which $f$ can be Byzantine faulty. In the approximate agreement task, each processor $i$ receives an input value $x_i$ and has to decide on an output value $y_i$ such that - the output values are in the convex hull of the non-faulty processors input values, - the output values are within distance $d$ of each other. Classically, the values are assumed to be from an $m$-dimensional Euclidean space, where $m ge 1$. In this work, we study the task in a discrete setting, where input values with some structure expressible as a graph. Namely, the input values are vertices of a finite graph $G$ and the goal is to output vertices that are within distance $d$ of each other in $G$, but still remain in the graph-induced convex hull of the input values. For $d=0$, the task reduces to consensus and cannot be solved with a deterministic algorithm in an asynchronous system even with a single crash fault. For any $d ge 1$, we show that the task is solvable in asynchronous systems when $G$ is chordal and $n > (omega+1)f$, where $omega$ is the clique number of~$G$. In addition, we give the first Byzantine-tolerant algorithm for a variant of lattice agreement. For synchronous systems, we show tight resilience bounds for the exact variants of these and related tasks over a large class of combinatorial structures.
In this paper we address Approximate Agreement problem in the Mobile Byzantine faults model. Our contribution is threefold. First, we propose the the first mapping from the existing variants of Mobile Byzantine models to the Mixed-Mode faults model.This mapping further help us to prove the correctness of class MSR (Mean-Subsequence-Reduce) Approximate Agreement algorithms in the Mobile Byzantine fault model, and is of independent interest. Secondly, we prove lower bounds for solving Approximate Agreement under all existing Mobile Byzantine faults models. Interestingly, these lower bounds are different from the static bounds. Finally, we propose matching upper bounds. Our paper is the first to link the Mobile Byzantine Faults models and the Mixed-Mode Faults models, and we advocate that a similar approach can be adopted in order to prove the correctness of other classical distributed building blocks (e.g. agreement, clock synchronization, interactive consistency etc) under Mobile Byzantine Faults model.
Concurrency has been a subject of study for more than 50 years. Still, many developers struggle to adapt their sequential code to be accessed concurrently. This need has pushed for generic solutions and specific concurrent data structures. Wait-free universal constructs are attractive as they can turn a sequential implementation of any object into an equivalent, yet concurrent and wait-free, implementation. While highly relevant from a research perspective, these techniques are of limited practical use when the underlying object or data structure is sizable. The copy operation can consume much of the CPUs resources and significantly degrade performance. To overcome this limitation, we have designed CX, a multi-instance-based wait-free universal construct that substantially reduces the amount of copy operations. The construct maintains a bounded number of instances of the object that can potentially be brought up to date. We applied CX to several sequential implementations of data structures, including STL implementations, and compared them with existing wait-free constructs. Our evaluation shows that CX performs significantly better in most experiments, and can even rival with hand-written lock-free and wait-free data structures, simultaneously providing wait-free progress, safe memory reclamation and high reader scalability.
Historically, memory management based on lock-free reference counting was very inefficient, especially for read-dominated workloads. Thus, approaches such as epoch-based reclamation (EBR), hazard pointers (HP), or a combination thereof have received significant attention. EBR exhibits excellent performance but is blocking due to potentially unbounded memory usage. In contrast, HP are non-blocking and achieve good memory efficiency but are much slower. Moreover, HP are only lock-free in the general case. Recently, several new memory reclamation approaches such as WFE and Hyaline have been proposed. WFE achieves wait-freedom, but is less memory efficient and suffers from suboptimal performance in oversubscribed scenarios; Hyaline achieves higher performance and memory efficiency, but lacks wait-freedom. We present a new wait-free memory reclamation scheme, Crystalline, that simultaneously addresses the challenges of high performance, high memory efficiency, and wait-freedom. Crystalline guarantees complete wait-freedom even when threads are dynamically recycled, asynchronously reclaims memory in the sense that any thread can reclaim memory retired by any other thread, and ensures (an almost) balanced reclamation workload across all threads. The latter two properties result in Crystallines high performance and high memory efficiency. Simultaneously ensuring all three properties require overcoming unique challenges which we discuss in the paper. Crystallines implementation relies on specialized instructions which are widely available on commodity hardware such as x86-64 or ARM64. Our experimental evaluations show that Crystalline exhibits outstanding scalability and memory efficiency, and achieves superior throughput than typical reclamation schemes such as EBR as the number of threads grows.
This paper presents the first implementation of a search tree data structure in an asynchronous shared-memory system that provides a wait-free algorithm for executing range queries on the tree, in addition to non-blocking algorithms for Insert, Delete and Find, using single-word Compare-and-Swap (CAS). The implementation is linearizable and tolerates any number of crash failures. Insert and Delete operations that operate on different parts of the tree run fully in parallel (without any interference with one another). We employ a lightweight helping mechanism, where each Insert, Delete and Find operation helps only update operations that affect the local neighbourhood of the leaf it arrives at. Similarly, a Scan helps only those updates taking place on nodes of the part of the tree it traverses, and therefore Scans operating on different parts of the tree do not interfere with one another. Our implementation works in a dynamic system where the number of processes may change over time. The implementation builds upon the non-blocking binary search tree implementation presented by Ellen et al. (in PODC 2010) by applying a simple mechanism to make the tree persistent.