A Wait-Free Universal Construct for Large Objects

324 0 0.0 ( 0 )

Download Cite

Added by Pedro Ramalhete

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Andreia Correia - Pedro Ramalhete - Pascal Felber

Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Concurrency has been a subject of study for more than 50 years. Still, many developers struggle to adapt their sequential code to be accessed concurrently. This need has pushed for generic solutions and specific concurrent data structures. Wait-free universal constructs are attractive as they can turn a sequential implementation of any object into an equivalent, yet concurrent and wait-free, implementation. While highly relevant from a research perspective, these techniques are of limited practical use when the underlying object or data structure is sizable. The copy operation can consume much of the CPUs resources and significantly degrade performance. To overcome this limitation, we have designed CX, a multi-instance-based wait-free universal construct that substantially reduces the amount of copy operations. The construct maintains a bounded number of instances of the object that can potentially be brought up to date. We applied CX to several sequential implementations of data structures, including STL implementations, and compared them with existing wait-free constructs. Our evaluation shows that CX performs significantly better in most experiments, and can even rival with hand-written lock-free and wait-free data structures, simultaneously providing wait-free progress, safe memory reclamation and high reader scalability.

rate research

An Efficient Universal Construction for Large Objects

58 - Panagiota Fatourou , Nikolaos D. Kallimanis , Eleni Kanellou 2020

This paper presents L-UC, a universal construction that efficiently implements dynamic objects of large state in a wait-free manner. The step complexity of L-UC is O(n+kw), where n is the number of processes, k is the interval contention (i.e., the maximum number of active processes during the execution interval of an operation), and w is the worst-case time complexity to perform an operation on the sequential implementation of the simulated object. L-UC efficiently implements objects whose size can change dynamically. It improves upon previous universal constructions either by efficiently handling objects whose state is large and can change dynamically, or by achieving better step complexity.

Distributed Parallel and Cluster Computing

Wait-free approximate agreement on graphs

101 - Dan Alistarh , Faith Ellen , Joel Rybicki 2021

Approximate agreement is one of the few variants of consensus that can be solved in a wait-free manner in asynchronous systems where processes communicate by reading and writing to shared memory. In this work, we consider a natural generalisation of approximate agreement on arbitrary undirected connected graphs. Each process is given a vertex of the graph as input and, if non-faulty, must output a vertex such that - all the outputs are within distance 1 of one another, and - each output value lies on a shortest path between two input values. From prior work, it is known that there is no wait-free algorithm among $n ge 3$ processes for this problem on any cycle of length $c ge 4$, by reduction from 2-set agreement (Casta~neda et al., 2018). In this work, we investigate the solvability and complexity of this task on general graphs. We give a new, direct proof of the impossibility of approximate agreement on cycles of length $c ge 4$, via a generalisation of Sperners Lemma to convex polygons. We also extend the reduction from 2-set agreement to a larger class of graphs, showing that approximate agreement on on these graphs is unsolvable. Furthermore, we show that combinatorial arguments, used by both existing proofs, are necessary, by showing that the impossibility of a wait-free algorithm in the nonuniform iterated snapshot model cannot be proved via an extension-based proof. On the positive side, we present a wait-free algorithm for a class of graphs that properly contains the class of chordal graphs.

Distributed Parallel and Cluster Computing

Crystalline: Fast and Memory Efficient Wait-Free Reclamation

131 - Ruslan Nikolaev , Binoy Ravindran 2021

Historically, memory management based on lock-free reference counting was very inefficient, especially for read-dominated workloads. Thus, approaches such as epoch-based reclamation (EBR), hazard pointers (HP), or a combination thereof have received significant attention. EBR exhibits excellent performance but is blocking due to potentially unbounded memory usage. In contrast, HP are non-blocking and achieve good memory efficiency but are much slower. Moreover, HP are only lock-free in the general case. Recently, several new memory reclamation approaches such as WFE and Hyaline have been proposed. WFE achieves wait-freedom, but is less memory efficient and suffers from suboptimal performance in oversubscribed scenarios; Hyaline achieves higher performance and memory efficiency, but lacks wait-freedom. We present a new wait-free memory reclamation scheme, Crystalline, that simultaneously addresses the challenges of high performance, high memory efficiency, and wait-freedom. Crystalline guarantees complete wait-freedom even when threads are dynamically recycled, asynchronously reclaims memory in the sense that any thread can reclaim memory retired by any other thread, and ensures (an almost) balanced reclamation workload across all threads. The latter two properties result in Crystallines high performance and high memory efficiency. Simultaneously ensuring all three properties require overcoming unique challenges which we discuss in the paper. Crystallines implementation relies on specialized instructions which are widely available on commodity hardware such as x86-64 or ARM64. Our experimental evaluations show that Crystalline exhibits outstanding scalability and memory efficiency, and achieves superior throughput than typical reclamation schemes such as EBR as the number of threads grows.

Distributed Parallel and Cluster Computing

Persistent Non-Blocking Binary Search Trees Supporting Wait-Free Range Queries

92 - Panagiota Fatourou , Eric Ruppert 2018

This paper presents the first implementation of a search tree data structure in an asynchronous shared-memory system that provides a wait-free algorithm for executing range queries on the tree, in addition to non-blocking algorithms for Insert, Delete and Find, using single-word Compare-and-Swap (CAS). The implementation is linearizable and tolerates any number of crash failures. Insert and Delete operations that operate on different parts of the tree run fully in parallel (without any interference with one another). We employ a lightweight helping mechanism, where each Insert, Delete and Find operation helps only update operations that affect the local neighbourhood of the leaf it arrives at. Similarly, a Scan helps only those updates taking place on nodes of the part of the tree it traverses, and therefore Scans operating on different parts of the tree do not interfere with one another. Our implementation works in a dynamic system where the number of processes may change over time. The implementation builds upon the non-blocking binary search tree implementation presented by Ellen et al. (in PODC 2010) by applying a simple mechanism to make the tree persistent.

Distributed Parallel and Cluster Computing

Universal Loop-Free Super-Stabilization

494 - Lelia Blin 2010

We propose an univesal scheme to design loop-free and super-stabilizing protocols for constructing spanning trees optimizing any tree metrics (not only those that are isomorphic to a shortest path tree). Our scheme combines a novel super-stabilizing loop-free BFS with an existing self-stabilizing spanning tree that optimizes a given metric. The composition result preserves the best properties of both worlds: super-stabilization, loop-freedom, and optimization of the original metric without any stabilization time penalty. As case study we apply our composition mechanism to two well known metric-dependent spanning trees: the maximum-flow tree and the minimum degree spanning tree.

Distributed Parallel and Cluster Computing Data Structures and Algorithms Networking and Internet Architecture