White-Box Atomic Multicast (Extended Version)

112 0 0.0 ( 0 )

Download Cite

Added by Alexey Gotsman

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Alexey Gotsman - Anatole Lefort -

Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Atomic multicast is a communication primitive that delivers messages to multiple groups of processes according to some total order, with each group receiving the projection of the total order onto messages addressed to it. To be scalable, atomic multicast needs to be genuine, meaning that only the destination processes of a message should participate in ordering it. In this paper we propose a novel genuine atomic multicast protocol that in the absence of failures takes as low as 3 message delays to deliver a message when no other messages are multicast concurrently to its destination groups, and 5 message delays in the presence of concurrency. This improves the latencies of both the fault-tolerant version of classical Skeens multicast protocol (6 or 12 message delays, depending on concurrency) and its recent improvement by Coelho et al. (4 or 8 message delays). To achieve such low latencies, we depart from the typical way of guaranteeing fault-tolerance by replicating each group with Paxos. Instead, we weave Paxos and Skeens protocol together into a single coherent protocol, exploiting opportunities for white-box optimisations. We experimentally demonstrate that the superior theoretical characteristics of our protocol are reflected in practical performance pay-offs.

rate research

Reconfigurable Atomic Transaction Commit (Extended Version)

141 - Manuel Bravo , Alexey Gotsman 2019

Modern data stores achieve scalability by partitioning data into shards and fault-tolerance by replicating each shard across several servers. A key component of such systems is a Transaction Certification Service (TCS), which atomically commits a transaction spanning multiple shards. Existing TCS protocols require 2f+1 crash-stop replicas per shard to tolerate f failures. In this paper we present atomic commit protocols that require only f+1 replicas and reconfigure the system upon failures using an external reconfiguration service. We furthermore rigorously prove that these protocols correctly implement a recently proposed TCS specification. We present protocols in two different models--the standard asynchronous message-passing model and a model with Remote Direct Memory Access (RDMA), which allows a machine to access the memory of another machine over the network without involving the latters CPU. Our protocols are inspired by a recent FARM system for RDMA-based transaction processing. Our work codifies the core ideas of FARM as distributed TCS protocols, rigorously proves them correct and highlights the trade-offs required by the use of RDMA.

Distributed Parallel and Cluster Computing

Privatization-Safe Transactional Memories (Extended Version)

71 - Artem Khyzha , Hagit Attiya , Alexey Gotsman 2019

Transactional memory (TM) facilitates the development of concurrent applications by letting the programmer designate certain code blocks as atomic. Programmers using a TM often would like to access the same data both inside and outside transactions, and would prefer their programs to have a strongly atomic semantics, which allows transactions to be viewed as executing atomically with respect to non-transactional accesses. Since guaranteeing such semantics for arbitrary programs is prohibitively expensive, researchers have suggested guaranteeing it only for certain data-race free (DRF) programs, particularly those that follow the privatization idiom: from some point on, threads agree that a given object can be accessed non-transactionally. In this paper we show that a variant of Transactional DRF (TDRF) by Dalessandro et al. is appropriate for a class of privatization-safe TMs, which allow using privatization idioms. We prove that, if such a TM satisfies a condition we call privatization-safe opacity and a program using the TM is TDRF under strongly atomic semantics, then the program indeed has such semantics. We also present a method for proving privatization-safe opacity that reduces proving this generalization to proving the usual opacity, and apply the method to a TM based on two-phase locking and a privatization-safe version of TL2. Finally, we establish the inherent cost of privatization-safety: we prove that a TM cannot be progressive and have invisible reads if it guarantees strongly atomic semantics for TDRF programs.

Distributed Parallel and Cluster Computing

Making Byzantine Consensus Live (Extended Version)

100 - Manuel Bravo , Gregory Chockler , Alexey Gotsman 2020

Partially synchronous Byzantine consensus protocols typically structure their execution into a sequence of views, each with a designated leader process. The key to guaranteeing liveness in these protocols is to ensure that all correct processes eventually overlap in a view with a correct leader for long enough to reach a decision. We propose a simple view synchronizer abstraction that encapsulates the corresponding functionality for Byzantine consensus protocols, thus simplifying their design. We present a formal specification of a view synchronizer and its implementation under partial synchrony, which runs in bounded space despite tolerating message loss during asynchronous periods. We show that our synchronizer specification is strong enough to guarantee liveness for single-sh

Distributed Parallel and Cluster Computing

Efficient Replication via Timestamp Stability (Extended Version)

70 - Vitor Enes , Carlos Baquero , Alexey Gotsman 2021

Modern web applications replicate their data across the globe and require strong consistency guarantees for their most critical data. These guarantees are usually provided via state-machine replication (SMR). Recent advances in SMR have focused on leaderless protocols, which improve the availability and performance of traditional Paxos-based solutions. We propose Tempo - a leaderless SMR protocol that, in comparison to prior solutions, achieves superior throughput and offers predictable performance even in contended workloads. To achieve these benefits, Tempo timestamps each application command and executes it only after the timestamp becomes stable, i.e., all commands with a lower timestamp are known. Both the timestamping and stability detection mechanisms are fully decentralized, thus obviating the need for a leader replica. Our protocol furthermore generalizes to partial replication settings, enabling scalability in highly parallel workloads. We evaluate the protocol in both real and simulated geo-distributed environments and demonstrate that it outperforms state-of-the-art alternatives.

Distributed Parallel and Cluster Computing

D2.2 White-box methodologies, programming abstractions and libraries

254 - Phuong Hoai Ha , Vi Ngoc-Nha Tran , Ibrahim Umar 2018

This deliverable reports the results of white-box methodologies and early results of the first prototype of libraries and programming abstractions as available by project month 18 by Work Package 2 (WP2). It reports i) the latest results of Task 2.2 on white-box methodologies, programming abstractions and libraries for developing energy-efficient data structures and algorithms and ii) the improved results of Task 2.1 on investigating and modeling the trade-off between energy and performance of concurrent data structures and algorithms. The work has been conducted on two main EXCESS platforms: Intel platforms with recent Intel multicore CPUs and Movidius Myriad1 platform. Regarding white-box methodologies, we have devised new relaxed cache-oblivious models and proposed a new power model for Myriad1 platform and an energy model for lock-free queues on CPU platforms. For Myriad1 platform, the im- proved model now considers both computation and data movement cost as well as architecture and application properties. The model has been evaluated with a set of micro-benchmarks and application benchmarks. For Intel platforms, we have generalized the model for concurrent queues on CPU platforms to offer more flexibility according to the workers calling the data structure (parallel section sizes of enqueuers and dequeuers are decoupled). Regarding programming abstractions and libraries, we have continued investigat- ing the trade-offs between energy consumption and performance of data structures such as concurrent queues and concurrent search trees based on the early results of Task 2.1.The preliminary results show that our concurrent trees are faster and more energy efficient than the state-of-the-art on commodity HPC and embedded platforms.

Distributed Parallel and Cluster Computing