UniStore: A fault-tolerant marriage of causal and strong consistency (extended version)


Added by Manuel Bravo

Publication date 2021

fields Informatics Engineering

Language English

Authors Manuel Bravo - Alexey Gotsman - Borja de Regil

Distributed Parallel and Cluster Computing


Abstract

Modern online services rely on data stores that replicate their data across geographically distributed data centers. Providing strong consistency in such data stores results in high latencies and makes the system vulnerable to network partitions. The alternative of relaxing consistency violates crucial correctness properties. A compromise is to allow multiple consistency levels to coexist in the data store. In this paper we present UniStore, the first fault-tolerant and scalable data store that combines causal and strong consistency. The key challenge we address in UniStore is to maintain liveness despite data center failures: this could be compromised if a strong transaction takes a dependency on a causal transaction that is later lost because of a failure. UniStore ensures that such situations do not arise while paying the cost of durability for causal transactions only when necessary. We evaluate UniStore on Amazon EC2 using both microbenchmarks and a sample application. Our results show that UniStore effectively and scalably combines causal and strong consistency.


Hermes: a Fast, Fault-Tolerant and Linearizable Replication Protocol

215 - A. Katsarakis 2020

Today's datacenter applications are underpinned by datastores that are responsible for providing availability, consistency, and performance. For high availability in the presence of failures, these datastores replicate data across several nodes. This is accomplished with the help of a reliable replication protocol that is responsible for keeping the replicas strongly consistent even when faults occur. Strong consistency is preferred to weaker consistency models that cannot guarantee intuitive behavior for the clients. Furthermore, to accommodate high demand at real-time latencies, datastores must deliver high throughput and low latency. This work introduces Hermes, a broadcast-based reliable replication protocol for in-memory datastores that provides both high throughput and low latency by enabling local reads and fully concurrent fast writes at all replicas. Hermes couples logical timestamps with cache-coherence-inspired invalidations to guarantee linearizability, avoid write serialization at a centralized ordering point, resolve write conflicts locally at each replica (hence ensuring that writes never abort) and provide fault tolerance via replayable writes. Our implementation of Hermes over an RDMA-enabled reliable datastore with five replicas shows that Hermes consistently achieves higher throughput than state-of-the-art RDMA-based reliable protocols (ZAB and CRAQ) across all write ratios while also significantly reducing tail latency. At 5% writes, the tail latency of Hermes is 3.6X lower than that of CRAQ and ZAB.

Distributed Parallel and Cluster Computing

Fault Tolerant Frequent Pattern Mining

216 - Sameh Shohdy , Abhinav Vishnu , Gagan Agrawal 2016

The FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has been extensively used to study correlations and patterns in large-scale datasets. While several researchers have designed distributed-memory FP-Growth algorithms, it is pivotal to consider fault-tolerant FP-Growth, which can address the increasing fault rates in large-scale systems. In this work, we propose a novel parallel, algorithm-level fault-tolerant FP-Growth algorithm. We leverage algorithmic properties and advanced MPI features to guarantee an O(1) space complexity, achieved by using the dataset memory space itself for checkpointing. We also propose a recovery algorithm that can use in-memory and disk-based checkpointing, though in many cases the recovery can be completed without any disk access and with no memory overhead for checkpointing. We evaluate our FT algorithm on a large-scale InfiniBand cluster with several large datasets using up to 2K cores. Our evaluation demonstrates excellent efficiency for checkpointing and recovery in comparison to the disk-based approach. We have also observed a 20x average speed-up in comparison to Spark, establishing that a well-designed algorithm can easily outperform a solution based on a general fault-tolerant programming model.

Distributed Parallel and Cluster Computing

Building a fault tolerant application using the GASPI communication layer

367 - Faisal Shahzad , Moritz Kreutzer , Thomas Zeiser 2015

It is commonly agreed that highly parallel software on Exascale computers will suffer from many more runtime failures due to the decreasing trend in the mean time to failure (MTTF). Therefore, it is not surprising that a lot of research is going on in the area of fault tolerance and fault mitigation. Applications should survive a failure and/or be able to recover with minimal cost. MPI is not yet very mature in handling failures: the User-Level Failure Mitigation (ULFM) proposal, currently the most promising approach, is still in its prototype phase. In our work we use GASPI, which is a relatively new communication library based on the PGAS model. It provides the missing features to allow the design of fault-tolerant applications. Instead of introducing algorithm-based fault tolerance in its true sense, we demonstrate how we can build on (existing) clever checkpointing and extend applications to integrate a low-cost fault detection mechanism and, if necessary, recover the application on the fly. The aspects of process management, the restoration of groups and the recovery mechanism are presented in detail. We use a sparse matrix-vector multiplication based application to analyze the overhead introduced by such modifications. Our fault detection mechanism causes no overhead in failure-free cases, whereas in case of failure(s), the failure detection and recovery cost is of reasonably acceptable order and shows good scalability.

Distributed Parallel and Cluster Computing

Revisiting Asynchronous Fault Tolerant Computation with Optimal Resilience

91 - Ittai Abraham , Danny Dolev , Gilad Stern 2020

The celebrated result of Fischer, Lynch and Paterson is the fundamental lower bound for asynchronous fault tolerant computation: any 1-crash resilient asynchronous agreement protocol must have some (possibly measure zero) probability of not terminating. In 1994, Ben-Or, Kelmer and Rabin published a proof-sketch of a lesser known lower bound for asynchronous fault tolerant computation with optimal resilience against a Byzantine adversary: if $n \le 4t$ then any t-resilient asynchronous verifiable secret sharing protocol must have some non-zero probability of not terminating. Our main contribution is to revisit this lower bound and provide a rigorous and more general proof. Our second contribution is to show how to avoid this lower bound. We provide a protocol with optimal resilience that is almost surely terminating for a strong common coin functionality. Using this new primitive we provide an almost surely terminating protocol with optimal resilience for asynchronous Byzantine agreement that has a new fair validity property. To the best of our knowledge this is the first asynchronous Byzantine agreement with fair validity in the information theoretic setting.

Distributed Parallel and Cluster Computing

wChain: A Fast Fault-Tolerant Blockchain Protocol for Multihop Wireless Networks

169 - Minghui Xu , Chunchi Liu , Yifei Zou 2021

This paper presents $\mathit{wChain}$, a blockchain protocol specifically designed for multihop wireless networks that deeply integrates wireless communication properties and blockchain technologies under the realistic SINR model. We adopt a hierarchical spanner as the communication backbone to address medium contention and achieve fast data aggregation within $O(\log N \log \Gamma)$ slots, where $N$ is the network size and $\Gamma$ refers to the ratio of the maximum distance to the minimum distance between any two nodes. In addition, $\mathit{wChain}$ employs data aggregation and reaggregation, and node recovery mechanisms to ensure efficiency, fault tolerance, persistence, and liveness. The worst-case runtime of $\mathit{wChain}$ is upper bounded by $O(f \log N \log \Gamma)$, where $f=\lfloor \frac{N}{2} \rfloor$ is the upper bound of the number of faulty nodes. To validate our design, we conduct both theoretical analysis and simulation studies, and the results not only demonstrate the nice properties of $\mathit{wChain}$, but also point to a vast new space for the exploration of blockchain protocols in wireless networks.

Distributed Parallel and Cluster Computing