To Vote Before Decide: A Logless One-Phase Commit Protocol for Highly-Available Datastores


Added by Yuqing Zhu

Publication date 2017

fields Informatics Engineering

Research language English

Authors Yuqing Zhu - Philip S. Yu - Guolei Yi

Distributed Parallel and Cluster Computing


Highly-available datastores are widely deployed for online applications. However, many online applications are not satisfied with the simple data access interface currently provided by highly-available datastores. Distributed transaction support is demanded by applications such as the large-scale online payment systems used by Alipay or Paypal. Current solutions for distributed transactions can spend more than half of the whole transaction processing time in distributed commit. An efficient atomic commit protocol is highly desirable. This paper presents the HACommit protocol, a logless one-phase commit protocol for highly-available systems. HACommit has transaction participants vote for a commit before the client decides to commit or abort the transaction; in comparison, the state-of-the-art practice for distributed commit is to have the client decide before participants vote. The change enables the removal of both the participant logging and the coordinator logging steps in the distributed commit process; it also makes it possible for the transaction data to become visible to other transactions within one communication round-trip time (i.e., one phase) after the client initiates the transaction commit. In the evaluation with extensive experiments, HACommit outperforms recent atomic commit solutions for highly-available datastores under different workloads. In the best case, HACommit can commit in one fifth of the time 2PC does.
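The vote-before-decide ordering described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the `Participant` class, the `run_transaction` function, and the `can_commit` and `client_wants_commit` parameters are hypothetical names chosen for illustration, and replication, failure handling, and concurrency control are left out entirely. The sketch only shows that each participant returns its vote while executing the transaction's operations, so the client's later decision needs a single message round and no commit log.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical participant holding committed data and pending writes."""
    name: str
    store: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def execute(self, txn_id, writes, can_commit=True):
        # The vote is returned together with the operation result,
        # i.e. before the client has decided anything.
        if can_commit:
            self.pending[txn_id] = dict(writes)
        return can_commit

    def finish(self, txn_id, commit):
        # One message from the client ends the transaction; nothing is logged.
        writes = self.pending.pop(txn_id, {})
        if commit:
            self.store.update(writes)


def run_transaction(txn_id, work, client_wants_commit=True):
    """work is a list of (participant, writes, can_commit) tuples (assumed shape)."""
    votes = [p.execute(txn_id, writes, ok) for p, writes, ok in work]
    # The client decides only after every vote is already known.
    decision = client_wants_commit and all(votes)
    for p, _, _ in work:
        p.finish(txn_id, decision)  # single round trip to each participant
    return decision
```

In a 2PC-style flow the `finish` round would be preceded by a separate prepare round; here the votes are already in hand when the client decides, which is where the one-phase property in the abstract comes from.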


Design and Analysis of a Logless Dynamic Reconfiguration Protocol

56 - William Schultz , Siyuan Zhou , Ian Dardik 2021

Distributed replication systems based on the replicated state machine model have become ubiquitous as the foundation of modern database systems. To ensure availability in the presence of faults, these systems must be able to dynamically replace failed nodes with healthy ones via dynamic reconfiguration. MongoDB is a document-oriented database with a distributed replication mechanism derived from the Raft protocol. In this paper, we present MongoRaftReconfig, a novel dynamic reconfiguration protocol for the MongoDB replication system. MongoRaftReconfig utilizes a logless approach to managing configuration state and decouples the processing of configuration changes from the main database operation log. The protocol's design was influenced by engineering constraints faced when attempting to redesign an unsafe, legacy reconfiguration mechanism that existed previously in MongoDB. We provide a safety proof of MongoRaftReconfig, along with a formal specification in TLA+. To our knowledge, this is the first published safety proof and formal specification of a reconfiguration protocol for a Raft-based system. We also present results from model checking its safety properties on finite protocol instances. Finally, we discuss the conceptual novelties of MongoRaftReconfig, how it can be understood as an optimized and generalized version of the single-server reconfiguration algorithm of Raft, and present an experimental evaluation of how its optimizations can provide performance benefits for reconfigurations.

Distributed Parallel and Cluster Computing

Exploiting peer group concept for adaptive and highly available services

47 - Muhammad Asif Jan , Fahd Ali Zahid , Mohammad Moazam Fraz (Foundation University , Islamabad) 2003

This paper presents a prototype of a redundant, highly available, and fault-tolerant peer-to-peer framework for data management. Peer-to-peer computing is gaining importance due to its flexible organization, lack of central authority, distribution of functionality to participating nodes, and ability to utilize unused computational resources. The emergence of GRID computing has provided a much-needed infrastructure and administrative domain for peer-to-peer computing. The components of this framework exploit the peer group concept to scope service and information search, arrange services and information in a coherent manner, provide selective redundancy, and ensure availability in the face of failure and high load conditions. A prototype system has been implemented using JXTA peer-to-peer technology, and XML is used for service description and interfaces, allowing peers to communicate with services implemented on various platforms, including web services and JINI services. It utilizes code mobility to achieve role interchange among services and ensure dynamic group membership. Security is ensured by using Public Key Infrastructure (PKI) to implement group-level security policies for membership and service access.

Distributed Parallel and Cluster Computing

Cornus: One-Phase Commit for Cloud Databases with Storage Disaggregation

168 - Zhihan Guo , Xinyu Zeng , Ziwei Ren 2021

Two-phase commit (2PC) has been widely used in distributed databases to ensure atomicity for distributed transactions. However, 2PC suffers from two limitations. First, 2PC incurs long latency as it requires two logging operations on the critical path. Second, when a coordinator fails, a participant may be blocked waiting for the coordinator's decision, leading to indefinitely long latency and low throughput. We make a key observation that modern cloud databases feature a storage disaggregation architecture, which allows a transaction's final decision to not rely on the central coordinator. We propose Cornus, a one-phase commit (1PC) protocol specifically designed for this architecture. Cornus can solve the two problems mentioned above by leveraging the fact that all compute nodes are able to access and modify the log data on any storage node. We present Cornus in detail, formally prove its correctness, develop certain optimization techniques, and evaluate against 2PC on YCSB and TPC-C workloads. The results show that Cornus can achieve 1.5x speedup in latency.
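To make the idea of a coordinator-independent decision more concrete, here is a small sketch of how a shared log reachable by every compute node could be used. The abstract states only that any compute node can access and modify the log data on any storage node; the write-once `log_once` primitive, the `SharedLog` class, the `decide` function, and the `"VOTE-YES"` / `"ABORT"` / `"COMMIT"` strings are assumptions made for illustration and should not be read as the paper's actual interface.

```python
class SharedLog:
    """Toy stand-in for disaggregated storage: one write-once slot per
    (transaction, participant). Assumed for illustration only."""

    def __init__(self):
        self._slots = {}

    def log_once(self, txn_id, participant, value):
        # The first writer wins; later writers get back the stored value.
        return self._slots.setdefault((txn_id, participant), value)


def decide(log, txn_id, participants):
    """Any node may call this, e.g. after timing out on a silent peer.

    It tries to write ABORT into every slot. Slots that already hold a
    vote are left unchanged, so the outcome depends only on the log
    contents and not on a coordinator being alive.
    """
    outcomes = [log.log_once(txn_id, p, "ABORT") for p in participants]
    return "COMMIT" if all(o == "VOTE-YES" for o in outcomes) else "ABORT"
```

For example, if participants `a` and `b` both record `"VOTE-YES"` before anyone calls `decide`, every caller derives `"COMMIT"`. If `b` never voted, the first caller fills `b`'s slot with `"ABORT"`, and a late vote from `b` is rejected by the write-once slot, so all nodes still agree.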

Databases

WiSer: A Highly Available HTAP DBMS for IoT Applications

277 - Ronald Barber , Christian Garcia-Arellano , Ronen Grosman 2019

In a classic transactional distributed database management system (DBMS), write transactions invariably synchronize with a coordinator before final commitment. While enforcing serializability, this model has long been criticized for not satisfying the applications' availability requirements. With the arrival of the Internet of Things (IoT) era, this problem has become more severe, as an increasing number of applications call for the capability of hybrid transactional and analytical processing (HTAP), where aggregation constraints need to be enforced as part of transactions. Current systems work around this by creating escrows, allowing occasional overshoots of constraints, which are handled via compensating application logic. The WiSer DBMS targets consistency with availability by splitting the database commit into two steps. First, a PROMISE step that corresponds to what humans are used to as commitment, and runs without talking to a coordinator. Second, a SERIALIZE step that fixes transactions' positions in the serializable order via a consensus procedure. We achieve this split via a novel data representation that embeds read-sets into transaction deltas, and serialization sequence numbers into table rows. WiSer does no sharding (all nodes can run transactions that modify the entire database), and yet enforces aggregation constraints. Both read-write conflicts and aggregation constraint violations are resolved lazily in the serialized data. WiSer also covers node joins and departures as database tables, thus simplifying correctness and failure handling. We present the design of WiSer as well as experiments suggesting this approach has promise.

Databases

Reconfigurable Atomic Transaction Commit (Extended Version)

141 - Manuel Bravo , Alexey Gotsman 2019

Modern data stores achieve scalability by partitioning data into shards and fault-tolerance by replicating each shard across several servers. A key component of such systems is a Transaction Certification Service (TCS), which atomically commits a transaction spanning multiple shards. Existing TCS protocols require 2f+1 crash-stop replicas per shard to tolerate f failures. In this paper we present atomic commit protocols that require only f+1 replicas and reconfigure the system upon failures using an external reconfiguration service. We furthermore rigorously prove that these protocols correctly implement a recently proposed TCS specification. We present protocols in two different models: the standard asynchronous message-passing model and a model with Remote Direct Memory Access (RDMA), which allows a machine to access the memory of another machine over the network without involving the latter's CPU. Our protocols are inspired by a recent FARM system for RDMA-based transaction processing. Our work codifies the core ideas of FARM as distributed TCS protocols, rigorously proves them correct, and highlights the trade-offs required by the use of RDMA.

Distributed Parallel and Cluster Computing