Many distributed storage systems are transactional, and much work has been devoted to optimizing their performance, especially the performance of read-only transactions, which are considered the most frequent in practice. Yet the results obtained so far are rather disappointing, and some of the design decisions seem contrived. This paper contributes to explaining this state of affairs by proving intrinsic limitations of transactional storage systems, even those that are required to ensure only causal consistency rather than strong consistency. We first consider general storage systems where some transactions are read-only and some also involve write operations. We show that even read-only transactions cannot be fast: their operations cannot be executed within one round-trip message exchange between a client seeking an object and the server storing it. We then consider systems (as sometimes implemented today) where all transactions are read-only, i.e., updates are performed as individual operations outside transactions. In this case, read-only transactions can indeed be fast, but we prove that they must be visible: they inherently induce updates on the servers, which in turn affect overall performance.
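To make the visibility result concrete, here is a minimal Python sketch of the situation the second result describes; it is not the paper's construction, and all names (SnapshotServer, fast_read, pins) are hypothetical. A server answers a read-only transaction's read in a single round trip, but to keep later reads of the same transaction causally consistent it must record per-transaction state, i.e., the read itself updates the server:

```python
# Toy model: updates are individual operations; read-only transactions are
# "fast" (one round trip per read) but leave a trace on the server (self.pins),
# which is exactly the visibility the impossibility result forces.
class SnapshotServer:
    def __init__(self):
        self.clock = 0       # server-local logical time
        self.versions = {}   # key -> list of (timestamp, value)
        self.pins = {}       # txn_id -> timestamp pinned at the txn's first read

    def write(self, key, value):
        # Individual update performed outside any transaction.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def fast_read(self, txn_id, key):
        # One round trip, but note the server-side mutation: the pin is written.
        pin = self.pins.setdefault(txn_id, self.clock)
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= pin:
                return value
        return None

server = SnapshotServer()
server.write("x", 1)
print(server.fast_read("t1", "x"))   # 1, and the server now stores state for t1
server.write("x", 2)
print(server.fast_read("t1", "x"))   # still 1: t1 reads from its pinned snapshot
```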
State-of-the-art distributed in-memory datastores (FaRM, FaSST, DrTM) provide strongly-consistent distributed transactions with high performance and availability. Transactions in those systems are fully general; they can atomically manipulate any set of objects in the store, regardless of their location. To achieve this, these systems use complex distributed transactional protocols. Meanwhile, many workloads have a high degree of locality. For such workloads, distributed transactions are overkill, as most operations only access objects located on the same server -- if sharded appropriately. In this paper, we show that for these workloads, a single-node transactional protocol combined with dynamic object re-sharding and asynchronously pipelined replication can provide the same level of generality with better performance, simpler protocols, and lower developer effort. We present Zeus, an in-memory distributed datastore that provides general transactions by moving all objects involved in a transaction to the same server and executing a single-node transaction on them. Zeus is fault-tolerant and strongly consistent. At the heart of Zeus is a reliable dynamic object sharding protocol that can move 250K objects per second per server, allowing Zeus to process millions of transactions per second and outperform more traditional distributed transactions on a wide range of workloads that exhibit locality.
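A minimal sketch of the transaction-by-migration idea, assuming a toy single-process cluster (Cluster, acquire, and run_transaction are invented names; real Zeus adds ownership replication, fault tolerance, and pipelining):

```python
import threading

class Cluster:
    """Toy model: ownership is a map; 'migration' is a map update."""
    def __init__(self):
        self.owner = {}   # object id -> server id currently owning it
        self.store = {}   # object id -> value
        self.locks = {}   # object id -> lock guarding local access

    def acquire(self, server_id, oid):
        # Re-shard: pull ownership of oid to server_id (a message exchange
        # in the real system, a dictionary write in this sketch).
        self.owner[oid] = server_id

    def run_transaction(self, server_id, oids, fn):
        # Acquire every object in the footprint, in a fixed order to
        # avoid deadlock, then run an ordinary single-node transaction.
        for oid in sorted(oids):
            self.acquire(server_id, oid)
            self.locks.setdefault(oid, threading.Lock()).acquire()
        try:
            snapshot = {oid: self.store.get(oid, 0) for oid in oids}
            for oid, val in fn(snapshot).items():   # all-local execution
                self.store[oid] = val
        finally:
            for oid in sorted(oids):
                self.locks[oid].release()

cluster = Cluster()
cluster.run_transaction(0, {"a", "b"},
                        lambda s: {"a": s["a"] + 1, "b": s["b"] - 1})
print(cluster.store)   # {'a': 1, 'b': -1}, computed without a distributed commit
```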
Context: Concurrent objects with asynchronous messaging are an increasingly popular way to structure highly available, high-performance, large-scale software systems. To ensure data consistency and support synchronization between objects, such systems often use distributed transactions with Two-Phase Locking (2PL) for concurrency control and Two-Phase Commit (2PC) as the atomic commitment protocol. Inquiry: In highly available, high-throughput systems, such as large banking infrastructures, 2PL becomes a bottleneck for highly contended objects, which queue up many messages while locked. Approach: In this paper we introduce Path-Sensitive Atomic Commit (PSAC) to address this situation. We start from message handlers (or methods), which are decorated with pre- and post-conditions describing their guards and effects. Knowledge: This allows the PSAC lock mechanism to check whether the effects of two concurrently arriving messages are independent, and to skip locking if this is the case. As a result, more messages are directly accepted or rejected, and higher overall throughput is obtained. Grounding: We have implemented PSAC for a state-machine-based DSL called Rebel, on top of a runtime based on the Akka actor framework. Our performance evaluation shows that PSAC exhibits the same scalability and latency characteristics as standard 2PL/2PC, and obtains up to 1.8x higher median throughput in congested scenarios. Importance: We believe PSAC is a step towards enabling organizations to build scalable distributed applications, even if their consistency requirements are not embarrassingly parallel.
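A toy rendition of the PSAC independence check, assuming a bank-account example (Msg, COMMUTES, and compatible are invented names, and the check is deliberately coarser than real path sensitivity):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Msg:
    effect: str      # "add", "sub", or "set" on a single field (the post-condition)
    field: str       # the state field the effect touches
    has_guard: bool  # pre-condition, e.g. withdraw requires balance >= amount

# Toy commutativity table: additions and subtractions commute; "set" does not.
COMMUTES = {("add", "add"), ("add", "sub"), ("sub", "add"), ("sub", "sub")}

def compatible(a: Msg, b: Msg) -> bool:
    """True if a and b can be accepted concurrently without taking a lock."""
    if a.field != b.field:
        return True                            # disjoint state: independent
    if (a.effect, b.effect) not in COMMUTES:
        return False                           # effects do not commute
    # Effects commute, but neither may falsify the other's guard: "add" only
    # raises the value, preserving lower-bound guards, while two guarded
    # "sub"s could jointly overdraw.
    breaks = lambda x, y: x.effect == "sub" and y.has_guard
    return not breaks(a, b) and not breaks(b, a)

deposit  = Msg(effect="add", field="balance", has_guard=False)
withdraw = Msg(effect="sub", field="balance", has_guard=True)

print(compatible(deposit, deposit))     # True  -> no lock needed
print(compatible(deposit, withdraw))    # True  -> a deposit cannot break the guard
print(compatible(withdraw, withdraw))   # False -> fall back to 2PL/2PC
```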
This paper presents the design and implementation of Obladi, the first system to provide ACID transactions while also hiding access patterns. Obladi uses oblivious RAM (ORAM) as its building block, but turns the demands of supporting transactions into a performance opportunity. By executing transactions within epochs and delaying commit decisions until an epoch ends, Obladi reduces the amortized bandwidth costs of oblivious storage and increases overall system throughput. These performance gains, combined with new oblivious mechanisms for concurrency control and recovery, allow Obladi to execute OLTP workloads with reasonable throughput: it comes within 5x to 12x of a non-oblivious baseline on the TPC-C, SmallBank, and FreeHealth applications. Latency overheads, however, are higher (70x on TPC-C).
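A schematic of the epoch idea in Python, leaving out all of Obladi's actual ORAM machinery (EpochBatcher and oram_write are hypothetical names):

```python
class EpochBatcher:
    """Defer commit decisions to the epoch boundary so one batch of
    expensive oblivious writes is amortized over many transactions."""
    def __init__(self, oram_write):
        self.oram_write = oram_write   # costly oblivious write, paid once per epoch
        self.pending = []              # transactions executed but not yet decided

    def execute(self, txn_writes):
        self.pending.append(txn_writes)   # run against buffered state; decide later

    def close_epoch(self):
        written, committed = set(), []
        for writes in self.pending:
            if written & writes.keys():
                continue                   # toy conflict rule: first writer wins
            written |= writes.keys()
            committed.append(writes)
        merged = {k: v for w in committed for k, v in w.items()}
        self.oram_write(merged)            # one amortized flush for the whole epoch
        self.pending.clear()
        return len(committed)

batcher = EpochBatcher(oram_write=lambda batch: print("ORAM flush:", batch))
batcher.execute({"x": 1})
batcher.execute({"x": 2})                  # conflicts with the first transaction
batcher.execute({"y": 9})
print("committed:", batcher.close_epoch())  # ORAM flush: {'x': 1, 'y': 9}; committed: 2
```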
While many researchers adopt a sharding approach to design scalable blockchains, few works have studied the transaction placement problem that sharding protocols incur. The widely used hash-based placement algorithm renders an overwhelming portion of transactions cross-shard. In this paper, we analyze the high cost of cross-shard transactions and show that most Bitcoin transactions have simple dependencies and can become single-shard under a placement algorithm that takes transaction dependencies into account. In addition, we perform a case study of OptChain, the state-of-the-art transaction placement algorithm for sharded blockchains, and identify a defect in it. We propose a fix, and our evaluation results demonstrate that the fix helps OptChain improve system throughput by 4x.
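The intuition behind dependency-aware placement can be shown in a few lines; this is a toy rule, not OptChain's algorithm (hash_place and dependency_place are invented names):

```python
from collections import Counter

def hash_place(tx_id, num_shards):
    return hash(tx_id) % num_shards        # the widely used baseline

def dependency_place(tx_id, parents, placement, num_shards):
    # Co-locate a transaction with the shard holding most of its parents,
    # so chains of dependent transactions stay single-shard.
    votes = Counter(placement[p] for p in parents if p in placement)
    if not votes:
        return hash_place(tx_id, num_shards)   # no placed parents: fall back
    return votes.most_common(1)[0][0]

placement = {}
dag = {"t1": [], "t2": ["t1"], "t3": ["t1", "t2"]}  # t3 spends t1's and t2's outputs
for tx, parents in dag.items():
    placement[tx] = dependency_place(tx, parents, placement, num_shards=4)
print(placement)   # t2 and t3 land on t1's shard, so the chain is single-shard
```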
Graph transaction processing raises unique challenges, such as random data access caused by the irregularity of graph structures, and low throughput and high abort rates caused by the relatively large read/write sets of graph transactions. To address these challenges, we present G-Tran -- an RDMA-enabled distributed in-memory graph database with serializable and snapshot isolation support. First, we propose a graph-native data store to achieve good data locality and fast data access for transactional updates and queries. Second, G-Tran adopts a fully decentralized architecture that leverages RDMA to process distributed transactions with the massively parallel processing (MPP) model, which achieves high performance by utilizing all computing resources. In addition, we propose a new MV-OCC implementation with two optimizations to address the issue of large read/write sets in graph transactions. Extensive experiments show that G-Tran achieves competitive performance compared with other popular graph databases on benchmark workloads.
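A compact sketch of MV-OCC validation, in the spirit of what the abstract describes but not G-Tran's code (MVStore and its method names are hypothetical):

```python
class MVStore:
    """Multi-version store: reads go against a snapshot; at commit the read
    set is re-validated, and the transaction aborts if anything it read has
    a newer committed version."""
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value)
        self.ts = 0          # last committed timestamp

    def begin(self):
        return {"snapshot": self.ts, "reads": {}, "writes": {}}

    def read(self, txn, key):
        for cts, val in reversed(self.versions.get(key, [])):
            if cts <= txn["snapshot"]:
                txn["reads"][key] = cts   # remember the version for validation
                return val
        return None

    def commit(self, txn):
        for key, seen in txn["reads"].items():
            latest = self.versions.get(key, [(0, None)])[-1][0]
            if latest != seen:
                return False              # abort: someone committed behind us
        self.ts += 1
        for key, val in txn["writes"].items():
            self.versions.setdefault(key, []).append((self.ts, val))
        return True

store = MVStore()
t0 = store.begin(); t0["writes"]["v1"] = "vertex"; store.commit(t0)
t1, t2 = store.begin(), store.begin()
store.read(t1, "v1")
t2["writes"]["v1"] = "updated"; store.commit(t2)         # t2 commits first
t1["writes"]["v1"] = "stale";   print(store.commit(t1))  # False: t1 aborts
```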