Crafty: Efficient, HTM-Compatible Persistent Transactions

104 0 0.0 ( 0 )

Download Cite

Added by Kaan Gen\\c{c}

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Kaan Genc{c} Ohion State University

Programming Languages

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Byte-addressable persistent memory, such as Intel/Micron 3D XPoint, is an emerging technology that bridges the gap between volatile memory and persistent storage. Data in persistent memory survives crashes and restarts; however, it is challenging to ensure that this data is consistent after failures. Existing approaches incur significant performance costs to ensure crash consistency. This paper introduces Crafty, a new approach for ensuring consistency and atomicity on persistent memory operations using commodity hardware with existing hardware transactional memory (HTM) capabilities, while incurring low overhead. Crafty employs a novel technique called nondestructive undo logging that leverages commodity HTM to control persist ordering. Our evaluation shows that Crafty outperforms state-of-the-art prior work under low contention, and performs competitively under high contention.

rate research

Enabling Efficient RDMA-based Synchronous Mirroring of Persistent Memory Transactions

138 - Arash Tavakkol , Aasheesh Kolli , Stanko Novakovic 2018

Synchronous Mirroring (SM) is a standard approach to building highly-available and fault-tolerant enterprise storage systems. SM ensures strong data consistency by maintaining multiple exact data replicas and synchronously propagating every update to all of them. Such strong consistency provides fault tolerance guarantees and a simple programming model coveted by enterprise system designers. For current storage devices, SM comes at modest performance overheads. This is because performing both local and remote updates simultaneously is only marginally slower than performing just local updates, due to the relatively slow performance of accesses to storage in todays systems. However, emerging persistent memory and ultra-low-latency network technologies necessitate a careful re-evaluation of the existing SM techniques, as these technologies present fundamentally different latency characteristics compared than their traditional counterparts. In addition to that, existing low-latency network technologies, such as Remote Direct Memory Access (RDMA), provide limited ordering guarantees and do not provide durability guarantees necessary for SM. To evaluate the performance implications of RDMA-based SM, we develop a rigorous testing framework that is based on emulated persistent memory. Our testing framework makes use of two different tools: (i) a configurable microbenchmark and (ii) a modified version of the WHISPER benchmark suite, which comprises a set of common cloud applications. Using this framework, we find that recently proposed RDMA primitives, such as remote commit, provide correctness guarantees, but do not take full advantage of the asynchronous nature of RDMA hardware. To this end, we propose new primitives enabling efficient and correct SM over RDMA, and use these primitives to develop two new techniques delivering high-performance SM of persistent memories.

Distributed Parallel and Cluster Computing

Skeena: Efficient and Consistent Cross-Engine Transactions

88 - Jianqiu Zhang , Kaisong Huang , Tianzheng Wang 2021

With the growing DRAM capacity and core count in modern servers, database systems are becoming increasingly multi-engine to feature a heterogeneous set of engines. In particular, a memory-optimized engine and a conventional storage-centric engine may coexist for various application needs. However, handling cross-engine transactions that access more than one engine remains challenging in terms of correctness, performance and programmability. This paper describes Skeena, a holistic approach to cross-engine transactions. We propose a lightweight transaction tracking structure and an atomic commit protocol to ensure correctness and support various isolation levels in multi-engine systems. Evaluation on a 40-core server shows that Skeena (1) does not penalize single-engine transactions and (2) enables the use of cross-engine transactions to improve throughput by up to 30x and/or reduce storage cost by judiciously placing tables in different engines.

Databases

Efficient Local Unfolding with Ancestor Stacks

327 - G. Puebla , E. Albert , M. Hermenegildo 2009

The most successful unfolding rules used nowadays in the partial evaluation of logic programs are based on well quasi orders (wqo) applied over (covering) ancestors, i.e., a subsequence of the atoms selected during a derivation. Ancestor (sub)sequences are used to increase the specialization power of unfolding while still guaranteeing termination and also to reduce the number of atoms for which the wqo has to be checked. Unfortunately, maintaining the structure of the ancestor relation during unfolding introduces significant overhead. We propose an efficient, practical local unfolding rule based on the notion of covering ancestors which can be used in combination with a wqo and allows a stack-based implementation without losing any opportunities for specialization. Using our technique, certain non-leftmost unfoldings are allowed as long as local unfolding is performed, i.e., we cover depth-first strategies.

Programming Languages Performance

Efficient Path-Sensitive Data-Dependence Analysis

174 - Peisen Yao , Jinguo Zhou , Xiao Xiao 2021

This paper presents a scalable path- and context-sensitive data-dependence analysis. The key is to address the aliasing-path-explosion problem via a sparse, demand-driven, and fused approach that piggybacks the computation of pointer information with the resolution of data dependence. Specifically, our approach decomposes the computational efforts of disjunctive reasoning into 1) a context- and semi-path-sensitive analysis that concisely summarizes data dependence as the symbolic and storeless value-flow graphs, and 2) a demand-driven phase that resolves transitive data dependence over the graphs. We have applied the approach to two clients, namely thin slicing and value flow analysis. Using a suite of 16 programs ranging from 13 KLoC to 8 MLoC, we compare our techniques against a diverse group of state-of-the-art analyses, illustrating significant precision and scalability advantages of our approach.

Programming Languages

Toward Efficient Interactions between Python and Native Libraries

232 - Jialiang Tan , Yu Chen , Zhenming Liu 2021

Python has become a popular programming language because of its excellent programmability. Many modern software packages utilize Python for high-level algorithm design and depend on native libraries written in C/C++/Fortran for efficient computation kernels. Interaction between Python code and native libraries introduces performance losses because of the abstraction lying on the boundary of Python and native libraries. On the one side, Python code, typically run with interpretation, is disjoint from its execution behavior. On the other side, native libraries do not include program semantics to understand algorithm defects. To understand the interaction inefficiencies, we extensively study a large collection of Python software packages and categorize them according to the root causes of inefficiencies. We extract two inefficiency patterns that are common in interaction inefficiencies. Based on these patterns, we develop PieProf, a lightweight profiler, to pinpoint interaction inefficiencies in Python applications. The principle of PieProf is to measure the inefficiencies in the native execution and associate inefficiencies with high-level Python code to provide a holistic view. Guided by PieProf, we optimize 17 real-world applications, yielding speedups up to 6.3$times$ on application level.

Programming Languages Performance