
Skeena: Efficient and Consistent Cross-Engine Transactions

Added by Jianqiu Zhang
Publication date: 2021
Research language: English





With the growing DRAM capacity and core count in modern servers, database systems are becoming increasingly multi-engine to feature a heterogeneous set of engines. In particular, a memory-optimized engine and a conventional storage-centric engine may coexist for various application needs. However, handling cross-engine transactions that access more than one engine remains challenging in terms of correctness, performance and programmability. This paper describes Skeena, a holistic approach to cross-engine transactions. We propose a lightweight transaction tracking structure and an atomic commit protocol to ensure correctness and support various isolation levels in multi-engine systems. Evaluation on a 40-core server shows that Skeena (1) does not penalize single-engine transactions and (2) enables the use of cross-engine transactions to improve throughput by up to 30x and/or reduce storage cost by judiciously placing tables in different engines.
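The abstract's central mechanism is an atomic commit protocol that keeps writes spanning two engines consistent. The sketch below is only a generic illustration of that problem, not Skeena's actual protocol: a coordinator drives a prepare-then-commit handshake across a hypothetical memory-optimized engine and a storage-centric engine, and the Engine class and its method names are invented for the example.

```python
# Generic sketch of a cross-engine atomic commit (prepare/commit handshake).
# This is NOT Skeena's protocol; the Engine class and its methods are
# invented for illustration.

class Engine:
    """A toy transactional engine that stages writes and commits them locally."""
    def __init__(self, name):
        self.name = name
        self.staged = {}   # txid -> list of (key, value) writes
        self.data = {}     # committed key -> value

    def stage_write(self, txid, key, value):
        self.staged.setdefault(txid, []).append((key, value))

    def prepare(self, txid):
        # A real engine would validate the transaction and harden its writes here.
        return txid in self.staged

    def commit(self, txid):
        for key, value in self.staged.pop(txid, []):
            self.data[key] = value

    def abort(self, txid):
        self.staged.pop(txid, None)


def commit_cross_engine(txid, engines):
    """Commit a transaction that wrote to several engines, or abort it everywhere."""
    if all(engine.prepare(txid) for engine in engines):
        for engine in engines:
            engine.commit(txid)
        return True
    for engine in engines:
        engine.abort(txid)
    return False


memory_engine = Engine("memory-optimized")
storage_engine = Engine("storage-centric")
memory_engine.stage_write("t1", "hot_row", 42)
storage_engine.stage_write("t1", "cold_row", 7)
print(commit_cross_engine("t1", [memory_engine, storage_engine]))  # True
```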



Related research

Existing data storage systems offer a wide range of functionalities to accommodate an equally diverse range of applications. However, new classes of applications have emerged, e.g., blockchain and collaborative analytics, featuring data versioning, fork semantics, tamper-evidence, or any combination thereof. They present new opportunities for storage systems to efficiently support such applications by embedding these requirements into the storage layer. In this paper, we present ForkBase, a storage engine specifically designed to provide efficient support for blockchain and forkable applications. By integrating the core application properties into the storage, ForkBase not only delivers high performance but also reduces development effort. Data in ForkBase is multi-versioned, and each version uniquely identifies the data content and its history. Two variants of fork semantics are supported in ForkBase to accommodate a range of collaboration workflows. A novel index structure is introduced to efficiently identify and eliminate duplicate content across data objects. Consequently, ForkBase is efficient not only in performance but also in space usage. We demonstrate the performance of ForkBase using three applications: a blockchain platform, a wiki engine, and a collaborative analytics application. We conduct an extensive experimental evaluation of these applications against their respective state-of-the-art systems. The results show that ForkBase achieves superior performance while significantly lowering development cost.
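To make the versioning-plus-deduplication idea concrete, here is a minimal sketch of content-addressed, multi-versioned storage in which identical values are stored only once and each key keeps its version history. It illustrates the general technique only; ForkBase's index structure and fork semantics are more elaborate, and the class and method names below are invented.

```python
# Minimal sketch of content-addressed, multi-versioned storage with
# value-level deduplication; illustrative only, not ForkBase's actual
# index structure or fork semantics.

import hashlib

class VersionedStore:
    def __init__(self):
        self.chunks = {}     # content hash -> bytes, stored once (deduplication)
        self.versions = {}   # key -> list of content hashes (version history)

    def put(self, key, value):
        digest = hashlib.sha256(value).hexdigest()
        self.chunks.setdefault(digest, value)            # identical content stored once
        self.versions.setdefault(key, []).append(digest)
        return digest

    def get(self, key, version=-1):
        """Return the value of a key at a given version (default: latest)."""
        return self.chunks[self.versions[key][version]]


store = VersionedStore()
store.put("doc", b"v1 contents")
store.put("doc", b"v2 contents")
store.put("copy", b"v1 contents")               # deduplicated against "doc" version 1
print(len(store.chunks), store.get("doc", 0))   # 2 b'v1 contents'
```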
Many scientific data-intensive applications perform iterative computations on array data. There exist multiple engines specialized for array processing. These engines efficiently support various types of operations, but none includes native support for iterative processing. In this paper, we develop a model for iterative array computations and a series of optimizations. We evaluate the benefits of optimized, native support for iterative array processing on the SciDB engine and real workloads from the astronomy domain.
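As a concrete, if simplified, example of what iterative array processing means, the sketch below repeats a neighbor-averaging step over a 2-D array until the values stop changing; it is unrelated to SciDB's API, and the function name and tolerance are illustrative choices.

```python
# Illustration of an iterative array computation: repeat a neighbor-averaging
# step over a 2-D array until the values converge. Unrelated to SciDB's API.

import numpy as np

def smooth_until_converged(grid, tol=1e-6, max_iters=1000):
    for _ in range(max_iters):
        new = grid.copy()
        # Average each interior cell with its four neighbors.
        new[1:-1, 1:-1] = (grid[:-2, 1:-1] + grid[2:, 1:-1] +
                           grid[1:-1, :-2] + grid[1:-1, 2:] +
                           grid[1:-1, 1:-1]) / 5.0
        if np.max(np.abs(new - grid)) < tol:
            return new
        grid = new
    return grid

grid = np.zeros((8, 8))
grid[0, :] = 1.0                          # fixed boundary values drive the iteration
print(smooth_until_converged(grid).round(3))
```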
Debugging transactions and understanding their execution are of immense importance for developing OLTP applications, for tracing the causes of errors in production systems, and for auditing the operations of a database. However, debugging transactions is hard for several reasons: 1) after the execution of a transaction, its input is no longer available for debugging, 2) internal states of a transaction are typically not accessible, and 3) the execution of a transaction may be affected by concurrently running transactions. We present a debugger that enables non-invasive, post-mortem debugging of transactions with provenance tracking and supports what-if scenarios (changes to transaction code or data). Using reenactment, a declarative replay technique we have developed, a transaction is replayed over the database state seen by its original execution, including all of its interactions with concurrently executed transactions from the history. Importantly, our approach uses the temporal database and audit logging capabilities available in many DBMSs and requires no modifications to the underlying database system or to the transactional workload.
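A toy version of the replay idea might look like the sketch below: an audit log records, for each transaction, the snapshot version it read and the update it applied, so the transaction can later be re-executed over exactly that state. The paper's reenactment does this declaratively inside the DBMS using temporal queries; the AuditedDB class and its methods here are invented for illustration.

```python
# Toy sketch of post-mortem transaction replay: the audit log remembers which
# snapshot each transaction read and what it did, so the transaction can later
# be re-run over exactly that state. Invented for illustration; the paper's
# reenactment works declaratively inside the DBMS using temporal queries.

class AuditedDB:
    def __init__(self):
        self.history = [{}]        # one snapshot (dict) per committed version
        self.audit_log = []        # (txid, snapshot_version_read, update_fn)

    def run(self, txid, update_fn):
        version = len(self.history) - 1
        new_state = dict(self.history[version])   # copy-on-write snapshot
        update_fn(new_state)
        self.history.append(new_state)
        self.audit_log.append((txid, version, update_fn))

    def reenact(self, txid):
        """Replay a finished transaction over the snapshot it originally read."""
        for logged_tx, version, update_fn in self.audit_log:
            if logged_tx == txid:
                state = dict(self.history[version])
                update_fn(state)
                return state


db = AuditedDB()
db.run("t1", lambda s: s.update(balance=100))
db.run("t2", lambda s: s.update(balance=s["balance"] - 30))
print(db.reenact("t2"))            # {'balance': 70}, rebuilt from the audit log
```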
Efficient execution of SPARQL queries over large RDF datasets is a topic of considerable interest due to the increased use of RDF to encode data. Most of this work has followed either relational or graph-based approaches. In this paper, we propose an alternative query engine, called gSmart, based on matrix algebra. This approach can potentially better exploit the computing power of the high-performance heterogeneous architectures that we target. gSmart incorporates: (1) grouped incident-edge-based SPARQL query evaluation, in which all unevaluated edges of a vertex are evaluated together using a series of matrix operations to fully utilize query constraints and narrow the solution space; (2) a graph query planner that determines the order in which vertices in query graphs should be evaluated; (3) memory- and computation-efficient data structures, including the lightweight sparse matrix (LSpM) storage for RDF data and a tree-based representation for evaluation results; (4) a multi-stage data partitioner that maps the incident-edge-based query evaluation onto heterogeneous HPC architectures and exposes multi-level parallelism; and (5) a parallel executor that uses fine-grained processing, pre-pruning, and tree-pruning to lower inter-node communication and sustain high throughput. Evaluations of gSmart on a CPU+GPU HPC architecture show execution-time speedups of up to 46920.00x over existing SPARQL query engines on a single-node machine. Additionally, gSmart on the Tianhe-1A supercomputer achieves a maximum speedup of 6.90x when scaling from 2 to 16 CPU+GPU nodes.
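The core algebraic idea, independent of gSmart's LSpM format or GPU execution, can be shown in a few lines: encode each predicate as a boolean adjacency matrix and answer a two-hop pattern with a matrix product. The tiny graph, names, and query below are made up for the example.

```python
# Sketch of matrix-algebra query evaluation: each predicate becomes a boolean
# adjacency matrix, and the two-hop pattern  ?x knows ?y . ?y worksAt ?z  is
# answered with a matrix product. The graph and names are made up; gSmart's
# LSpM storage and GPU execution are far more sophisticated.

import numpy as np

nodes = ["alice", "bob", "carol", "acme"]
index = {name: i for i, name in enumerate(nodes)}
n = len(nodes)

knows = np.zeros((n, n), dtype=bool)
works_at = np.zeros((n, n), dtype=bool)
knows[index["alice"], index["bob"]] = True
knows[index["bob"], index["carol"]] = True
works_at[index["carol"], index["acme"]] = True

# Boolean reachability for the two-hop pattern via matrix multiplication.
two_hop = (knows.astype(int) @ works_at.astype(int)) > 0
for x, z in zip(*np.nonzero(two_hop)):
    print(nodes[x], "knows someone who works at", nodes[z])
# -> bob knows someone who works at acme
```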
Current main-memory database system architectures are still challenged by high-contention workloads, and this challenge will continue to grow as the number of cores per processor increases. These systems schedule transactions randomly across cores to maximize concurrency and to produce a uniform load. Scheduling never considers potential conflicts. Performance could be improved if scheduling balanced concurrency, which maximizes throughput, against ordering transactions serially, which avoids conflicts. In this paper, we present the design of several intelligent transaction scheduling algorithms that consider both potential transaction conflicts and concurrency. To incorporate reasoning about transaction conflicts, we develop a supervised machine learning model that estimates the probability of conflict. This model is incorporated into several scheduling algorithms. In addition, we integrate an unsupervised machine learning algorithm into an intelligent scheduling algorithm. We then empirically measure the performance impact of the different scheduling algorithms on OLTP and social-networking workloads. Our results show that, with appropriate settings, intelligent scheduling can increase throughput by 54% and reduce the abort rate by 80% on a 20-core machine, relative to random scheduling. In summary, the paper provides preliminary evidence that intelligent scheduling significantly improves DBMS performance.
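The scheduling idea can be sketched without the learned model: route an incoming transaction to the queue it is most likely to conflict with, so likely conflicts execute serially rather than aborting each other, and otherwise balance load. In the sketch below a simple key-overlap score stands in for the paper's ML-based conflict-probability estimate; all names and the threshold are illustrative.

```python
# Sketch of conflict-aware transaction scheduling: send a transaction to the
# queue it is most likely to conflict with (so conflicts run serially), else
# to the least-loaded queue. A key-overlap score stands in for the paper's
# learned conflict-probability model; names and threshold are invented.

def conflict_probability(txn, queue):
    """Stand-in for the ML model: fraction of txn's keys already queued."""
    queued_keys = {key for pending in queue for key in pending["keys"]}
    return len(txn["keys"] & queued_keys) / len(txn["keys"]) if txn["keys"] else 0.0

def schedule(txn, queues, threshold=0.3):
    scores = [conflict_probability(txn, q) for q in queues]
    best = max(range(len(queues)), key=lambda i: scores[i])
    if scores[best] >= threshold:
        target = best                                     # serialize likely conflicts
    else:
        target = min(range(len(queues)), key=lambda i: len(queues[i]))  # balance load
    queues[target].append(txn)
    return target


queues = [[], []]
schedule({"id": "t1", "keys": {"a", "b"}}, queues)
schedule({"id": "t2", "keys": {"b", "c"}}, queues)   # overlaps on "b" -> same queue as t1
schedule({"id": "t3", "keys": {"x"}}, queues)        # no overlap -> least-loaded queue
print([[t["id"] for t in q] for q in queues])        # [['t1', 't2'], ['t3']]
```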