PrismDB: Read-aware Log-structured Merge Trees for Heterogeneous Storage

76 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ashwini Raina

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Ashwini Raina - Asaf Cidon - Kyle Jamieson

قواعد البيانات النظم الموزعة والتوازية والحوسبة العنقودية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In recent years, emerging hardware storage technologies have focused on divergent goals: better performance or lower cost-per-bit of storage. Correspondingly, data systems that employ these new technologies are optimized either to be fast (but expensive) or cheap (but slow). We take a different approach: by combining multiple tiers of fast and low-cost storage technologies within the same system, we can achieve a Pareto-efficient balance between performance and cost-per-bit. This paper presents the design and implementation of PrismDB, a novel log-structured merge tree based key-value store that exploits a full spectrum of heterogeneous storage technologies (from 3D XPoint to QLC NAND). We introduce the notion of read-awareness to log-structured merge trees, which allows hot objects to be pinned to faster storage, achieving better tiering and hot-cold separation of objects. Compared to the standard use of RocksDB on flash in datacenters today, PrismDBs average throughput on heterogeneous storage is 2.3$times$ faster and its tail latency is more than an order of magnitude better, using hardware than is half the cost.

قيم البحث

81 - Yifan Dai , Yien Xu , Aishwarya Ganesan 2020

We introduce BOURBON, a log-structured merge (LSM) tree that utilizes machine learning to provide fast lookups. We base the design and implementation of BOURBON on empirically-grounded principles that we derive through careful analysis of LSM design. BOURBON employs greedy piecewise linear regression to learn key distributions, enabling fast lookup with minimal computation, and applies a cost-benefit strategy to decide when learning will be worthwhile. Through a series of experiments on both synthetic and real-world datasets, we show that BOURBON improves lookup performance by 1.23x-1.78x as compared to state-of-the-art production LSMs.

قواعد البيانات التعلم الآلي

Efficiently Reclaiming Space in a Log Structured Store

167 - David Lomet 2020

A log structured store uses a single write I/O for a number of diverse and non-contiguous pages within a large buffer instead of using a write I/O for each page separately. This requires that pages be relocated on every write, because pages are never updated in place. Instead, pages are dynamically remapped on every write. Log structuring was invented for and used initially in file systems. Today, a form of log structuring is used in SSD controllers because an SSD requires the erasure of a large block of pages before flash storage can be reused. No update-in-place requires that the storage for out-of-date pages be reclaimed (garbage collected or cleaned). We analyze cleaning performance and introduce a cleaning strategy that uses a new way to prioritize the order in which stale pages are garbage collected. Our cleaning strategy approximates an optimal cleaning strategy. Simulation studies confirm the results of the analysis. This strategy is a significant improvement over previous cleaning strategies.

قواعد البيانات هندسة العتاد الأداء

Flare: Native Compilation for Heterogeneous Workloads in Apache Spark

77 - Gregory M. Essertel , Ruby Y. Tahboub , James M. Decker 2017

The need for modern data analytics to combine relational, procedural, and map-reduce-style functional processing is widely recognized. State-of-the-art systems like Spark have added SQL front-ends and relational query optimization, which promise an i ncrease in expressiveness and performance. But how good are these extensions at extracting high performance from modern hardware platforms? While Spark has made impressive progress, we show that for relational workloads, there is still a significant gap compared with best-of-breed query engines. And when stepping outside of the relational world, query optimization techniques are ineffective if large parts of a computation have to be treated as user-defined functions (UDFs). We present Flare: a new back-end for Spark that brings performance closer to the best SQL engines, without giving up the added expressiveness of Spark. We demonstrate order of magnitude speedups both for relational workloads such as TPC-H, as well as for a range of machine learning kernels that combine relational and iterative functional processing. Flare achieves these results through (1) compilation to native code, (2) replacing parts of the Spark runtime system, and (3) extending the scope of optimization and code generation to large classes of UDFs.

قواعد البيانات النظم الموزعة والتوازية والحوسبة العنقودية الأداء

UStore: A Distributed Storage With Rich Semantics

400 - Anh Dinh , Ji Wang , Sheng Wang 2017

Todays storage systems expose abstractions which are either too low-level (e.g., key-value store, raw-block store) that they require developers to re-invent the wheels, or too high-level (e.g., relational databases, Git) that they lack generality to support many classes of applications. In this work, we propose and implement a general distributed data storage system, called UStore, which has rich semantics. UStore delivers three key properties, namely immutability, sharing and security, which unify and add values to many classes of todays applications, and which also open the door for new applications. By keeping the core properties within the storage, UStore helps reduce application development efforts while offering high performance at hand. The storage embraces current hardware trends as key enablers. It is built around a data-structure similar to that of Git, a popular source code versioning system, but it also synthesizes many designs from distributed systems and databases. Our current implementation of UStore has better performance than general in-memory key-value storage systems, especially for version scan operations. We port and evaluate four applications on top of UStore: a Git-like application, a collaborative data science application, a transaction management application, and a blockchain application. We demonstrate that UStore enables faster development and the UStore-backed applications can have better performance than the existing implementations.

قواعد البيانات النظم الموزعة والتوازية والحوسبة العنقودية

Towards a Semantics-Aware Code Transformation Toolchain for Heterogeneous Systems

165 - Salvador Tamarit 2017

Obtaining good performance when programming heterogeneous computing platforms poses significant challenges. We present a program transformation environment, implemented in Haskell, where architecture-agnostic scientific C code with semantic annotatio ns is transformed into functionally equivalent code better suited for a given platform. The transformation steps are represented as rules that can be fired when certain syntactic and semantic conditions are fulfilled. These rules are not hard-wired into the rewriting engine: they are written in a C-like language and are automatically processed and incorporated into the rewriting engine. That makes it possible for end-users to add their own rules or to provide sets of rules that are adapted to certain specific domains or purposes.

لغات البرمجة النظم الموزعة والتوازية والحوسبة العنقودية هندسة البرمجيات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الشرق الأوسط - الأردن

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

PrismDB: Read-aware Log-structured Merge Trees for Heterogeneous Storage

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً