Learning Key-Value Store Design

136 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Stratos Idreos

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Stratos Idreos - Niv Dayan - Wilson Qin

قواعد البيانات التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We introduce the concept of design continuums for the data layout of key-value stores. A design continuum unifies major distinct data structure designs under the same model. The critical insight and potential long-term impact is that such unifying models 1) render what we consider up to now as fundamentally different data structures to be seen as views of the very same overall design space, and 2) allow seeing new data structure designs with performance properties that are not feasible by existing designs. The core intuition behind the construction of design continuums is that all data structures arise from the very same set of fundamental design principles, i.e., a small set of data layout design concepts out of which we can synthesize any design that exists in the literature as well as new ones. We show how to construct, evaluate, and expand, design continuums and we also present the first continuum that unifies major data structure designs, i.e., B+tree, B-epsilon-tree, LSM-tree, and LSH-table. The practical benefit of a design continuum is that it creates a fast inference engine for the design of data structures. For example, we can predict near instantly how a specific design change in the underlying storage of a data system would affect performance, or reversely what would be the optimal data structure (from a given set of designs) given workload characteristics and a memory budget. In turn, these properties allow us to envision a new class of self-designing key-value stores with a substantially improved ability to adapt to workload and hardware changes by transitioning between drastically different data structure designs to assume a diverse set of performance properties at will.

قيم البحث

314 - Haoyu Huang , Shahram Ghandeharizadeh 2021

The cloud infrastructure motivates disaggregation of monolithic data stores into components that are assembled together based on an applications workload. This study investigates disaggregation of an LSM-tree key-value store into components that comm unicate using RDMA. These components separate storage from processing, enabling processing components to share storage bandwidth and space. The processing components scatter blocks of a file (SSTable) across an arbitrary number of storage components and balance load across them using power-of-d. They construct ranges dynamically at runtime to parallelize compaction and enhance performance. Each component has configuration knobs that control its scalability. The resulting component-based system, Nova-LSM, is elastic. It outperforms its monolithic counterparts, both LevelDB and RocksDB, by several orders of magnitude with workloads that exhibit a skewed pattern of access to data.

قواعد البيانات

Multi-version Indexing in Flash-based Key-Value Stores

256 - Pulkit A. Misra , Jeffrey S. Chase , Johannes Gehrke 2019

Maintaining multip

قواعد البيانات أنظمة التشغيل

Tensor Relational Algebra for Machine Learning System Design

191 - Binhang Yuan , Dimitrije Jankov , Jia Zou 2020

We consider the question: what is the abstraction that should be implemented by the computational engine of a machine learning system? Current machine learning systems typically push whole tensors through a series of compute kernels such as matrix mu ltiplications or activation functions, where each kernel runs on an AI accelerator (ASIC) such as a GPU. This implementation abstraction provides little built-in support for ML systems to scale past a single machine, or for handling large models with matrices or tensors that do not easily fit into the RAM of an ASIC. In this paper, we present an alternative implementation abstraction called the tensor relational algebra (TRA). The TRA is a set-based algebra based on the relational algebra. Expressions in the TRA operate over binary tensor relations, where keys are multi-dimensional arrays and values are tensors. The TRA is easily executed with high efficiency in a parallel or distributed environment, and amenable to automatic optimization. Our empirical study shows that the optimized TRA-based back-end can significantly outperform alternatives for running ML workflows in distributed clusters.

قواعد البيانات التعلم الآلي

Efficiently Reclaiming Space in a Log Structured Store

167 - David Lomet 2020

A log structured store uses a single write I/O for a number of diverse and non-contiguous pages within a large buffer instead of using a write I/O for each page separately. This requires that pages be relocated on every write, because pages are never updated in place. Instead, pages are dynamically remapped on every write. Log structuring was invented for and used initially in file systems. Today, a form of log structuring is used in SSD controllers because an SSD requires the erasure of a large block of pages before flash storage can be reused. No update-in-place requires that the storage for out-of-date pages be reclaimed (garbage collected or cleaned). We analyze cleaning performance and introduce a cleaning strategy that uses a new way to prioritize the order in which stale pages are garbage collected. Our cleaning strategy approximates an optimal cleaning strategy. Simulation studies confirm the results of the analysis. This strategy is a significant improvement over previous cleaning strategies.

قواعد البيانات هندسة العتاد الأداء

Towards Value-Sensitive Learning Analytics Design

254 - Bodong Chen , Haiyi Zhu 2018

To support ethical considerations and system integrity in learning analytics, this paper introduces two cases of applying the Value Sensitive Design methodology to learning analytics design. The first study applied two methods of Value Sensitive Desi gn, namely stakeholder analysis and value analysis, to a conceptual investigation of an existing learning analytics tool. This investigation uncovered a number of values and value tensions, leading to design trade-offs to be considered in future tool refinements. The second study holistically applied Value Sensitive Design to the design of a recommendation system for the Wikipedia WikiProjects. To proactively consider values among stakeholders, we derived a multi-stage design process that included literature analysis, empirical investigations, prototype development, community engagement, iterative testing and refinement, and continuous evaluation. By reporting on these two cases, this paper responds to a need of practical means to support ethical considerations and human values in learning analytics systems. These two cases demonstrate that Value Sensitive Design could be a viable approach for balancing a wide range of human values, which tend to encompass and surpass ethical issues, in learning analytics design.

تفاعل الإنسان والحاسوب