Compressed bitmap indexes: beyond unions and intersections

158 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Daniel Lemire

تاريخ النشر 2014

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Owen Kaser - Daniel Lemire

قواعد البيانات بنى وهياكل البيانات والخوارزميات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Compressed bitmap indexes are used to speed up simple aggregate queries in databases. Indeed, set operations like intersections, unions and complements can be represented as logical operations (AND,OR,NOT) that are ideally suited for bitmaps. However, it is less obvious how to apply bitmaps to more advanced queries. For example, we might seek products in a store that meet some, but maybe not all, criteria. Such threshold queries generalize intersections and unions; they are often used in information-retrieval and data-mining applications. We introduce new algorithms that are sometimes three orders of magnitude faster than a naive approach. Our work shows that bitmap indexes are more broadly applicable than is commonly believed.

قيم البحث

294 - Daniel Lemire , Owen Kaser , Kamel Aouiche 2009

Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index size by 9 and make indexes several times faster. We investigate row-reordering heuristics. Simply permuting the columns of the table can increase the sorting efficiency by 40%. Secondary contributions include efficient algorithms to construct and aggregate bitmaps. The effect of word length is also reviewed by constructing 16-bit, 32-bit and 64-bit indexes. Using 64-bit CPUs, we find that 64-bit indexes are slightly faster than 32-bit indexes despite being nearly twice as large.

قواعد البيانات

Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes

254 - Owen Kaser , Daniel Lemire , Kamel Aouiche 2009

Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index size by 9 and make indexes several times faster. We investigate reordering heuristics based on computed attribute-value histograms. Simply permuting the columns of the table based on these histograms can increase the sorting efficiency by 40%.

قواعد البيانات

Learning Multi-dimensional Indexes

210 - Vikram Nathan , Jialin Ding , Mohammad Alizadeh 2019

Scanning and filtering over multi-dimensional tables are key operations in modern analytical database engines. To optimize the performance of these operations, databases often create clustered indexes over a single dimension or multi-dimensional inde xes such as R-trees, or use complex sort orders (e.g., Z-ordering). However, these schemes are often hard to tune and their performance is inconsistent across different datasets and queries. In this paper, we introduce Flood, a multi-dimensional in-memory index that automatically adapts itself to a particular dataset and workload by jointly optimizing the index structure and data storage. Flood achieves up to three orders of magnitude faster performance for range scans with predicates than state-of-the-art multi-dimensional indexes or sort orders on real-world datasets and workloads. Our work serves as a building block towards an end-to-end learned database system.

قواعد البيانات بنى وهياكل البيانات والخوارزميات التعلم الآلي

Hierarchical Bitmap Indexing for Range and Membership Queries on Multidimensional Arrays

77 - Lubov{s} Krv{c}al n Czech Technical University in Prague 2021

Traditional indexing techniques commonly employed in da-ta-ba-se systems perform poorly on multidimensional array scientific data. Bitmap indices are widely used in commercial databases for processing complex queries, due to their effective use of bi t-wise operations and space-efficiency. However, bitmap indices apply natively to relational or linearized datasets, which is especially notable in binned or compressed indices. We propose a new method for multidimensional array indexing that overcomes the dimensionality-induced inefficiencies. The hierarchical indexing method is based on $n$-di-men-sional sparse trees for dimension partitioning, with bound number of individual, adaptively binned indices for attribute partitioning. This indexing performs well on range involving both dimensions and attributes, as it prunes the search space early, avoids reading entire index data, and does at most a single index traversal. Moreover, the indexing is easily extensible to membership queries. The indexing method was implemented on top of a state of the art bitmap indexing library Fastbit. We show that the hierarchical bitmap index outperforms conventional bitmap indexing built on auxiliary attribute for each dimension. Furthermore, the adaptive binning significantly reduces the amount of bins and therefore memory requirements.

قواعد البيانات بنى وهياكل البيانات والخوارزميات

Benchmarking Learned Indexes

72 - Ryan Marcus , Andreas Kipf , Alexander van Renen 2020

Recent advancements in learned index structures propose replacing existing index structures, like B-Trees, with approximate learned models. In this work, we present a unified benchmark that compares well-tuned implementations of three learned index s tructures against several state-of-the-art traditional baselines. Using four real-world datasets, we demonstrate that learned index structures can indeed outperform non-learned indexes in read-only in-memory workloads over a dense array. We also investigate the impact of caching, pipelining, dataset size, and key size. We study the performance profile of learned index structures, and build an explanation for why learned models achieve such good performance. Finally, we investigate other important properties of learned index structures, such as their performance in multi-threaded systems and their build times.

قواعد البيانات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الشرق الأوسط - الأردن

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Compressed bitmap indexes: beyond unions and intersections

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً