
Hierarchical Bitmap Indexing for Range and Membership Queries on Multidimensional Arrays

Published by Luboš Krčál
Publication date: 2021
Research field: Informatics Engineering
Language: English





Traditional indexing techniques commonly employed in database systems perform poorly on multidimensional array scientific data. Bitmap indices are widely used in commercial databases for processing complex queries, due to their effective use of bit-wise operations and their space efficiency. However, bitmap indices apply natively to relational or linearized datasets, which is especially notable in binned or compressed indices. We propose a new method for multidimensional array indexing that overcomes the dimensionality-induced inefficiencies. The hierarchical indexing method is based on $n$-dimensional sparse trees for dimension partitioning, with a bounded number of individual, adaptively binned indices for attribute partitioning. This indexing performs well on range queries involving both dimensions and attributes, as it prunes the search space early, avoids reading the entire index data, and does at most a single index traversal. Moreover, the indexing is easily extensible to membership queries. The indexing method was implemented on top of FastBit, a state-of-the-art bitmap indexing library. We show that the hierarchical bitmap index outperforms conventional bitmap indexing built on an auxiliary attribute for each dimension. Furthermore, the adaptive binning significantly reduces the number of bins and therefore the memory requirements.
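To make the baseline concrete, the following is a minimal sketch of a flat binned bitmap index over a linearized (row-major) 2-D array, answering a range query that restricts both an attribute and a sub-region of the array. It is not the hierarchical method of the abstract and does not use the FastBit API; the function names (`build_binned_index`, `region_mask`, `range_query`), the use of Python integers as bitmaps, and the fixed bin edges are all assumptions made for illustration.

```python
from bisect import bisect_right


def build_binned_index(values, edges):
    """Build one bitmap (a Python int) per bin.

    Bin k covers the half-open interval [edges[k], edges[k+1]).
    Bit i of a bitmap is set when values[i] falls into that bin.
    """
    bitmaps = [0] * (len(edges) - 1)
    for i, v in enumerate(values):
        k = bisect_right(edges, v) - 1
        if 0 <= k < len(bitmaps):
            bitmaps[k] |= 1 << i
    return bitmaps


def region_mask(shape, row_range, col_range):
    """Bitmap selecting a rectangular sub-region of a row-major 2-D array."""
    _, cols = shape
    mask = 0
    for r in range(*row_range):
        for c in range(*col_range):
            mask |= 1 << (r * cols + c)
    return mask


def set_bits(bitmap):
    """Yield the positions of the set bits in ascending order."""
    i = 0
    while bitmap:
        if bitmap & 1:
            yield i
        bitmap >>= 1
        i += 1


def range_query(values, bitmaps, edges, lo, hi, mask=None):
    """Return linear positions i with lo <= values[i] < hi inside the mask."""
    hits = 0        # bins that lie entirely inside [lo, hi)
    candidates = 0  # bins that only partially overlap [lo, hi)
    for k, bm in enumerate(bitmaps):
        b_lo, b_hi = edges[k], edges[k + 1]
        if lo <= b_lo and b_hi <= hi:
            hits |= bm
        elif b_lo < hi and lo < b_hi:
            candidates |= bm
    if mask is not None:
        hits &= mask
        candidates &= mask
    result = list(set_bits(hits))
    # Edge bins need a candidate check against the raw values.
    result += [i for i in set_bits(candidates) if lo <= values[i] < hi]
    return sorted(result)


values = [0.5, 2.5, 7.0, 3.1, 9.9, 4.0]   # a 2 x 3 array, row-major
edges = [0, 2, 4, 6, 8, 10]
index = build_binned_index(values, edges)
mask = region_mask((2, 3), (0, 2), (1, 3))  # all rows, columns 1..2
print(range_query(values, index, edges, 2.5, 7.5))
print(range_query(values, index, edges, 2.5, 7.5, mask))
```

In this flat scheme the dimension restriction is applied only after the attribute bitmaps have been combined, so every bin bitmap is touched regardless of how small the queried region is. That is the kind of cost the abstract's dimension partitioning is meant to prune early.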


Read also

Structural indexing is an approach to accelerating query evaluation, whereby data objects are partitioned and indexed reflecting the precise expressive power of a given query language. Each partition block of the index holds exactly those objects that are indistinguishable with respect to queries expressible in the language. Structural indexes have proven successful for XML, RDF, and relational data management. In this paper we study structural indexing for conjunctive path queries (CPQ). CPQ forms the core of contemporary graph query languages such as SPARQL, Cypher, PGQL, and G-CORE. CPQ plays the same fundamental role with respect to contemporary graph query languages as the classic conjunctive queries play for SQL. We develop the first practical structural indexes for this important query language. In particular, we propose a structural index based on k-path-bisimulation, tightly coupled to the expressive power of CPQ, and develop algorithms for efficient query processing with our index. Furthermore, we study workload-aware structural indexes to reduce both the construction and space costs according to a given workload. We demonstrate through extensive experiments using real and synthetic graphs that our methods accelerate query processing by up to multiple orders of magnitude over the state-of-the-art methods, without increasing index size.
Efficient large-scale annotation of genomic intervals is essential for personal genome interpretation in the realm of precision medicine. There are 13 possible relations between two intervals according to Allen's interval algebra. Conventional interval trees are routinely used to identify the genomic intervals satisfying a coarse relation with a query interval, but cannot efficiently support queries for more refined relations such as all of Allen's relations. We design and implement a novel approach to address this unmet need. By rewriting Allen's interval relations, we transform an interval query into a range query, then adapt and utilize range trees for querying. We implement two types of range trees, a basic 2-dimensional range tree (2D-RT) and an augmented range tree with fractional cascading (RTFC), and compare them with the conventional interval tree (IT). Theoretical analysis shows that RTFC can achieve the best time complexity among the three trees for interval queries regarding all of Allen's relations. We also perform comparative experiments on the efficiency of RTFC, 2D-RT and IT in querying noncoding element annotations in a large collection of personal genomes. Our experimental results show that 2D-RT is more efficient than IT for interval queries regarding most of Allen's relations, and that RTFC is even more efficient than 2D-RT. The results demonstrate that RTFC is an efficient data structure for querying large-scale datasets regarding Allen's relations between genomic intervals, such as those required for interpreting genome-wide variation in large populations.
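The rewriting step can be illustrated with a small sketch: each stored interval is treated as a point (start, end), and an Allen relation against a query interval becomes a pair of open bounds on those two coordinates. Only five of the 13 relations are shown, the names `allen_to_range` and `query` are hypothetical, and the lookup is a plain linear scan standing in for the range trees (2D-RT, RTFC) described in the abstract.

```python
import math


def allen_to_range(relation, qs, qe):
    """Map an Allen relation against query (qs, qe) to open bounds.

    Returns ((s_lo, s_hi), (e_lo, e_hi)): a stored interval (s, e)
    satisfies the relation when s_lo < s < s_hi and e_lo < e < e_hi.
    """
    inf = math.inf
    table = {
        "before":   ((-inf, inf), (-inf, qs)),  # e < qs
        "after":    ((qe, inf), (-inf, inf)),   # s > qe
        "during":   ((qs, inf), (-inf, qe)),    # qs < s and e < qe
        "contains": ((-inf, qs), (qe, inf)),    # s < qs and e > qe
        "overlaps": ((-inf, qs), (qs, qe)),     # s < qs < e < qe
    }
    return table[relation]


def query(intervals, relation, qs, qe):
    """Linear-scan stand-in for a 2-D range query over (start, end) points."""
    (s_lo, s_hi), (e_lo, e_hi) = allen_to_range(relation, qs, qe)
    return [(s, e) for s, e in intervals
            if s_lo < s < s_hi and e_lo < e < e_hi]


intervals = [(1, 3), (4, 6), (6, 9), (2, 12), (20, 25)]
for rel in ("before", "overlaps", "during", "contains", "after"):
    print(rel, query(intervals, rel, 5, 10))
```

Because every relation reduces to an axis-aligned rectangle over (start, end), any 2-D range-searching structure can replace the linear scan without changing the mapping.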
Owen Kaser, Daniel Lemire 2014
Compressed bitmap indexes are used to speed up simple aggregate queries in databases. Indeed, set operations like intersections, unions and complements can be represented as logical operations (AND, OR, NOT) that are ideally suited for bitmaps. However, it is less obvious how to apply bitmaps to more advanced queries. For example, we might seek products in a store that meet some, but maybe not all, criteria. Such threshold queries generalize intersections and unions; they are often used in information-retrieval and data-mining applications. We introduce new algorithms that are sometimes three orders of magnitude faster than a naive approach. Our work shows that bitmap indexes are more broadly applicable than is commonly believed.
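A threshold query asks which positions are set in at least `t` of `n` bitmaps. The sketch below is one simple way to compute it with bit-wise operations only, using uncompressed Python integers as bitmaps; it is an illustrative baseline under those assumptions, not necessarily one of the algorithms introduced in the paper, and the function name `threshold` is hypothetical.

```python
def threshold(bitmaps, t, n_bits):
    """Return a bitmap of positions set in at least t of the input bitmaps.

    at_least[j] holds the positions seen in at least j of the bitmaps
    processed so far; at_least[0] is the all-ones bitmap.
    """
    full = (1 << n_bits) - 1
    if t <= 0:
        return full
    at_least = [full] + [0] * t
    for b in bitmaps:
        # Walk j downwards so at_least[j - 1] still reflects the state
        # before the current bitmap was counted.
        for j in range(t, 0, -1):
            at_least[j] |= at_least[j - 1] & b
    return at_least[t]


criteria = [0b0111, 0b0101, 0b1100]  # three criteria over four products
print(bin(threshold(criteria, 1, 4)))  # t = 1 is the union
print(bin(threshold(criteria, 2, 4)))  # at least two criteria met
print(bin(threshold(criteria, 3, 4)))  # t = n is the intersection
```

With `t = 1` the result equals the OR of all bitmaps and with `t = n` it equals their AND, which is the sense in which threshold queries generalize unions and intersections.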
The increasing availability of structured datasets, from Web tables and open-data portals to enterprise data, opens up opportunities to enrich analytics and improve machine learning models through relational data augmentation. In this paper, we introduce a new class of data augmentation queries: join-correlation queries. Given a column $Q$ and a join column $K_Q$ from a query table $\mathcal{T}_Q$, retrieve tables $\mathcal{T}_X$ in a dataset collection such that $\mathcal{T}_X$ is joinable with $\mathcal{T}_Q$ on $K_Q$ and there is a column $C \in \mathcal{T}_X$ such that $Q$ is correlated with $C$. A naive approach to evaluate these queries, which first finds joinable tables and then explicitly joins and computes correlations between $Q$ and all columns of the discovered tables, is prohibitively expensive. To efficiently support correlated column discovery, we 1) propose a sketching method that enables the construction of an index for a large number of tables and that provides accurate estimates for join-correlation queries, and 2) explore different scoring strategies that effectively rank the query results based on how well the columns are correlated with the query. We carry out a detailed experimental evaluation, using both synthetic and real data, which shows that our sketches attain high accuracy and the scoring strategies lead to high-quality rankings.
A corpus of recent work has revealed that the learned index can improve query performance while reducing the storage overhead. It potentially offers an opportunity to address the spatial query processing challenges caused by the surge in location-based services. Although several learned indexes have been proposed to process spatial data, the main idea behind these approaches is to utilize the existing one-dimensional learned models, which requires either converting the spatial data into one-dimensional data or applying the learned model on individual dimensions separately. As a result, these approaches cannot fully utilize or take advantage of the information regarding the spatial distribution of the original spatial data. To this end, in this paper, we exploit it by using the spatial (multi-dimensional) interpolation function as the learned model, which can be directly employed on the spatial data. Specifically, we design an efficient SPatial inteRpolation functIon based Grid index (SPRIG) to process the range and kNN queries. Detailed experiments are conducted on real-world datasets, and the results indicate that our proposed learned index can significantly improve the performance in comparison with the traditional spatial indexes and a state-of-the-art multi-dimensional learned index.