A Two-level Spatial In-Memory Index

Added by Panagiotis Bouros

Publication date 2020

Fields Informatics Engineering

Research language English

Authors Dimitrios Tsitsigkos - Konstantinos Lampropoulos - Panagiotis Bouros

Databases

Very large volumes of spatial data are increasingly becoming available and demand effective management. While there have been decades of research on spatial data management, few works consider the current state of commodity hardware, which offers relatively large memory and parallel multi-core processing. In this work, we re-consider the design of spatial indexing under this new reality. Specifically, we propose a main-memory indexing approach for objects with spatial extent, which is based on a classic regular space partitioning into disjoint tiles. The novelty of our index is that the contents of each tile are further partitioned into four classes. This second-level partitioning not only reduces the number of comparisons required to compute the results, but also avoids the generation and elimination of duplicate results, which is an inherent problem of spatial indexes based on disjoint space partitioning. The spatial partitions defined by our indexing scheme are totally independent, facilitating effortless parallel evaluation, as no synchronization or communication between the partitions is necessary. We show how our index can be used to efficiently process spatial range queries and drastically reduce the cost of the refinement step of the queries. In addition, we study the efficient processing of numerous range queries in batch and in parallel. Extensive experiments on real datasets confirm the efficiency of our approaches.
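
The two-level idea can be illustrated with a minimal Python sketch. The abstract does not say how the four classes are defined, so the sketch assumes one plausible reading: within each tile, a rectangle is classified by whether it begins inside that tile or before it in the x and y dimensions. The names `Rect`, `TwoLevelGrid`, `insert`, and `range_query`, and the class labels A to D, are hypothetical and not taken from the paper. The sketch only demonstrates duplicate avoidance; it does not reproduce the reduction in comparisons or the refinement-step savings.

```python
from collections import defaultdict
from typing import NamedTuple


class Rect(NamedTuple):
    xl: float
    yl: float
    xu: float
    yu: float


def intersects(a: Rect, b: Rect) -> bool:
    return a.xl <= b.xu and b.xl <= a.xu and a.yl <= b.yu and b.yl <= a.yu


class TwoLevelGrid:
    """Regular grid over [0, extent) x [0, extent) with n x n disjoint tiles.

    Assumed class definitions (per tile, per object):
      A: object begins inside the tile in both x and y
      B: begins inside in x, before the tile in y
      C: begins before the tile in x, inside in y
      D: begins before the tile in both x and y
    """

    def __init__(self, extent: float, n: int):
        self.n = n
        self.side = extent / n
        self.tiles = defaultdict(lambda: {"A": [], "B": [], "C": [], "D": []})

    def _cell(self, v: float) -> int:
        # Clamp so that coordinates on the domain boundary map to a valid tile.
        return min(self.n - 1, max(0, int(v // self.side)))

    def insert(self, oid, r: Rect) -> None:
        i0, j0 = self._cell(r.xl), self._cell(r.yl)
        i1, j1 = self._cell(r.xu), self._cell(r.yu)
        # The object is replicated into every tile it overlaps,
        # but stored under a different class in each.
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                starts_x, starts_y = i == i0, j == j0
                if starts_x and starts_y:
                    cls = "A"
                elif starts_x:
                    cls = "B"
                elif starts_y:
                    cls = "C"
                else:
                    cls = "D"
                self.tiles[(i, j)][cls].append((oid, r))

    def range_query(self, w: Rect) -> list:
        i0, j0 = self._cell(w.xl), self._cell(w.yl)
        i1, j1 = self._cell(w.xu), self._cell(w.yu)
        out = []
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                tile = self.tiles.get((i, j))
                if tile is None:
                    continue
                # An object that begins before this tile in some dimension is
                # also stored in the previous tile along that dimension. If the
                # query covers that previous tile too, the object is reported
                # there, so its class is skipped here.
                classes = ["A"]
                if j == j0:
                    classes.append("B")
                if i == i0:
                    classes.append("C")
                if i == i0 and j == j0:
                    classes.append("D")
                for c in classes:
                    out.extend(oid for oid, r in tile[c] if intersects(r, w))
        return out
```

Under these assumptions, each qualifying object is reported from exactly one tile, so no duplicate-elimination pass is needed, and each tile can be scanned independently of the others.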

Cracking In-Memory Database Index A Case Study for Adaptive Radix Tree Index

57 - Gang Wu , Yidong Song , Guodong Zhao 2019

Indexes provide a method to access data in databases quickly. Building a complete index in advance can improve the response speed of subsequent queries. However, it also leads to a huge overhead from the continuous updating during index creation. An in-memory database usually has higher query processing performance than disk-based databases and is more suitable for real-time query processing. Therefore, there is an urgent need to reduce the index creation and update cost for in-memory databases. Database cracking technology is currently recognized as an effective method to reduce the index initialization time. However, conventional cracking algorithms focus on simple column data structures rather than the complex index structures of in-memory databases. In order to show the feasibility of in-memory database index cracking and to promote more extensive future research, this paper conducts a case study on the Adaptive Radix Tree (ART), a popular tree index structure for in-memory databases. On the basis of carefully examining the ART index construction overhead, an algorithm using auxiliary data structures to crack the ART index is proposed.

Databases

PI : a Parallel in-memory skip list based Index

146 - Zhongle Xie , Qingchao Cai , H.V. Jagadish 2016

Due to the coarse granularity of data accesses and the heavy use of latches, indices in the B-tree family are not efficient for in-memory databases, especially in the context of today's multi-core architecture. In this paper, we present PI, a Parallel in-memory skip list based Index that lends itself naturally to the parallel and concurrent environment, particularly with non-uniform memory access. In PI, incoming queries are collected, and disjointly distributed among multiple threads for processing to avoid the use of latches. For each query, PI traverses the index in a Breadth-First-Search (BFS) manner to find the list node with the matching key, exploiting SIMD processing to speed up the search process. In order for query processing to be latch-free, PI employs a light-weight communication protocol that enables threads to re-distribute the query workload among themselves such that each list node that will be modified as a result of query processing will be accessed by exactly one thread. We conducted extensive experiments, and the results show that PI can be up to three times as fast as the Masstree, a state-of-the-art B-tree based index.

Databases Distributed Parallel and Cluster Computing

Parallel In-Memory Evaluation of Spatial Joins

88 - Dimitrios Tsitsigkos , Panagiotis Bouros , Nikos Mamoulis and Manolis Terrovitis 2019

The spatial join is a popular operation in spatial database systems and its evaluation is a well-studied problem. As main memories become bigger and faster and commodity hardware supports parallel processing, there is a need to revamp classic join algorithms which have been designed for I/O-bound processing. In view of this, we study the in-memory and parallel evaluation of spatial joins, by re-designing a classic partitioning-based algorithm to consider alternative approaches for space partitioning. Our study shows that, compared to a straightforward implementation of the algorithm, our tuning can improve performance significantly. We also show how to select appropriate partitioning parameters based on data statistics, in order to tune the algorithm for the given join inputs. Our parallel implementation scales gracefully with the number of threads, reducing the cost of the join to at most one second even for join inputs with tens of millions of rectangles.

Databases Distributed Parallel and Cluster Computing

Spatial Interpolation-based Learned Index for Range and kNN Queries

252 - Songnian Zhang , Suprio Ray , Rongxing Lu 2021

A corpus of recent work has revealed that the learned index can improve query performance while reducing the storage overhead. It potentially offers an opportunity to address the spatial query processing challenges caused by the surge in location-based services. Although several learned indexes have been proposed to process spatial data, the main idea behind these approaches is to utilize the existing one-dimensional learned models, which requires either converting the spatial data into one-dimensional data or applying the learned model on individual dimensions separately. As a result, these approaches cannot fully utilize or take advantage of the information regarding the spatial distribution of the original spatial data. To this end, in this paper, we exploit it by using the spatial (multi-dimensional) interpolation function as the learned model, which can be directly employed on the spatial data. Specifically, we design an efficient SPatial inteRpolation functIon based Grid index (SPRIG) to process the range and kNN queries. Detailed experiments are conducted on real-world datasets, and the results indicate that our proposed learned index can significantly improve the performance in comparison with the traditional spatial indexes and a state-of-the-art multi-dimensional learned index.

Databases

Two-level Data Staging ETL for Transaction Data

510 - Xiufeng Liu 2014

In data warehousing, Extract-Transform-Load (ETL) regularly extracts data from data sources into a central data warehouse to support business decision-making. Data from transaction processing systems are characterized by highly frequent insertions, updates, and deletions. It is challenging for ETL to propagate the changes to the data warehouse and maintain the change history. Moreover, ETL jobs typically run in a sequential order when processing data with dependencies, which is not optimal, e.g., when processing early-arriving data. In this paper, we propose a two-level data staging ETL for handling transaction data. The proposed method detects the changes of the data from transactional processing systems, identifies the corresponding operation codes for the changes, and uses two staging databases to facilitate the data processing in an ETL process. The proposed ETL provides a one-stop method for processing fast-changing, slowly-changing, and early-arriving data.

Databases