Due to the coarse granularity of data accesses and the heavy use of latches, indices in the B-tree family are not efficient for in-memory databases, especially on today's multi-core architectures. In this paper, we present PI, a Parallel in-memory skip-list-based Index that lends itself naturally to parallel and concurrent environments, particularly those with non-uniform memory access. In PI, incoming queries are collected and disjointly distributed among multiple threads for processing, avoiding the use of latches. For each query, PI traverses the index in a Breadth-First-Search (BFS) manner to find the list node with the matching key, exploiting SIMD processing to speed up the search. For query processing to be latch-free, PI employs a light-weight communication protocol that enables threads to re-distribute the query workload among themselves such that each list node that will be modified as a result of query processing is accessed by exactly one thread. We conducted extensive experiments, and the results show that PI can be up to three times as fast as Masstree, a state-of-the-art B-tree-based index.
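To make the latch-free query distribution concrete, the following is a minimal Python sketch of one way a batch of point queries could be split into disjoint key ranges and handed to worker threads. The `index.search` method, the split keys, and the thread pool are illustrative assumptions, not PI's actual protocol or its SIMD-accelerated BFS traversal.

```python
# Illustrative sketch (not PI's actual implementation): a batch of point
# queries is sorted by key and split into disjoint key ranges, so each worker
# owns every list node it may touch and no latches are needed.
# (In CPython the GIL serializes the threads; the sketch only illustrates the
# partitioning idea.)
from concurrent.futures import ThreadPoolExecutor
from bisect import bisect_left

def partition_queries(queries, boundaries):
    """Assign each query key to the worker whose key range contains it.

    `boundaries` are hypothetical split keys, e.g. sampled from the indexed
    keys so that each worker receives a similar share of the workload.
    """
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for key in sorted(queries):
        buckets[bisect_left(boundaries, key)].append(key)
    return buckets

def process_batch(index, queries, boundaries):
    # Each worker traverses only its own disjoint portion of the skip list,
    # so lookups and updates proceed without any locking.
    with ThreadPoolExecutor(max_workers=len(boundaries) + 1) as pool:
        results = pool.map(lambda bucket: [index.search(k) for k in bucket],
                           partition_queries(queries, boundaries))
    return [r for worker in results for r in worker]
```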
The spatial join is a popular operation in spatial database systems, and its evaluation is a well-studied problem. As main memories become bigger and faster and commodity hardware supports parallel processing, there is a need to revamp classic join algorithms, which were designed for I/O-bound processing. In view of this, we study the in-memory and parallel evaluation of spatial joins by re-designing a classic partitioning-based algorithm to consider alternative approaches for space partitioning. Our study shows that, compared to a straightforward implementation of the algorithm, our tuning can improve performance significantly. We also show how to select appropriate partitioning parameters based on data statistics, in order to tune the algorithm for the given join inputs. Our parallel implementation scales gracefully with the number of threads, reducing the cost of the join to at most one second even for join inputs with tens of millions of rectangles.
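As a rough illustration of what a partitioning-based spatial join does, here is a compact Python sketch over axis-aligned rectangles (x1, y1, x2, y2) using a uniform grid and the common reference-point trick to avoid duplicate results. The grid granularity, single-threaded evaluation, and partitioning strategy are assumptions for readability and do not reflect the tuned algorithm evaluated in the paper.

```python
# Minimal sketch of a partitioning-based spatial join over axis-aligned
# rectangles; granularity and duplicate handling are deliberately simplified.
from collections import defaultdict

def tiles(rect, cell):
    # All grid cells a rectangle overlaps (rectangles are replicated per cell).
    x1, y1, x2, y2 = rect
    for i in range(int(x1 // cell), int(x2 // cell) + 1):
        for j in range(int(y1 // cell), int(y2 // cell) + 1):
            yield (i, j)

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def spatial_join(R, S, cell=1.0):
    grid = defaultdict(list)
    for r in R:
        for t in tiles(r, cell):
            grid[t].append(r)
    out = []
    for s in S:
        for t in tiles(s, cell):
            for r in grid[t]:
                if intersects(r, s):
                    # Report the pair only in the cell containing the lower-left
                    # corner of the intersection, so no duplicates are emitted.
                    ref = (int(max(r[0], s[0]) // cell),
                           int(max(r[1], s[1]) // cell))
                    if ref == t:
                        out.append((r, s))
    return out
```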
Very large volumes of spatial data are increasingly becoming available and demand effective management. While there have been decades of research on spatial data management, few works consider the current state of commodity hardware, which offers relatively large memory and parallel multi-core processing. In this work, we reconsider the design of spatial indexing under this new reality. Specifically, we propose a main-memory indexing approach for objects with spatial extent, based on a classic regular space partitioning into disjoint tiles. The novelty of our index is that the contents of each tile are further partitioned into four classes. This second-level partitioning not only reduces the number of comparisons required to compute the results, but also avoids the generation and elimination of duplicate results, an inherent problem of spatial indexes based on disjoint space partitioning. The spatial partitions defined by our indexing scheme are totally independent, facilitating effortless parallel evaluation, as no synchronization or communication between the partitions is necessary. We show how our index can be used to efficiently process spatial range queries and drastically reduce the cost of their refinement step. In addition, we study the efficient processing of numerous range queries in batch and in parallel. Extensive experiments on real datasets confirm the efficiency of our approaches.
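The four-class idea can be illustrated with a small Python sketch, assuming (purely for illustration) that a rectangle replicated to several tiles is classified by whether it "begins" in a tile along the x and/or y axis; the actual class definitions and the query algorithm of the proposed index may differ.

```python
# Hedged sketch of a two-level partitioning: rectangles (x1, y1, x2, y2) are
# replicated to every grid tile they intersect and, within a tile, split into
# four illustrative classes based on where the rectangle begins.
from collections import defaultdict

def build_index(rects, cell):
    index = defaultdict(lambda: {'A': [], 'B': [], 'C': [], 'D': []})
    for r in rects:
        x1, y1, x2, y2 = r
        first_i, first_j = int(x1 // cell), int(y1 // cell)
        for i in range(first_i, int(x2 // cell) + 1):
            for j in range(first_j, int(y2 // cell) + 1):
                starts_x, starts_y = (i == first_i), (j == first_j)
                cls = {(True, True): 'A', (True, False): 'B',
                       (False, True): 'C', (False, False): 'D'}[(starts_x, starts_y)]
                index[(i, j)][cls].append(r)
    return index
```

Under this assumption, each rectangle belongs to class 'A' in exactly one tile, which hints at how duplicate results can be avoided without a separate elimination step.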
Indexes provide a means of accessing data in databases quickly. Building a complete index in advance improves the response speed of subsequent queries, but it also incurs a large overhead for creating and continuously updating the index. An in-memory database usually offers higher query processing performance than a disk-based database and is better suited to real-time query processing, so there is an urgent need to reduce index creation and update costs for in-memory databases. Database cracking is currently recognized as an effective technique for reducing index initialization time. However, conventional cracking algorithms target simple column data structures rather than the complex index structures used by in-memory databases. To demonstrate the feasibility of cracking in-memory database indexes and to encourage more extensive future research, this paper presents a case study on the Adaptive Radix Tree (ART), a popular tree index structure for in-memory databases. After carefully examining the construction overhead of the ART index, we propose an algorithm that uses auxiliary data structures to crack the ART index.
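For readers unfamiliar with database cracking, the following Python sketch shows the classic single-column cracking step that this line of work builds on; the ART-specific algorithm and its auxiliary data structures are not reproduced here.

```python
# Classic database cracking, shown for context only: the column is physically
# reorganized a little with every query instead of being fully indexed upfront.
def crack_in_two(col, lo_pos, hi_pos, pivot):
    """In-place partition of col[lo_pos:hi_pos] so values < pivot come first.

    Returns the split position; this is the basic step of database cracking.
    """
    i, j = lo_pos, hi_pos - 1
    while i <= j:
        if col[i] < pivot:
            i += 1
        else:
            col[i], col[j] = col[j], col[i]
            j -= 1
    return i
```

A range query [a, b) typically performs this step once per bound on the piece containing that bound and records the resulting split positions in a cracker index, so later queries scan ever smaller pieces.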
The pre-trained model (PTM) is revolutionizing Artificial Intelligence (AI) technology. A PTM learns general language features from vast amounts of text and is then fine-tuned with a task-specific dataset. Unfortunately, PTM training requires prohibitively expensive computing devices, especially for fine-tuning, which remains accessible to only a small proportion of people in the AI community. By enabling PTM training on low-end devices, PatrickStar makes PTMs accessible to everyone. PatrickStar reduces the memory requirements of computing platforms by using the CPU-GPU heterogeneous memory space to store model data, consisting of parameters, gradients, and optimizer states. We observe that the GPU memory available for model data changes regularly in a tide-like pattern, decreasing and increasing iteratively. However, existing heterogeneous training systems do not take advantage of this pattern; instead, they statically partition the model data between CPU and GPU, leading to both memory waste and memory abuse. In contrast, PatrickStar manages model data in chunks, which are dynamically distributed across the heterogeneous memory spaces. Chunks consist of stateful tensors which run as finite state machines during training. Guided by the runtime memory statistics collected in a warm-up iteration, chunks are orchestrated efficiently in heterogeneous memory, generating lower CPU-GPU data transmission volume. In symbiosis with the Zero Redundancy Optimizer, PatrickStar scales to multiple GPUs using data parallelism, with the lowest communication bandwidth requirements and more efficient bandwidth utilization. Experimental results show that PatrickStar trains a 12-billion-parameter GPT model, 2x larger than the state-of-the-art, on a node with 8 V100 GPUs and 240 GB of CPU memory, and is also more efficient at the same model size.
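The chunk-and-state idea can be sketched as follows in Python/PyTorch; the state names, eviction policy, and interfaces are simplified assumptions for illustration and are not PatrickStar's actual API.

```python
# Schematic sketch of chunk-based heterogeneous memory management in the
# spirit of PatrickStar (illustrative only; requires PyTorch and a GPU).
from enum import Enum, auto
import torch

class TensorState(Enum):
    COMPUTE = auto()   # needed on GPU for the current operator
    HOLD = auto()      # may stay wherever its chunk currently resides
    FREE = auto()      # storage can be reused by other tensors

class Chunk:
    def __init__(self, numel, device='cpu'):
        self.payload = torch.empty(numel, device=device)
        self.states = {}  # tensor id -> TensorState

    def move(self, device):
        # Chunks, not individual tensors, are the unit of CPU<->GPU movement,
        # which keeps transfers large and bandwidth-friendly.
        self.payload = self.payload.to(device)

class ChunkManager:
    def __init__(self, chunks, gpu_budget_bytes):
        # The budget stands in for the tide-like GPU headroom that would be
        # measured during a warm-up iteration.
        self.chunks, self.budget = chunks, gpu_budget_bytes

    def prepare(self, chunk):
        # Evict chunks with no COMPUTE tensors back to CPU until the needed
        # chunk fits under the GPU budget, then fetch it to the GPU.
        if chunk.payload.is_cuda:
            return
        in_gpu = [c for c in self.chunks if c.payload.is_cuda]
        used = sum(c.payload.numel() * c.payload.element_size() for c in in_gpu)
        need = chunk.payload.numel() * chunk.payload.element_size()
        for victim in in_gpu:
            if used + need <= self.budget:
                break
            if all(s != TensorState.COMPUTE for s in victim.states.values()):
                victim.move('cpu')
                used -= victim.payload.numel() * victim.payload.element_size()
        chunk.move('cuda')
```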
Many state-of-the-art ML results have been obtained by scaling up the number of parameters in existing models. However, parameters and activations for such large models often do not fit in the memory of a single accelerator device; this means that it is necessary to distribute training of large models over multiple accelerators. In this work, we propose PipeDream-2BW, a system that supports memory-efficient pipeline parallelism. PipeDream-2BW uses a novel pipelining and weight gradient coalescing strategy, combined with the double buffering of weights, to ensure high throughput, low memory footprint, and weight update semantics similar to data parallelism. In addition, PipeDream-2BW automatically partitions the model over the available hardware resources, while respecting hardware constraints such as memory capacities of accelerators and interconnect topologies. PipeDream-2BW can accelerate the training of large GPT and BERT language models by up to 20$\times$ with similar final model accuracy.
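As a toy illustration of the double-buffered weight (2BW) idea, the sketch below keeps only two weight versions and coalesces gradients from m microbatches into one update; the plain-SGD update rule, shapes, and scheduling are placeholders, and the real system's pipeline schedule and automatic partitioner are not modeled.

```python
# Toy sketch of double-buffered weight updates: one coalesced update per m
# microbatches, with the previous weight version kept intact so in-flight
# microbatches that started under it can still complete their backward pass.
import numpy as np

class DoubleBufferedWeights:
    def __init__(self, w, microbatches_per_update, lr=0.1):
        self.versions = [w.copy(), w.copy()]  # only two versions ever kept
        self.current = 0
        self.grad_acc = np.zeros_like(w)
        self.m = microbatches_per_update
        self.seen = 0
        self.lr = lr

    def current_weights(self):
        # Callers grab a reference at forward time; because the update writes
        # into the *other* buffer slot, that reference stays valid afterwards.
        return self.versions[self.current]

    def backward(self, grad):
        self.grad_acc += grad
        self.seen += 1
        if self.seen == self.m:
            # Coalesced update: one new version per m microbatches, written
            # into the slot of the now-unneeded oldest version.
            new_w = self.versions[self.current] - self.lr * self.grad_acc
            self.current ^= 1
            self.versions[self.current] = new_w
            self.grad_acc[:] = 0
            self.seen = 0
```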