Fast Query Processing by Distributing an Index over CPU Caches

Added by Xiaoqin Ma

Publication date 2004

fields Informatics Engineering

Research language: English

Authors Xiaoqin Ma - Gene Cooperman

Distributed Parallel and Cluster Computing Performance

Data-intensive applications on clusters often require that requests be sent quickly to the node managing the desired data. In many applications, one must look through a sorted tree structure to determine the node responsible for accessing or storing the data. Examples include object tracking in sensor networks, packet routing over the Internet, request processing in publish-subscribe middleware, and query processing in database systems. When the tree structure is larger than the CPU cache, the standard implementation potentially incurs many cache misses for each lookup: one cache miss at each successive level of the tree. As the CPU-RAM gap grows, this performance degradation will only become worse. We propose a solution that takes advantage of the growing speed of local area networks for clusters. We split the sorted tree structure among the nodes of the cluster. We assume that the structure will fit inside the aggregate of the CPU caches of the entire cluster. We then send a word over the network (as part of a larger packet containing other words) in order to examine the tree structure in another node's CPU cache. We show that this is often faster than the standard solution, which locally incurs multiple cache misses while accessing each successive level of the tree.
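The idea in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the names `CacheNode`, `build_nodes`, and `route_queries` are hypothetical, the sorted tree is stood in for by a sorted list of range lower bounds, and the "network" is an ordinary function call. It shows the two steps the abstract describes: splitting the sorted index into slices small enough to stay cache-resident on each node, and grouping many query words into one packet per destination node.

```python
from bisect import bisect_right
from collections import defaultdict


class CacheNode:
    """Hypothetical cluster node holding one contiguous slice of the sorted index."""

    def __init__(self, keys, owners):
        self.keys = keys      # sorted range lower bounds held by this node
        self.owners = owners  # owners[i] is the data node responsible for keys[i]

    def lookup_batch(self, queries):
        """Resolve a whole packet of query words against the local slice."""
        results = []
        for q in queries:
            i = bisect_right(self.keys, q) - 1  # largest local key <= q
            results.append(self.owners[max(i, 0)])  # clamp queries below the first key
        return results


def build_nodes(keys, owners, num_nodes):
    """Split the sorted index into at most num_nodes contiguous slices."""
    chunk = -(-len(keys) // num_nodes)  # ceiling division
    nodes = [CacheNode(keys[i:i + chunk], owners[i:i + chunk])
             for i in range(0, len(keys), chunk)]
    # The first key of every slice except the first acts as a splitter.
    splitters = [node.keys[0] for node in nodes[1:]]
    return nodes, splitters


def route_queries(queries, nodes, splitters):
    """Group queries into one packet per destination node, then resolve them."""
    packets = defaultdict(list)
    for pos, q in enumerate(queries):
        dest = bisect_right(splitters, q)  # small top-level lookup
        packets[dest].append((pos, q))
    answers = [None] * len(queries)
    for dest, items in packets.items():
        # In a real cluster this call would be one network packet (assumption).
        replies = nodes[dest].lookup_batch([q for _, q in items])
        for (pos, _), reply in zip(items, replies):
            answers[pos] = reply
    return answers


keys = [0, 10, 20, 30, 40, 50, 60, 70]
owners = ["A", "B", "C", "D", "E", "F", "G", "H"]
nodes, splitters = build_nodes(keys, owners, num_nodes=4)
print(route_queries([5, 35, 60, 99, 20], nodes, splitters))
# prints ['A', 'D', 'G', 'H', 'C']
```

In this sketch the splitter list is the only structure every node needs to hold in full; each slice is searched only by the node that owns it, which is what would let the slice stay in that node's cache. Whether this beats a local lookup depends on network latency relative to the cost of a cache miss, which the simulation does not model.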

Importance of Explicit Vectorization for CPU and GPU Software Performance

184 - Neil G. Dickson , Kamran Karimi , Firas Hamze 2010

Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be employed additionally, are less frequently discussed. In this paper, we present an analysis of several optimizations done on both central processing unit (CPU) and GPU implementations of a particular computationally intensive Metropolis Monte Carlo algorithm. Explicit vectorization on the CPU and the equivalent, explicit memory coalescing, on the GPU are found to be critical to achieving good performance of this algorithm in both environments. The fully-optimized CPU version achieves a 9x to 12x speedup over the original CPU version, in addition to speedup from multi-threading. This is 2x faster than the fully-optimized GPU version.

Distributed Parallel and Cluster Computing Performance Computational Physics

Probabilistic Skyline Query Processing over Uncertain Data Streams in Edge Computing Environments

88 - Chuan-Chi Lai , Chuan-Ming Liu , Yan-Lin Chen 2020

With the advancement of technology, data is being generated faster and faster, and the amount of data that applications need to process has become extremely large. Therefore, we need to put more effort into analyzing data and extracting valuable information. Cloud computing used to be a good technology for solving a large number of data analysis problems. However, in the era of the Internet of Things (IoT), transmitting sensing data back to the cloud for centralized data analysis incurs high wireless communication and network transmission costs. To solve the above problems, edge computing has become a promising solution. In this paper, we propose a new algorithm for processing probabilistic skyline queries over uncertain data streams in an edge computing environment. We use the concept of a second skyline set to filter data that is unlikely to be part of the skyline result. In addition, the edge server only sends the information needed to update the global analysis results on the cloud server, which greatly reduces the amount of data transmitted over the network. The results show that our proposed method not only reduces the response time by more than 50% compared with the brute force method on two-dimensional data but also maintains the leading processing speed on high-dimensional data.

Distributed Parallel and Cluster Computing Databases Data Structures and Algorithms

Graph Computing based Distributed Fast Decoupled Power Flow Analysis

283 - Chen Yuan , Yi Lu , Wei Feng 2019

Power flow analysis plays a fundamental and critical role in the energy management system (EMS). It must accommodate large and complex power systems. To achieve high-performance and accurate power flow analysis, a graph computing based distributed power flow analysis approach is proposed in this paper. First, a power system network is divided into multiple areas. Slack buses are selected for each area and, at each SCADA sampling period, the inter-area transmission line power flows are equivalently allocated as extra load injections to the corresponding buses. The system network is thereby converted into multiple independent areas. In this way, the power flow analysis can be conducted in parallel for each area, and the solved system states are obtained without compromising accuracy. In addition, for each area, graph computing based fast decoupled power flow (FDPF) is employed to quickly analyze system states. The IEEE 118-bus system and the MP 10790-bus system are employed to verify the accuracy of the results and to present the promising computation performance of the proposed approach.

Distributed Parallel and Cluster Computing Performance

CROFT: A scalable three-dimensional parallel Fast Fourier Transform (FFT) implementation for High Performance Clusters

97 - Vivek Gavane , Supriya Prabhugawankar , Shivam Garg 2020

The FFT of three-dimensional (3D) input data is an important computational kernel of numerical simulations and is widely used in High Performance Computing (HPC) codes running on a large number of processors. The performance of many scientific applications, such as Molecular Dynamics simulations, depends on the underlying 3D parallel FFT library being used. In this paper, we present C-DAC's three-dimensional Fast Fourier Transform (CROFT) library, which implements three-dimensional parallel FFT using pencil decomposition. To exploit the hyperthreading capabilities of processor cores without affecting performance, CROFT is designed to use multithreading along with MPI. The CROFT implementation has an innovative feature of overlapping compute and memory-I/O with MPI communication using multithreading. As opposed to other 3D FFT implementations, CROFT uses only two threads, one of which is dedicated to communication so that it can be effectively overlapped with computation. Thus, depending on the number of processes used, CROFT achieves performance improvement of about 51% to 42% as compared to FFTW3 library.

Distributed Parallel and Cluster Computing Performance

Metadata Challenge for Query Processing Over Heterogeneous Wireless Sensor Network

124 - C. Komalavalli , Chetna Laroiya (Jagan Institute of Management Studies) 2011

Wireless sensor networks (WSNs) have become an integral part of our lives. These networks can be used for monitoring data in various domains due to their flexibility and functionality. Query processing and optimization in a WSN is a very challenging task because of its energy and memory constraints. In this paper, our focus is first to review the different approaches that have had a significant impact on the development of query processing techniques for WSNs. Finally, we aim to illustrate the existing approaches in popular query processing engines, along with future research challenges in query optimization.

Databases
