Enabling Lock-Free Concurrent Fine-Grain Access to Massive Distributed Data: Application to Supernovae Detection

Added by Bogdan Nicolae

Publication date 2008

Field Informatics Engineering

Language English

Authors Bogdan Nicolae

Distributed Parallel and Cluster Computing

Abstract

We consider the problem of efficiently managing massive data in a large-scale distributed environment. We consider data strings on the order of terabytes in size, shared and accessed by concurrent clients. On each individual access, a segment of a string, on the order of megabytes, is read or modified. Our goal is to provide the clients with efficient fine-grain access to the data string as concurrently as possible, without locking the string itself. This issue is crucial in the context of applications in the fields of astronomy, databases, data mining and multimedia. We illustrate these requirements with the case of an application for searching for supernovae. Our solution relies on distributed, RAM-based data storage, while leveraging a DHT-based, parallel metadata management scheme. The proposed architecture and algorithms have been validated through a software prototype and evaluated in a cluster environment.
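
The abstract does not spell out how a segment is located, so the following is only a minimal sketch of the general idea, assuming the string is split into fixed-size pages and that metadata for each page is placed on a node chosen by hashing. The page size, the key format, the modulo placement (standing in for a real DHT), and the function names `pages_for_segment` and `node_for_page` are all hypothetical and are not taken from the paper.

```python
import hashlib

# Assumption: fixed-size pages of 1 MiB. The abstract only says that
# accessed segments are on the order of megabytes.
PAGE_SIZE = 1 << 20


def pages_for_segment(offset, size, page_size=PAGE_SIZE):
    """Return the indices of the pages overlapped by [offset, offset + size)."""
    if offset < 0 or size <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    first = offset // page_size
    last = (offset + size - 1) // page_size
    return list(range(first, last + 1))


def node_for_page(string_id, page_index, num_nodes):
    """Pick a metadata node for one page by hashing a (string id, page) key.

    Simple modulo placement is used here as a stand-in for a real DHT,
    which would normally use consistent hashing or a similar scheme.
    """
    key = f"{string_id}:{page_index}".encode("utf-8")
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes
```

Under these assumptions, a client reading 2 MiB starting 5 bytes into page 3 would call `pages_for_segment(3 * PAGE_SIZE + 5, 2 * PAGE_SIZE)`, obtain pages 3, 4 and 5, and then look up each page's metadata node independently with `node_for_page`. Because the lookups for different pages do not depend on each other, they can proceed in parallel without any lock on the string as a whole.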

Distributed Management of Massive Data: an Efficient Fine-Grain Data Access Scheme

Bogdan Nicolae, 2008

This paper addresses the problem of efficiently storing and accessing massive data blocks in a large-scale distributed environment, while providing efficient fine-grain access to data subsets. This issue is crucial in the context of applications in the field of databases, data mining and multimedia. We propose a data sharing service based on distributed, RAM-based storage of data, while leveraging a DHT-based, natively parallel metadata management scheme. As opposed to the most commonly used grid storage infrastructures that provide mechanisms for explicit data localization and transfer, we provide a transparent access model, where data are accessed through global identifiers. Our proposal has been validated through a prototype implementation whose preliminary evaluation provides promising results.

Distributed Parallel and Cluster Computing

Lock-free Concurrent Data Structures

Daniel Cederman, Anders Gidenstam, Phuong Ha, 2013

Concurrent data structures are the data sharing side of parallel programming. Data structures give the program the means to store data, and also provide operations to access and manipulate these data. These operations are implemented through algorithms that have to be efficient. In the sequential setting, data structures are crucially important for the performance of the respective computation. In the parallel programming setting, their importance becomes even greater because of the increased use of data and resource sharing for utilizing parallelism. The first and main goal of this chapter is to provide sufficient background and intuition to help the interested reader navigate the complex research area of lock-free data structures. The second goal is to give the programmer enough familiarity with the subject to use truly concurrent methods.

Distributed Parallel and Cluster Computing Data Structures and Algorithms Programming Languages

Tracking in Order to Recover: Detectable Recovery of Lock-Free Data Structures

Hagit Attiya, Ohad Ben-Baruch, Panagiota Fatourou, 2019

This paper presents the tracking approach for deriving detectably recoverable (and thus also durable) implementations of many widely-used concurrent data structures. Such data structures, satisfying detectable recovery, are appealing for emerging systems featuring byte-addressable non-volatile main memory (NVRAM), whose persistence allows failed processes to be efficiently resurrected after crashes. Detectable recovery ensures that after a crash, every executed operation is able to recover and return a correct response, and that the state of the data structure is not corrupted. Info-Structure Based (ISB)-tracking amends descriptor objects used in existing lock-free helping schemes with additional fields that track an operation's progress towards completion, and persists these fields to memory in order to ensure detectable recovery. ISB-tracking avoids full-fledged logging and tracks the progress of concurrent operations in a per-process manner, thus reducing the cost of ensuring detectable recovery. We have applied ISB-tracking to derive detectably recoverable implementations of a queue, a linked list, a binary search tree, and an exchanger. Experimental results show the feasibility of the technique.

Distributed Parallel and Cluster Computing

SDN helps Big Data to optimize access to data

Yuankun Fu, Fengguang Song, 2020

This chapter introduces the state of the art in the emerging area of combining High Performance Computing (HPC) with Big Data Analysis. To understand the new area, the chapter first surveys the existing approaches to integrating HPC with Big Data. Next, the chapter introduces several optimization solutions that focus on minimizing both the data transfer time from computation-intensive applications to analysis-intensive applications and the end-to-end time-to-solution. The solutions utilize SDN to adaptively use both high-speed interconnect networks and high-performance parallel file systems to optimize the application performance. A computational framework called DataBroker is designed and developed to enable a tight integration of HPC with data analysis. Multiple types of experiments have been conducted to show different performance issues in both message passing and parallel file systems and to verify the effectiveness of the proposed research approaches.

Distributed Parallel and Cluster Computing Networking and Internet Architecture Performance

Cache-Aware Lock-Free Concurrent Hash Tries

Aleksandar Prokopec, Phil Bagwell, Martin Odersky, 2017

This report describes an implementation of a non-blocking concurrent shared-memory hash trie based on single-word compare-and-swap instructions. Insert, lookup and remove operations modifying different parts of the hash trie can be run independently of each other and do not contend. Remove operations ensure that the unneeded memory is freed and that the trie is kept compact. Pseudocode for these operations is presented and a proof of correctness is given: we show that the implementation is linearizable and lock-free. Finally, benchmarks are presented which compare concurrent hash trie operations against the corresponding operations on other concurrent data structures, showing their performance and scalability.

Data Structures and Algorithms