The SAGE Project: a Storage Centric Approach for Exascale Computing

334 0 0.0 ( 0 )

Download Cite

Added by Stefano Markidis Prof.

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Sai Narasimhamurthy - Nikita Danilov - Sining Wu

Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

SAGE (Percipient StorAGe for Exascale Data Centric Computing) is a European Commission funded project towards the era of Exascale computing. Its goal is to design and implement a Big Data/Extreme Computing (BDEC) capable infrastructure with associated software stack. The SAGE system follows a storage centric approach as it is capable of storing and processing large data volumes at the Exascale regime. SAGE addresses the convergence of Big Data Analysis and HPC in an era of next-generation data centric computing. This convergence is driven by the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors where data needs to be processed, analyzed and integrated into simulations to derive scientific and innovative insights. A first prototype of the SAGE system has been been implemented and installed at the Julich Supercomputing Center. The SAGE storage system consists of multiple types of storage device technologies in a multi-tier I/O hierarchy, including flash, disk, and non-volatile memory technologies. The main SAGE software component is the Seagate Mero Object Storage that is accessible via the Clovis API and higher level interfaces. The SAGE project also includes scientific applications for the validation of the SAGE concepts. The objective of this paper is to present the SAGE project concepts, the prototype of the SAGE platform and discuss the software architecture of the SAGE system.

rate research

SAGE: Percipient Storage for Exascale Data Centric Computing

275 - Sai Narasimhamurthy , Nikita Danilov , Sining Wu 2018

We aim to implement a Big Data/Extreme Computing (BDEC) capable system infrastructure as we head towards the era of Exascale computing - termed SAGE (Percipient StorAGe for Exascale Data Centric Computing). The SAGE system will be capable of storing and processing immense volumes of data at the Exascale regime, and provide the capability for Exascale class applications to use such a storage infrastructure. SAGE addresses the increasing overlaps between Big Data Analysis and HPC in an era of next-generation data centric computing that has developed due to the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors, whose data needs to be processed, analyzed and integrated into simulations to derive scientific and innovative insights. Indeed, Exascale I/O, as a problem that has not been sufficiently dealt with for simulation codes, is appropriately addressed by the SAGE platform. The objective of this paper is to discuss the software architecture of the SAGE system and look at early results we have obtained employing some of its key methodologies, as the system continues to evolve.

Distributed Parallel and Cluster Computing

Enhancing Cloud Storage with Shareable Instances for Social Computing

90 - Ying Mao , Peizhao Hu 2020

Cloud storage plays an important role in social computing. This paper aims to develop a cloud storage management system for mobile devices to support an extended set of file operations. Because of the limit of storage, bandwidth, power consumption, and other resource restrictions, most existing cloud storage apps for smartphones do not keep local copies of files. This efficient design, however, limits the application capacities. In this paper, we attempt to extend the available file operations for cloud storage service to better serve smartphone users. We develop an efficient and secure file management system, Skyfiles, to support more advanced file operations. The basic idea of our design is to utilize cloud instances to assist file operations. Particularly, Skyfiles supports downloading, compressing, encrypting, and converting operations, as well as file transfer between two smartphone users cloud storage spaces. In addition, we propose a protocol for users to share their idle instances. All file operations supported by Skyfiles can be efficiently and securely accomplished with either a self-created instance or shared instance.

Distributed Parallel and Cluster Computing Performance

A Micro-Service based Approach for Constructing Distributed Storage System

130 - Yuhao Lu , Zhenqing Liu , Dejun Jiang 2021

This paper presents an approach for constructing distributed storage system based on micro-service architecture. By building storage functionalities using micro services, we can provide new capabilities to a distributed storage system in a flexible way. We take erasure coding and compression as two case studies to show how to build a micro-service based distributed storage system. We also show that by building erasure coding and compression as micro-services, the distributed storage system still achieves reasonable performance compared to the monolithic one.

Distributed Parallel and Cluster Computing

ExaWorks: Workflows for Exascale

203 - Aymen Al-Saadi , Dong H. Ahn , Yadu Babuji 2021

Exascale computers will offer transformative capabilities to combine data-driven and learning-based approaches with traditional simulation applications to accelerate scientific discovery and insight. These software combinations and integrations, however, are difficult to achieve due to challenges of coordination and deployment of heterogeneous software components on diverse and massive platforms. We present the ExaWorks project, which can address many of these challenges: ExaWorks is leading a co-design process to create a workflow software development Toolkit (SDK) consisting of a wide range of workflow management tools that can be composed and interoperate through common interfaces. We describe the initial set of tools and interfaces supported by the SDK, efforts to make them easier to apply to complex science challenges, and examples of their application to exemplar cases. Furthermore, we discuss how our project is working with the workflows community, large computing facilities as well as HPC platform vendors to sustainably address the requirements of workflows at the exascale.

Distributed Parallel and Cluster Computing

Machine Learning (ML)-Centric Resource Management in Cloud Computing: A Review and Future Directions

228 - Tahseen Khan , Wenhong Tian , Rajkumar Buyya 2021

Cloud computing has rapidly emerged as model for delivering Internet-based utility computing services. In cloud computing, Infrastructure as a Service (IaaS) is one of the most important and rapidly growing fields. Cloud providers provide users/machines resources such as virtual machines, raw (block) storage, firewalls, load balancers, and network devices in this service model. One of the most important aspects of cloud computing for IaaS is resource management. Scalability, quality of service, optimum utility, reduced overheads, increased throughput, reduced latency, specialised environment, cost effectiveness, and a streamlined interface are some of the advantages of resource management for IaaS in cloud computing. Traditionally, resource management has been done through static policies, which impose certain limitations in various dynamic scenarios, prompting cloud service providers to adopt data-driven, machine-learning-based approaches. Machine learning is being used to handle a variety of resource management tasks, including workload estimation, task scheduling, VM consolidation, resource optimization, and energy optimization, among others. This paper provides a detailed review of challenges in ML-based resource management in current research, as well as current approaches to resolve these challenges, as well as their advantages and limitations. Finally, we propose potential future research directions based on identified challenges and limitations in current research.

Distributed Parallel and Cluster Computing Machine Learning