Blueprint: Cyberinfrastructure Center of Excellence


Added by Rafael Ferreira da Silva

Publication date 2021

Fields Informatics Engineering

Language English

Authors Ewa Deelman

Distributed Parallel and Cluster Computing


In 2018, NSF funded an effort to pilot a Cyberinfrastructure Center of Excellence (CI CoE or Center) that would serve the cyberinfrastructure (CI) needs of the NSF Major Facilities (MFs) and large projects with advanced CI architectures. The goal of the CI CoE Pilot project (Pilot) was to develop a model and a blueprint for such a CoE by engaging with the MFs, understanding their CI needs, understanding the contributions the MFs are making to the CI community, and exploring opportunities for building a broader CI community. This document summarizes the results of community engagements conducted during the first two years of the project and describes the identified CI needs of the MFs. To better understand the MFs' CI, the Pilot developed and validated a model of the MF data lifecycle that follows data generation and management within a facility. The model captures the fundamental stages that a facility's data passes through, from the scientific instruments to the principal investigators and their teams, to the broader collaborations and the public. The Pilot also aimed to understand what CI workforce development challenges the MFs face while designing, constructing, and operating their CI, and what solutions they are exploring and adopting within their projects. Based on the needs of the MFs in the data lifecycle and workforce development areas, this document outlines a blueprint for a CI CoE that will learn about and share the CI solutions designed, developed, and/or adopted by the MFs, provide expertise to the largest NSF projects with advanced and complex CI architectures, and foster a community of CI practitioners and researchers.


Toward Interoperable Cyberinfrastructure: Common Descriptions for Computational Resources and Applications

Joe Stubbs, Suresh Marru, Daniel Mejia, 2021

The user-facing components of the Cyberinfrastructure (CI) ecosystem, science gateways and scientific workflow systems, share a common need of interfacing with physical resources (storage systems and execution environments) to manage data and execute codes (applications). However, there is no uniform, platform-independent way to describe either the resources or the applications. To address this, we propose uniform semantics for describing resources and applications that will be relevant to a diverse set of stakeholders. We sketch a solution to the problem of a common description and catalog of resources: we describe an approach to implementing a resource registry for use by the community and discuss potential approaches to some long-term challenges. We conclude by looking ahead to the application description language.

Distributed Parallel and Cluster Computing

Leveraging User Access Patterns and Advanced Cyberinfrastructure to Accelerate Data Delivery from Shared-use Scientific Observatories

Yubo Qin, Ivan Rodero, Anthony Simonet, 2020

With the growing number and increasing availability of shared-use instruments and observatories, observational data is becoming an essential part of application workflows and a contributor to scientific discoveries in a range of disciplines. However, the corresponding growth in the number of users accessing these facilities, coupled with the expansion in the scale and variety of the data, is making it challenging for these facilities to ensure their data can be accessed, integrated, and analyzed in a timely manner, and is resulting in significant demands on their cyberinfrastructure (CI). In this paper, we present the design of a push-based data delivery framework that leverages emerging in-network capabilities, along with data pre-fetching techniques based on a hybrid data management model. Specifically, we analyze data access traces for two large-scale observatories, Ocean Observatories Initiative (OOI) and Geodetic Facility for the Advancement of Geoscience (GAGE), to identify typical user access patterns and to develop a model that can be used for data pre-fetching. Furthermore, we evaluate our data pre-fetching model and the proposed framework using a simulation of the Virtual Data Collaboratory (VDC) platform that provides in-network data staging and processing capabilities. The results demonstrate the ability of the framework to significantly improve data delivery performance and reduce network traffic at the observatories' facilities.

Distributed Parallel and Cluster Computing, Multiagent Systems

A monitoring tool for a GRID operation center

S. Andreozzi, S. Fantinel, D. Rebatto, 2003

WorldGRID is an intercontinental testbed spanning Europe and the US that integrates architecturally different Grid implementations based on the Globus toolkit. The WorldGRID testbed was successfully demonstrated during the WorldGRID demos at SuperComputing 2002 (Baltimore) and IST2002 (Copenhagen), where real HEP application jobs were transparently submitted from the US and Europe using native mechanisms and run where resources were available, independently of their location. To monitor the behavior and performance of such a testbed and spot problems as soon as they arise, DataTAG has developed the EDT-Monitor tool, based on the Nagios package, which allows for Virtual Organization centric views of the Grid through dynamic geographical maps. The tool has been used to spot several problems during the WorldGRID operations, such as malfunctioning Resource Brokers or Information Servers, incorrectly configured sites, and job dispatching problems. In this paper we give an overview of the package, its features, and its scalability solutions, and we report on the experience acquired and the benefit that a GRID operation center would gain from such a tool.

Distributed Parallel and Cluster Computing

A Framework for Auditing Data Center Energy Usage and Mitigating Environmental Footprint

Justin Gould, 2021

As the Data Science field continues to mature and we collect more data, the demand to store and analyze it will continue to increase. This increase in data availability and demand for analytics will put a strain on data centers and compute clusters, with implications for both energy costs and emissions. As the world battles a climate crisis, it is prudent for organizations with data centers to have a framework for combating increasing energy costs and emissions while meeting demand for analytics work. In this paper, I present a generalized framework for organizations to audit their data centers' energy efficiency, to understand the resources required to operate a given data center, and to identify effective steps they can take to improve data center efficiency and lower environmental impact.

Distributed Parallel and Cluster Computing

Transmission Failure Analysis of Multi-Protection Routing in Data Center Networks with Heterogeneous Edge-Core Servers

Xiao-Yan Li, Wanling Lin, Jou-Ming Chang, 2021

The recently proposed RCube network is a cube-based server-centric data center network (DCN) that includes two types of heterogeneous servers, called core and edge servers. Remarkably, it uses the latter as backup servers to deal with server failures and thus achieve high availability. This paper first points out that RCube is suitable as a candidate topology of DCNs for edge computing. There are three transmission types among core and edge servers, depending on the applications' demands for computation and instant response. We then employ protection routing to analyze the transmission failure of RCube DCNs. Unlike traditional protection routing, which only tolerates a single link or node failure, we use the multi-protection routing scheme to improve fault-tolerance capability. To configure a protection routing in a network, according to Tapolcai's suggestion, we need to construct two completely independent spanning trees (CISTs). A logic graph of RCube, denoted by $L$-$RCube(n,m,k)$, is a network with a recursive structure. Each basic building element consists of $n$ core servers and $m$ edge servers, where the order $k$ is the number of recursions applied in the structure. In this paper, we provide algorithms to construct $\min\{n, \lfloor (n+m)/2 \rfloor\}$ CISTs in $L$-$RCube(n,m,k)$ for $n+m \geqslant 4$ and $n>1$. From a combination of the multiple CISTs, we can configure the desired multi-protection routing. In our simulation, we configure up to 10 protection routings for RCube DCNs. As far as we know, in past research, at most three protection routings were developed in other network structures. Finally, we summarize some crucial analysis viewpoints about the transmission efficiency of DCNs with heterogeneous edge-core servers from the simulation results.
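As a reading aid, the CIST count stated in the abstract can be evaluated for concrete parameters. The sketch below only computes the bound min{n, floor((n+m)/2)} under the stated conditions n+m >= 4 and n > 1; it does not construct the trees, and the function name `cist_count` is a hypothetical label, not something taken from the paper.

```python
def cist_count(n: int, m: int) -> int:
    """Number of CISTs the abstract says can be built in L-RCube(n, m, k).

    n: core servers per basic building element
    m: edge servers per basic building element
    The abstract states the count does not depend on the order k.
    """
    # Conditions quoted from the abstract: n + m >= 4 and n > 1.
    if n <= 1 or n + m < 4:
        raise ValueError("the stated result requires n > 1 and n + m >= 4")
    # min{n, floor((n + m) / 2)}; integer division gives the floor here
    # because n + m is a positive integer.
    return min(n, (n + m) // 2)


if __name__ == "__main__":
    # Illustrative parameter choices (assumptions, not values from the paper).
    for n, m in [(2, 2), (4, 2), (3, 10)]:
        print(f"n={n}, m={m}: {cist_count(n, m)} CISTs")
```

For example, with n = 4 core servers and m = 2 edge servers the bound is min{4, 3} = 3, so the number of edge servers can be the limiting factor when it is small relative to n.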

Distributed Parallel and Cluster Computing, Networking and Internet Architecture