
Toward Interoperable Cyberinfrastructure: Common Descriptions for Computational Resources and Applications

 Added by Daniel S. Katz
 Publication date 2021
Language: English





The user-facing components of the Cyberinfrastructure (CI) ecosystem, science gateways and scientific workflow systems, share a common need of interfacing with physical resources (storage systems and execution environments) to manage data and execute codes (applications). However, there is no uniform, platform-independent way to describe either the resources or the applications. To address this, we propose uniform semantics for describing resources and applications that will be relevant to a diverse set of stakeholders. We sketch a solution to the problem of a common description and catalog of resources: we describe an approach to implementing a resource registry for use by the community and discuss potential approaches to some long-term challenges. We conclude by looking ahead to the application description language.
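The abstract does not fix a concrete schema, so the following is only a sketch of what a platform-independent resource entry in a community registry might look like; every field name here is hypothetical and not taken from the proposed semantics.

```python
# Illustrative sketch only: a minimal, platform-independent resource record of
# the kind a community registry could hold. All field names are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageSystem:
    name: str                  # e.g. a scratch or project file system
    root_path: str             # mount point visible to jobs
    capacity_tb: float         # advertised capacity


@dataclass
class ExecutionEnvironment:
    scheduler: str             # e.g. "slurm" or "kubernetes"
    queues: List[str]          # partitions/queues a gateway may target
    max_walltime_hours: int


@dataclass
class ResourceDescription:
    resource_id: str           # stable identifier used by gateways and workflow systems
    hostname: str
    storage: List[StorageSystem] = field(default_factory=list)
    execution: List[ExecutionEnvironment] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)


# A registry could then be as simple as a keyed collection of such records.
registry: Dict[str, ResourceDescription] = {
    "cluster-a": ResourceDescription(
        resource_id="cluster-a",
        hostname="login.cluster-a.example.edu",
        storage=[StorageSystem("scratch", "/scratch", 500.0)],
        execution=[ExecutionEnvironment("slurm", ["compute", "gpu"], 48)],
    )
}
```

A real registry would add provenance, access policies, and versioning, but even this reduced form gives gateways and workflow systems one record to query instead of per-platform configuration files.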




Related research

Ewa Deelman, 2021
In 2018, NSF funded an effort to pilot a Cyberinfrastructure Center of Excellence (CI CoE or Center) that would serve the cyberinfrastructure (CI) needs of the NSF Major Facilities (MFs) and large projects with advanced CI architectures. The goal of the CI CoE Pilot project (Pilot) was to develop a model and a blueprint for such a CoE by engaging with the MFs, understanding their CI needs, understanding the contributions the MFs are making to the CI community, and exploring opportunities for building a broader CI community. This document summarizes the results of community engagements conducted during the first two years of the project and describes the identified CI needs of the MFs. To better understand the MFs' CI, the Pilot developed and validated a model of the MF data lifecycle that follows data generation and management within a facility, and gained an understanding of how this model captures the fundamental stages that a facility's data passes through, from the scientific instruments to the principal investigators and their teams, to the broader collaborations and the public. The Pilot also aimed to understand what CI workforce development challenges the MFs face while designing, constructing, and operating their CI, and what solutions they are exploring and adopting within their projects. Based on the needs of the MFs in the data lifecycle and workforce development areas, this document outlines a blueprint for a CI CoE that will learn about and share the CI solutions designed, developed, and/or adopted by the MFs, provide expertise to the largest NSF projects with advanced and complex CI architectures, and foster a community of CI practitioners and researchers.
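Purely as an illustration of the stages the abstract names for the MF data lifecycle (the Pilot's actual model is richer than this), those stages can be written down as a simple enumeration:

```python
# Illustrative only: the lifecycle stages named in the abstract, from the
# scientific instruments through to the public. The Pilot's validated model
# is more detailed than this sketch.
from enum import Enum, auto


class DataLifecycleStage(Enum):
    INSTRUMENT_CAPTURE = auto()     # data generated at the scientific instruments
    FACILITY_MANAGEMENT = auto()    # data managed within the facility
    PI_TEAM_ANALYSIS = auto()       # delivered to principal investigators and their teams
    BROADER_COLLABORATION = auto()  # shared with wider collaborations
    PUBLIC_ACCESS = auto()          # made available to the public
```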
Computational workflows are widely used in data analysis, enabling innovation and decision-making. In many domains (bioinformatics, image analysis, and radio astronomy) the analysis components are numerous and written in multiple different computer languages by third parties. However, many competing workflow systems exist, severely limiting the portability of such workflows; this hinders the transfer of workflows between systems, projects, and settings, leads to vendor lock-in, and limits their generic reusability. Here we present the Common Workflow Language (CWL) project, which produces free and open standards for describing command-line-tool-based workflows. The CWL standards provide a common but reduced set of abstractions that are both used in practice and implemented in many popular workflow systems. The CWL language is declarative, which allows expressing computational workflows constructed from diverse software tools, each executed through its command-line interface. Being explicit about the runtime environment and any use of software containers enables portability and reuse. A workflow written according to the CWL standards is a reusable description of an analysis that can run on a diverse set of computing environments. These descriptions contain enough information for advanced optimization without additional input from workflow authors. The CWL standards support polylingual workflows, enabling portability and reuse of such workflows and easing, for example, scholarly publication, fulfillment of regulatory requirements, and collaboration within and between academic research and industry, while reducing implementation costs. CWL has been taken up by a wide variety of domains and industries, and support has been implemented in many major workflow systems.
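To make the kind of description the CWL standards define concrete, here is a minimal hand-written CommandLineTool that wraps echo; it is our own example, not one from the paper, and it is built as a Python dictionary and emitted as JSON (CWL documents may be written in either YAML or JSON).

```python
# Minimal CWL CommandLineTool built as a plain dictionary and written out as
# JSON. Hand-written example for illustration; not taken from the CWL paper.
import json

echo_tool = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {
        "message": {
            "type": "string",
            "inputBinding": {"position": 1},  # pass the value as the first argument
        }
    },
    "outputs": {
        "out": {"type": "stdout"},            # capture standard output as the result
    },
}

with open("echo-tool.cwl.json", "w") as fh:
    json.dump(echo_tool, fh, indent=2)

# Any conformant CWL runner can then execute the description, for example:
#   cwltool echo-tool.cwl.json --message "hello"
```

Because the description names its inputs, outputs, and command binding explicitly, the same file runs unchanged under any workflow engine that implements the CWL standards.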
Sergey Bravyi, Graeme Smith, 2015
We propose examples of a hybrid quantum-classical simulation where a classical computer assisted by a small quantum processor can efficiently simulate a larger quantum system. First, we consider sparse quantum circuits such that each qubit participates in O(1) two-qubit gates. It is shown that any sparse circuit on n+k qubits can be simulated by sparse circuits on n qubits and classical processing that takes time $2^{O(k)} poly(n)$. Second, we study Pauli-based computation (PBC), where the allowed operations are non-destructive eigenvalue measurements of n-qubit Pauli operators. The computation begins by initializing each qubit in the so-called magic state. This model is known to be equivalent to the universal quantum computer. We show that any PBC on n+k qubits can be simulated by PBCs on n qubits and classical processing that takes time $2^{O(k)} poly(n)$. Finally, we propose a purely classical algorithm that can simulate a PBC on n qubits in time $2^{cn} poly(n)$, where $c \approx 0.94$. This improves upon the brute-force simulation method, which takes time $2^n poly(n)$. Our algorithm exploits the fact that n-fold tensor products of magic states admit a low-rank decomposition into n-qubit stabilizer states.
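As a back-of-the-envelope comparison of our own (not a figure from the paper), ignoring the $poly(n)$ factors, the speedup of the proposed classical algorithm over brute force is

$$\frac{2^{n}}{2^{cn}} = 2^{(1-c)n} \approx 2^{0.06\,n},$$

so for $n = 50$ qubits the improvement is roughly a factor of $2^{3} = 8$, and it grows exponentially with $n$.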
With the growing number and increasing availability of shared-use instruments and observatories, observational data is becoming an essential part of application workflows and a contributor to scientific discoveries in a range of disciplines. However, the corresponding growth in the number of users accessing these facilities, coupled with the expansion in the scale and variety of the data, is making it challenging for these facilities to ensure their data can be accessed, integrated, and analyzed in a timely manner, and is placing significant demands on their cyberinfrastructure (CI). In this paper, we present the design of a push-based data delivery framework that leverages emerging in-network capabilities, along with data pre-fetching techniques based on a hybrid data management model. Specifically, we analyze data access traces for two large-scale observatories, the Ocean Observatories Initiative (OOI) and the Geodetic Facility for the Advancement of Geoscience (GAGE), to identify typical user access patterns and to develop a model that can be used for data pre-fetching. Furthermore, we evaluate our data pre-fetching model and the proposed framework using a simulation of the Virtual Data Collaboratory (VDC) platform, which provides in-network data staging and processing capabilities. The results demonstrate the ability of the framework to significantly improve data delivery performance and reduce network traffic at the observatories' facilities.
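The abstract does not spell out the pre-fetching model itself, so the following is only a sketch of the general idea behind access-trace-driven pre-fetching, with hypothetical names throughout; it simply stages the datasets that appear most often in a recent trace.

```python
# Illustrative sketch of access-trace-driven pre-fetching. The function and
# dataset names are hypothetical; the paper's hybrid data-management model is
# not reproduced here.
from collections import Counter
from typing import Iterable, List


def choose_prefetch_candidates(access_log: Iterable[str], budget: int) -> List[str]:
    """Return the dataset IDs accessed most often in the recent trace.

    access_log: recent dataset identifiers, most recent last.
    budget:     how many datasets the in-network staging layer can hold.
    """
    counts = Counter(access_log)
    return [dataset for dataset, _ in counts.most_common(budget)]


# Example: stage the two most frequently requested data streams ahead of demand.
recent = [
    "ctd-2021-06", "adcp-2021-06", "ctd-2021-06",
    "adcp-2021-06", "ctd-2021-06", "ctd-2021-05",
]
print(choose_prefetch_candidates(recent, budget=2))
# -> ['ctd-2021-06', 'adcp-2021-06']
```

A production framework would weight recency, user identity, and network conditions rather than raw counts, but the interface (trace in, staging decisions out) is the same.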
The role of scalable high-performance workflows and flexible workflow management systems that can support multiple simulations will continue to increase in importance. For example, with the end of Dennard scaling, there is a need to substitute a single long-running simulation with multiple repeats of shorter simulations, or concurrent replicas. Further, many scientific problems involve ensembles of simulations in order to solve a higher-level problem or produce statistically meaningful results. However, most supercomputing software development and performance enhancements have focused on optimizing single-simulation performance. At the same time, there is a strong inconsistency in the definition and practice of workflows and workflow management systems. This inconsistency often centers on the differences between several types of workflows, including modeling and simulation, grid, uncertainty quantification, and purely conceptual workflows. This work explores this phenomenon by examining the different types of workflows and workflow management systems, reviewing the perspective of a large supercomputing facility, examining the common features and problems of workflow management systems, and finally presenting a proposed solution based on the concept of common building blocks. The implications of the continuing proliferation of workflow management systems and the lack of interoperability between these systems are discussed from a practical perspective. In doing so, we have begun an investigation of the design and implementation of open workflow systems for supercomputers based upon common components.
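The "common building blocks" idea is only sketched at a high level in the abstract; one way to picture it (our illustration, with hypothetical names and interfaces) is a small, composable task/ensemble abstraction that any workflow system could implement on top of its own scheduler.

```python
# Illustrative sketch of "common building blocks": small, composable pieces
# (task, ensemble) that different workflow systems could share. Names and
# interfaces are hypothetical, not taken from the paper.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    name: str
    run: Callable[[Dict], Dict]  # function from an input mapping to an output mapping


@dataclass
class Ensemble:
    """Run the same task over many parameter sets, e.g. concurrent replicas."""
    task: Task
    parameter_sets: List[Dict] = field(default_factory=list)

    def execute(self) -> List[Dict]:
        # A real system would dispatch these to a resource manager; here they
        # run serially to keep the sketch self-contained.
        return [self.task.run(params) for params in self.parameter_sets]


# Example: three short replicas standing in for one long-running simulation.
simulate = Task("short_md", run=lambda p: {"seed": p["seed"], "energy": -1.0 * p["seed"]})
replicas = Ensemble(simulate, [{"seed": s} for s in (1, 2, 3)])
print(replicas.execute())
```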
