WorkflowHub: Community Framework for Enabling Scientific Workflow Research and Development -- Technical Report

89 0 0.0 ( 0 )

Download Cite

Added by Rafael Ferreira da Silva

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Rafael Ferreira da Silva

Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Scientific workflows are a cornerstone of modern scientific computing. They are used to describe complex computational applications that require efficient and robust management of large volumes of data, which are typically stored/processed at heterogeneous, distributed resources. The workflow research and development community has employed a number of methods for the quantitative evaluation of existing and novel workflow algorithms and systems. In particular, a common approach is to simulate workflow executions. In previous work, we have presented a collection of tools that have been used for aiding research and development activities in the Pegasus project, and that have been adopted by others for conducting workflow research. Despite their popularity, there are several shortcomings that prevent easy adoption, maintenance, and consistency with the evolving structures and computational requirements of production workflows. In this work, we present WorkflowHub, a community framework that provides a collection of tools for analyzing workflow execution traces, producing realistic synthetic workflow traces, and simulating workflow executions. We demonstrate the realism of the generated synthetic traces by comparing simulated executions of these traces with actual workflow executions. We also contrast these results with those obtained when using the previously available collection of tools. We find that our framework not only can be used to generate representative synthetic workflow traces (i.e., with workflow structures and task characteristics distributions that resembles those in traces obtained from real-world workflow executions), but can also generate representative workflow traces at larger scales than that of available workflow traces.

rate research

A Survey and Annotated Bibliography of Workflow Scheduling in Computing Infrastructures: Community, Keyword, and Article Reviews -- Extended Technical Report

78 - Laurens Versluis , Alexandru Iosup 2020

Workflows are prevalent in todays computing infrastructures. The workflow model support various different domains, from machine learning to finance and from astronomy to chemistry. Different Quality-of-Service (QoS) requirements and other desires of both users and providers makes workflow scheduling a tough problem, especially since resource providers need to be as efficient as possible with their resources to be competitive. To a newcomer or even an experienced researcher, sifting through the vast amount of articles can be a daunting task. Questions regarding the difference techniques, policies, emerging areas, and opportunities arise. Surveys are an excellent way to cover these questions, yet surveys rarely publish their tools and data on which it is based. Moreover, the communities that are behind these articles are rarely studied. We attempt to address these shortcomings in this work. We focus on four areas within workflow scheduling: 1) the workflow formalism, 2) workflow allocation, 3) resource provisioning, and 4) applications and services. Each part features one or more taxonomies, a view of the community, important and emerging keywords, and directions for future work. We introduce and make open-source an instrument we used to combine and store article meta-data. Using this meta-data, we 1) obtain important keywords overall and per year, per community, 2) identify keywords growing in importance, 3) get insight into the structure and relations within each community, and 4) perform a systematic literature survey per part to validate and complement our taxonomies.

Distributed Parallel and Cluster Computing

Workflows Community Summit: Advancing the State-of-the-art of Scientific Workflows Management Systems Research and Development

73 - Rafael Ferreira da Silva , Henri Casanova , Kyle Chard 2021

Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms. Workflows will play a crucial role in the data-oriented and post-Moores computing landscape as they democratize the application of cutting-edge research techniques, computationally intensive methods, and use of new computing platforms. As workflows continue to be adopted by scientific projects and user communities, they are becoming more complex. Workflows are increasingly composed of tasks that perform computations such as short machine learning inference, multi-node simulations, long-running machine learning model training, amongst others, and thus increasingly rely on heterogeneous architectures that include CPUs but also GPUs and accelerators. The workflow management system (WMS) technology landscape is currently segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. Another fundamental problem is that there are conflicting theoretical bases and abstractions for a WMS. Systems that use the same underlying abstractions can likely be translated between, which is not the case for systems that use different abstractions. More information: https://workflowsri.org/summits/technical

Distributed Parallel and Cluster Computing

The Workflow Trace Archive: Open-Access Data from Public and Private Computing Infrastructures -- Technical Report

109 - Laurens Versluis , Roland Matha , Sacheendra Talluri 2019

Realistic, relevant, and reproducible experiments often need input traces collected from real-world environments. We focus in this work on traces of workflows---common in datacenters, clouds, and HPC infrastructures. We show that the state-of-the-art in using workflow-traces raises important issues: (1) the use of realistic traces is infrequent, and (2) the use of realistic, {it open-access} traces even more so. Alleviating these issues, we introduce the Workflow Trace Archive (WTA), an open-access archive of workflow traces from diverse computing infrastructures and tooling to parse, validate, and analyze traces. The WTA includes ${>}48$ million workflows captured from ${>}10$ computing infrastructures, representing a broad diversity of trace domains and characteristics. To emphasize the importance of trace diversity, we characterize the WTA contents and analyze in simulation the impact of trace diversity on experiment results. Our results indicate significant differences in characteristics, properties, and workflow structures between workload sources, domains, and fields.

Distributed Parallel and Cluster Computing

Workflows Community Summit: Bringing the Scientific Workflows Community Together

107 - Rafael Ferreira da Silva , Henri Casanova , Kyle Chard 2021

Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure. Due to the popularity of workflows, workflow management systems (WMSs) have been developed to provide abstractions for creating and executing workflows conveniently, efficiently, and portably. While these efforts are all worthwhile, there are now hundreds of independent WMSs, many of which are moribund. As a result, the WMS landscape is segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. As a result, many teams, small and large, still elect to build their own custom workflow solution rather than adopt, or build upon, existing WMSs. This current state of the WMS landscape negatively impacts workflow users, developers, and researchers. The Workflows Community Summit was held online on January 13, 2021. The overarching goal of the summit was to develop a view of the state of the art and identify crucial research challenges in the workflow community. Prior to the summit, a survey sent to stakeholders in the workflow community (including both developers of WMSs and users of workflows) helped to identify key challenges in this community that were translated into 6 broad themes for the summit, each of them being the object of a focused discussion led by a volunteer member of the community. This report documents and organizes the wealth of information provided by the participants before, during, and after the summit.

Distributed Parallel and Cluster Computing

Technical Report: Estimating Reliability of Workers for Cooperative Distributed Computing

138 - Seda Davtyan , Kishori M. Konwar , Alexander A. Shvartsman 2014

Internet supercomputing is an approach to solving partitionable, computation-intensive problems by harnessing the power of a vast number of interconnected computers. For the problem of using network supercomputing to perform a large collection of independent tasks, prior work introduced a decentralized approach and provided randomized synchronous algorithms that perform all tasks correctly with high probability, while dealing with misbehaving or crash-prone processors. The main weaknesses of existing algorithms is that they assume either that the emph{average} probability of a non-crashed processor returning incorrect results is inferior to $frac{1}{2}$, or that the probability of returning incorrect results is known to emph{each} processor. Here we present a randomized synchronous distributed algorithm that tightly estimates the probability of each processor returning correct results. Starting with the set $P$ of $n$ processors, let $F$ be the set of processors that crash. Our algorithm estimates the probability $p_i$ of returning a correct result for each processor $i in P-F$, making the estimates available to all these processors. The estimation is based on the $(epsilon, delta)$-approximation, where each estimated probability $tilde{p_i}$ of $p_i$ obeys the bound ${sf Pr}[p_i(1-epsilon) leq tilde{p_i} leq p_i(1+epsilon)] > 1 - delta$, for any constants $delta >0$ and $epsilon >0$ chosen by the user. An important aspect of this algorithm is that each processor terminates without global coordination. We assess the efficiency of the algorithm in three adversarial models as follows. For the model where the number of non-crashed processors $|P-F|$ is linearly bounded the time complexity $T(n)$ of the algorithm is $Theta(log{n})$, work complexity $W(n)$ is $Theta(nlog{n})$, and message complexity $M(n)$ is $Theta(nlog^2n)$.

Distributed Parallel and Cluster Computing