ﻻ يوجد ملخص باللغة العربية
This paper tries to reduce the effort of learning, deploying, and integrating several frameworks for the development of e-Science applications that combine simulations with High-Performance Data Analytics (HPDA). We propose a way to extend task-based management systems to support continuous input and output data to enable the combination of task-based workflows and dataflows (Hybrid Workflows from now on) using a single programming model. Hence, developers can build complex Data Science workflows with different approaches depending on the requirements. To illustrate the capabilities of Hybrid Workflows, we have built a Distributed Stream Library and a fully functional prototype extending COMPSs, a mature, general-purpose, task-based, parallel programming model. The library can be easily integrated with existing task-based frameworks to provide support for dataflows. Also, it provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python; enabling complex workflows to handle any data type without dealing directly with the streaming back-end.
Over the last two decades, the field of computational science has seen a dramatic shift towards incorporating high-throughput computation and big-data analysis as fundamental pillars of the scientific discovery process. This has necessitated the deve
Exascale computers will offer transformative capabilities to combine data-driven and learning-based approaches with traditional simulation applications to accelerate scientific discovery and insight. These software combinations and integrations, howe
Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communicatio
In order to achieve near-time insights, scientific workflows tend to be organized in a flexible and dynamic way. Data-driven triggering of tasks has been explored as a way to support workflows that evolve based on the data. However, the overhead intr
It is common practice to partition complex workflows into separate channels in order to speed up their completion times. When this is done within a distributed environment, unavoidable fluctuations make individual realizations depart from the expecte