Enabling Reproducible Analysis of Complex Workflows on the Edge-to-Cloud Continuum

Added by Daniel Rosendo

Publication date 2021

Field: Informatics Engineering

Research language: English

Authors Daniel Rosendo - Alexandru Costan (INSA Rennes

Distributed digital infrastructures for computation and analytics are now evolving towards an interconnected ecosystem allowing complex applications to be executed from IoT Edge devices to the HPC Cloud (aka the Computing Continuum, the Digital Continuum, or the Transcontinuum). Understanding end-to-end performance in such a complex continuum is challenging. This boils down to reconciling many, typically contradictory, application requirements and constraints with low-level infrastructure design choices. One important challenge is to accurately reproduce relevant behaviors of a given application workflow and representative settings of the physical infrastructure underlying this complex continuum. We introduce a rigorous methodology for such a process and validate it through E2Clab. It is the first platform to support the complete experimental cycle across the Computing Continuum: deployment, analysis, and optimization. Preliminary results with real-life use cases show that E2Clab allows one to understand and improve performance by correlating it with the parameter settings, the resource usage, and the specifics of the underlying infrastructure.

Reproducible Performance Optimization of Complex Applications on the Edge-to-Cloud Continuum

Daniel Rosendo, Alexandru Costan, Gabriel Antoniu, 2021

In more and more application areas, we are witnessing the emergence of complex workflows that combine computing, analytics and learning. They often require a hybrid execution infrastructure with IoT devices interconnected to cloud/HPC systems (aka Computing Continuum). Such workflows are subject to complex constraints and requirements in terms of performance, resource usage, energy consumption and financial costs. This makes it challenging to optimize their configuration and deployment. We propose a methodology to support the optimization of real-life applications on the Edge-to-Cloud Continuum. We implement it as an extension of E2Clab, a previously proposed framework supporting the complete experimental cycle across the Edge-to-Cloud Continuum. Our approach relies on a rigorous analysis of possible configurations in a controlled testbed environment to understand their behaviour and related performance trade-offs. We illustrate our methodology by optimizing Pl@ntNet, a world-wide plant identification application. Our methodology can be generalized to other applications in the Edge-to-Cloud Continuum.

Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Machine Learning

On the performance overhead tradeoff of distributed principal component analysis via data partitioning

Ni An, Steven Weber, 2015

Principal component analysis (PCA) is not only a fundamental dimension reduction method, but is also a widely used network anomaly detection technique. Traditionally, PCA is performed in a centralized manner, which has poor scalability for large distributed systems, on account of the large network bandwidth cost required to gather the distributed state at a fusion center. Consequently, several recent works have proposed various distributed PCA algorithms aiming to reduce the communication overhead incurred by PCA without losing its inferential power. This paper evaluates the tradeoff between communication cost and solution quality of two distributed PCA algorithms on a real domain name system (DNS) query dataset from a large network. We also apply distributed PCA to network anomaly detection and demonstrate that both distributed PCA-based methods suffer little degradation in detection accuracy while achieving significant savings in communication bandwidth.

Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture; Performance

A Survey on Time-Sensitive Resource Allocation in the Cloud Continuum

Saravanan Ramanathan, 2020

Artificial Intelligence (AI) and Internet of Things (IoT) applications are rapidly growing in today's world, where they are continuously connected to the internet and process, store and exchange information among the devices and the environment. The cloud and edge platform is crucial to these applications due to their inherent compute-intensive and resource-constrained nature. One of the foremost challenges in cloud and edge resource allocation is the efficient management of computation and communication resources to meet the performance and latency guarantees of the applications. The heterogeneity of cloud resources (processors, memory, storage, bandwidth), variable cost structure and unpredictable workload patterns make the design of resource allocation techniques complex. Numerous research studies have been carried out to address this intricate problem. In this paper, the current state-of-the-art resource allocation techniques for the cloud continuum, in particular those that consider time-sensitive applications, are reviewed. Furthermore, we present the key challenges in the resource allocation problem for the cloud continuum, a taxonomy to classify the existing literature and the potential research gaps.

Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture

An In-Depth Analysis of the Slingshot Interconnect

Daniele De Sensi, Salvatore Di Girolamo, Kim H. McMahon, 2020

The interconnect is one of the most critical components in large scale computing systems, and its impact on the performance of applications is going to increase with the system size. In this paper, we will describe Slingshot, an interconnection network for large scale computing systems. Slingshot is based on high-radix switches, which allow building exascale and hyperscale datacenter networks with at most three switch-to-switch hops. Moreover, Slingshot provides efficient adaptive routing and congestion control algorithms, and highly tunable traffic classes. Slingshot uses an optimized Ethernet protocol, which allows it to be interoperable with standard Ethernet devices while providing high performance to HPC applications. We analyze the extent to which Slingshot provides these features, evaluating it on microbenchmarks and on several applications from the datacenter and AI worlds, as well as on HPC applications. We find that applications running on Slingshot are less affected by congestion compared to previous generation networks.

Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture; Performance

Resource Management in Edge and Fog Computing using FogBus2 Framework

Mohammad Goudarzi, Qifan Deng, 2021

Edge/Fog computing is a novel computing paradigm that provides resource-limited Internet of Things (IoT) devices with scalable computing and storage resources. Compared to cloud computing, edge/fog servers have fewer resources, but they can be accessed with higher bandwidth and less communication latency. Thus, integrating edge/fog and cloud infrastructures can support the execution of diverse latency-sensitive and computation-intensive IoT applications. Although some frameworks attempt to provide such integration, there are still several challenges to be addressed, such as dynamic scheduling of different IoT applications, scalability mechanisms, multi-platform support, and supporting different interaction models. FogBus2, a new Python-based framework, offers a lightweight and distributed container-based framework to overcome these challenges. In this chapter, we highlight key features of the FogBus2 framework and describe its main components. We also provide a step-by-step guideline to set up an integrated computing environment, containing multiple cloud service providers (Hybrid-cloud) and edge devices, which is a prerequisite for any IoT application scenario. To this end, a low-overhead communication network among all computing resources is initiated by the provided scripts and configuration files. Next, we provide instructions and corresponding code snippets to install and run the main framework and its integrated applications. Finally, we demonstrate how to implement and integrate several new IoT applications and custom scheduling and scalability policies with the FogBus2 framework.

Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture; Performance
