New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Harvesting Idle Resources in Serverless Computing via Reinforcement Learning

431 0 0.0 ( 0 )

Download Cite

Added by Hao Wang

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Hanfei Yu - Hao Wang - Jian Li

Distributed Parallel and Cluster Computing Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Serverless computing has become a new cloud computing paradigm that promises to deliver high cost-efficiency and simplified cloud deployment with automated resource scaling at a fine granularity. Users decouple a cloud application into chained functions and preset each serverless functions memory and CPU demands at megabyte-level and core-level, respectively. Serverless platforms then automatically scale the number of functions to accommodate the workloads. However, the complexities of chained functions make it non-trivial to accurately determine the resource demands of each function for users, leading to either resource over-provision or under-provision for individual functions. This paper presents FaaSRM, a new resource manager (RM) for serverless platforms that maximizes resource efficiency by dynamically harvesting idle resources from functions over-supplied to functions under-supplied. FaaSRM monitors each functions resource utilization in real-time, detects over-provisioning and under-provisioning, and applies deep reinforcement learning to harvest idle resources safely using a safeguard mechanism and accelerate functions efficiently. We have implemented and deployed a FaaSRM prototype in a 13-node Apache OpenWhisk cluster. Experimental results on the OpenWhisk cluster show that FaaSRM reduces the execution time of 98% of function invocations by 35.81% compared to the baseline RMs by harvesting idle resources from 38.8% of the invocations and accelerating 39.2% of the invocations.

rate research

Adaptive parallelism with RMI: Idle high-performance computing resources can be completely avoided

506 - Florian Spenke , Karsten Balzer , Sascha Frick 2018

In practice, standard scheduling of parallel computing jobs almost always leaves significant portions of the available hardware unused, even with many jobs still waiting in the queue. The simple reason is that the resource requests of these waiting jobs are fixed and do not match the available, unused resources. However, with alternative but existing and well-established techniques it is possible to achieve a fully automated, adaptive parallelism that does not need pre-set, fixed resources. Here, we demonstrate that such an adaptively parallel program can indeed fill in all such scheduling gaps, even in real-life situations on large supercomputers.

Distributed Parallel and Cluster Computing Chemical Physics Computational Physics

Towards Demystifying Serverless Machine Learning Training

277 - Jiawei Jiang , Shaoduo Gan , Yue Liu 2021

The appeal of serverless (FaaS) has triggered a growing interest on how to use it in data-intensive applications such as ETL, query processing, or machine learning (ML). Several systems exist for training large-scale ML models on top of serverless infrastructures (e.g., AWS Lambda) but with inconclusive results in terms of their performance and relative advantage over serverful infrastructures (IaaS). In this paper we present a systematic, comparative study of distributed ML training over FaaS and IaaS. We present a design space covering design choices such as optimization algorithms and synchronization protocols, and implement a platform, LambdaML, that enables a fair comparison between FaaS and IaaS. We present experimental results using LambdaML, and further develop an analytic model to capture cost/performance tradeoffs that must be considered when opting for a serverless infrastructure. Our results indicate that ML training pays off in serverless only for models with efficient (i.e., reduced) communication and that quickly converge. In general, FaaS can be much faster but it is never significantly cheaper than IaaS.

Distributed Parallel and Cluster Computing Machine Learning

Distributed Double Machine Learning with a Serverless Architecture

109 - Malte S. Kurz 2021

This paper explores serverless cloud computing for double machine learning. Being based on repeated cross-fitting, double machine learning is particularly well suited to exploit the high level of parallelism achievable with serverless computing. It allows to get fast on-demand estimations without additional cloud maintenance effort. We provide a prototype Python implementation texttt{DoubleML-Serverless} for the estimation of double machine learning models with the serverless computing platform AWS Lambda and demonstrate its utility with a case study analyzing estimation times and costs.

Distributed Parallel and Cluster Computing Machine Learning Machine Learning

A Fault-Tolerance Shim for Serverless Computing

204 - Vikram Sreekanti , Chenggang Wu , Saurav Chhatrapati 2020

Serverless computing has grown in popularity in recent years, with an increasing number of applications being built on Functions-as-a-Service (FaaS) platforms. By default, FaaS platforms support retry-based fault tolerance, but this is insufficient for programs that modify shared state, as they can unwittingly persist partial sets of updates in case of failures. To address this challenge, we would like atomic visibility of the updates made by a FaaS application. In this paper, we present AFT, an atomic fault tolerance shim for serverless applications. AFT interposes between a commodity FaaS platform and storage engine and ensures atomic visibility of updates by enforcing the read atomic isolation guarantee. AFT supports new protocols to guarantee read atomic isolation in the serverless setting. We demonstrate that aft introduces minimal overhead relative to existing storage engines and scales smoothly to thousands of requests per second, while preventing a significant number of consistency anomalies.

Distributed Parallel and Cluster Computing Databases

Function Delivery Network: Extending Serverless Computing for Heterogeneous Platforms

232 - Anshul Jindal , Michael Gerndt , Mohak Chadha 2021

Serverless computing has rapidly grown following the launch of Amazons Lambda platform. Function-as-a-Service (FaaS) a key enabler of serverless computing allows an application to be decomposed into simple, standalone functions that are executed on a FaaS platform. The FaaS platform is responsible for deploying and facilitating resources to the functions. Several of todays cloud applications spread over heterogeneous connected computing resources and are highly dynamic in their structure and resource requirements. However, FaaS platforms are limited to homogeneous clusters and homogeneous functions and do not account for the data access behavior of functions before scheduling. We introduce an extension of FaaS to heterogeneous clusters and to support heterogeneous functions through a network of distributed heterogeneous target platforms called Function Delivery Network (FDN). A target platform is a combination of a cluster of homogeneous nodes and a FaaS platform on top of it. FDN provides Function-Delivery-as-a-Service (FDaaS), delivering the function to the right target platform. We showcase the opportunities such as varied target platforms characteristics, possibility of collaborative execution between multiple target platforms, and localization of data that the FDN offers in fulfilling two objectives: Service Level Objective (SLO) requirements and energy efficiency when scheduling functions by evaluating over five distributed target platforms using the FDNInspector, a tool developed by us for benchmarking distributed target platforms. Scheduling functions on an edge target platform in our evaluation reduced the overall energy consumption by 17x without violating the SLO requirements in comparison to scheduling on a high-end target platform.

Distributed Parallel and Cluster Computing Performance

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Harvesting Idle Resources in Serverless Computing via Reinforcement Learning

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions