Deep Reinforcement Learning-based Methods for Resource Scheduling in Cloud Computing: A Review and Future Directions

Added by Guangyao Zhou

Publication date: 2021

Fields: Informatics Engineering

Language: English

Authors: Guangyao Zhou, Wenhong Tian, Rajkumar Buyya

Distributed Parallel and Cluster Computing

As the quantity and complexity of information processed by software systems increase, large-scale software systems increasingly require high-performance distributed computing systems. With the acceleration of the Internet in the Web 2.0 era, Cloud computing, a paradigm that provides dynamic, uncertain and elastic services, has shown advantages in meeting computing needs dynamically. Without an appropriate scheduling approach, extensive Cloud computing may cause high energy consumption and high cost; high energy consumption in turn causes massive carbon dioxide emissions. Moreover, inappropriate scheduling reduces the service life of physical devices and increases the response time to user requests. Hence, efficient scheduling of resources or optimal allocation of requests, which is usually an NP-hard problem, is one of the prominent issues in emerging trends of Cloud computing. Focusing on improving quality of service (QoS), reducing cost and abating pollution, researchers have conducted extensive work on resource scheduling problems in Cloud computing over the years. Nevertheless, the growing complexity of Cloud computing, a super-massive distributed system, is limiting the application of scheduling approaches. Machine learning, a useful method for tackling problems in complex scenarios, has been applied to resource scheduling in Cloud computing as an innovative idea in recent years. Deep reinforcement learning (DRL), a combination of deep learning (DL) and reinforcement learning (RL), is one branch of machine learning and has considerable prospects in resource scheduling for Cloud computing. This paper surveys resource scheduling methods with a focus on DRL-based scheduling approaches in Cloud computing, reviews the applications of DRL, and discusses challenges and future directions of DRL in Cloud computing scheduling.

Machine Learning (ML)-Centric Resource Management in Cloud Computing: A Review and Future Directions

228 - Tahseen Khan , Wenhong Tian , Rajkumar Buyya 2021

Cloud computing has rapidly emerged as a model for delivering Internet-based utility computing services. In cloud computing, Infrastructure as a Service (IaaS) is one of the most important and rapidly growing fields. In this service model, cloud providers supply users and machines with resources such as virtual machines, raw (block) storage, firewalls, load balancers, and network devices. One of the most important aspects of cloud computing for IaaS is resource management. Scalability, quality of service, optimum utility, reduced overheads, increased throughput, reduced latency, specialised environment, cost effectiveness, and a streamlined interface are some of the advantages of resource management for IaaS in cloud computing. Traditionally, resource management has been done through static policies, which impose certain limitations in various dynamic scenarios, prompting cloud service providers to adopt data-driven, machine-learning-based approaches. Machine learning is being used to handle a variety of resource management tasks, including workload estimation, task scheduling, VM consolidation, resource optimization, and energy optimization, among others. This paper provides a detailed review of the challenges in ML-based resource management in current research, the current approaches to resolving these challenges, and their advantages and limitations. Finally, we propose potential future research directions based on the identified challenges and limitations in current research.

Distributed Parallel and Cluster Computing Machine Learning

A Holistic View on Resource Management in Serverless Computing Environments: Taxonomy and Future Directions

103 - Anupama Mampage , Shanika Karunasekera , Rajkumar Buyya 2021

Serverless computing has emerged as an attractive deployment option for cloud applications in recent times. The unique features of this computing model include rapid auto-scaling, strong isolation, fine-grained billing options, and access to a massive service ecosystem which autonomously handles resource management decisions. Due to these characteristics, this model is increasingly being explored for deployments in geographically distributed edge and fog computing networks as well. Effective management of computing resources has always attracted considerable attention among researchers. The need to automate the entire process of resource provisioning, allocation, scheduling, monitoring and scaling has resulted in the need for a specialized focus on resource management under the serverless model. In this article, we identify the major aspects covering the broader concept of resource management in serverless environments and propose a taxonomy of elements which influence these aspects, encompassing characteristics of system design, workload attributes and stakeholder expectations. We take a holistic view of serverless environments deployed across edge, fog and cloud computing networks. We also analyse existing works discussing aspects of serverless resource management using this taxonomy. This article further identifies gaps in the literature and highlights future research directions for improving the capabilities of this computing model.

Distributed Parallel and Cluster Computing

H2O-Cloud: A Resource and Quality of Service-Aware Task Scheduling Framework for Warehouse-Scale Data Centers -- A Hierarchical Hybrid DRL (Deep Reinforcement Learning) based Approach

89 - Mingxi Cheng , Ji Li , Paul Bogdan 2019

Cloud computing has attracted both end-users and Cloud Service Providers (CSPs) in recent years. Improving resource utilization rate (RUtR), such as CPU and memory usage on servers, while maintaining Quality-of-Service (QoS) is one key challenge faced by CSPs with warehouse-scale data centers. Prior works proposed various algorithms to reduce energy cost or to improve RUtR, which either lack fine-grained task scheduling capabilities or fail to take a comprehensive system model into consideration. This article presents H2O-Cloud, a Hierarchical and Hybrid Online task scheduling framework for warehouse-scale CSPs, to improve resource usage effectiveness while maintaining QoS. H2O-Cloud is highly scalable and considers comprehensive information such as various workload scenarios, cloud platform configurations, user request information and a dynamic pricing model. The hierarchy and hybridity of the framework, combined with its deep reinforcement learning (DRL) engines, enable H2O-Cloud to efficiently start on-the-go scheduling and learning in an unpredictable environment without pre-training. Our experiments confirm the high efficiency of the proposed H2O-Cloud when compared to baseline approaches, in terms of energy and cost while maintaining QoS. Compared with a state-of-the-art DRL-based algorithm, H2O-Cloud achieves up to 201.17% energy cost efficiency improvement, 47.88% energy efficiency improvement and 551.76% reward rate improvement.

Distributed Parallel and Cluster Computing

A Survey of Deep Reinforcement Learning in Recommender Systems: A Systematic Review and Future Directions

370 - Xiaocong Chen , Lina Yao , Julian McAuley 2021

In light of the emergence of deep reinforcement learning (DRL) in recommender systems research and several fruitful results in recent years, this survey aims to provide a timely and comprehensive overview of the recent trends of deep reinforcement learning in recommender systems. We start with the motivation of applying DRL in recommender systems. Then, we provide a taxonomy of current DRL-based recommender systems and a summary of existing methods. We discuss emerging topics and open issues, and provide our perspective on advancing the domain. This survey serves as introductory material for readers from academia and industry into the topic and identifies notable opportunities for further research.

Information Retrieval Artificial Intelligence

A New Approach for Resource Scheduling with Deep Reinforcement Learning

128 - Yufei Ye , Xiaoqin Ren , Jin Wang 2018

With the rapid development of deep learning, deep reinforcement learning (DRL) has begun to appear in the field of resource scheduling in recent years. Building on previous research on DRL in the literature, we introduce the online resource scheduling algorithm DeepRM2 and the offline resource scheduling algorithm DeepRM_Off. Compared with the state-of-the-art DRL algorithm DeepRM and heuristic algorithms, our proposed algorithms converge faster and schedule more efficiently with regard to average slowdown time, job completion time and rewards.

Artificial Intelligence Machine Learning