H2O-Cloud: A Resource and Quality of Service-Aware Task Scheduling Framework for Warehouse-Scale Data Centers -- A Hierarchical Hybrid DRL (Deep Reinforcement Learning) based Approach


Abstract

Cloud computing has attracted both end-users and Cloud Service Providers (CSPs) in recent years. Improving the resource utilization rate (RUtR), such as CPU and memory usage on servers, while maintaining Quality-of-Service (QoS) is a key challenge faced by CSPs with warehouse-scale data centers. Prior works have proposed various algorithms to reduce energy cost or improve RUtR, but they either lack fine-grained task scheduling capabilities or fail to take a comprehensive system model into consideration. This article presents H2O-Cloud, a Hierarchical and Hybrid Online task scheduling framework for warehouse-scale CSPs, to improve resource usage effectiveness while maintaining QoS. H2O-Cloud is highly scalable and considers comprehensive information such as various workload scenarios, cloud platform configurations, user request information and a dynamic pricing model. The hierarchy and hybridity of the framework, combined with its deep reinforcement learning (DRL) engines, enable H2O-Cloud to efficiently perform on-the-go scheduling and learning in an unpredictable environment without pre-training. Our experiments confirm the efficiency of the proposed H2O-Cloud compared to baseline approaches in terms of energy and cost while maintaining QoS. Compared with a state-of-the-art DRL-based algorithm, H2O-Cloud achieves up to 201.17% energy cost efficiency improvement, 47.88% energy efficiency improvement and 551.76% reward rate improvement.
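To illustrate the hierarchical, online scheduling idea summarized above, the following is a minimal sketch, not the authors' implementation: a global stage picks a cluster and a local stage picks a server, with simple epsilon-greedy agents standing in for the paper's DRL engines. All names, topology sizes and reward values are hypothetical, and QoS/deadline handling and task completion are omitted for brevity.

```python
# Illustrative sketch only -- simple epsilon-greedy agents replace the DRL
# engines described in the paper; topology and reward shaping are hypothetical.
import random
from dataclasses import dataclass


@dataclass
class Task:
    cpu: float  # normalized CPU demand
    mem: float  # normalized memory demand


@dataclass
class Server:
    cpu_free: float = 1.0
    mem_free: float = 1.0


class EpsilonGreedyAgent:
    """Stand-in for a DRL engine: tracks running average reward per action."""

    def __init__(self, n_actions, epsilon=0.1):
        self.values = [0.0] * n_actions
        self.counts = [0] * n_actions
        self.epsilon = epsilon

    def act(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


# Hypothetical topology: two clusters of two servers each.
clusters = [[Server(), Server()], [Server(), Server()]]
global_agent = EpsilonGreedyAgent(n_actions=len(clusters))
local_agents = [EpsilonGreedyAgent(n_actions=len(c)) for c in clusters]


def schedule(task: Task) -> float:
    """One online scheduling step: global choice, local choice, reward feedback."""
    c = global_agent.act()        # global stage: pick a cluster
    s = local_agents[c].act()     # local stage: pick a server in that cluster
    server = clusters[c][s]
    fits = server.cpu_free >= task.cpu and server.mem_free >= task.mem
    # Toy reward: favor placements that fit, penalize rejected placements.
    reward = (task.cpu + task.mem) if fits else -1.0
    if fits:
        server.cpu_free -= task.cpu
        server.mem_free -= task.mem
    global_agent.learn(c, reward)
    local_agents[c].learn(s, reward)
    return reward


if __name__ == "__main__":
    for _ in range(50):
        schedule(Task(cpu=random.uniform(0.05, 0.2),
                      mem=random.uniform(0.05, 0.2)))
```

In the actual framework the two stages would be DRL policies conditioned on workload, platform configuration, request information and pricing signals, learning online without pre-training; the sketch only mirrors the two-level decision structure.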
