DRAPS: Dynamic and Resource-Aware Placement Scheme for Docker Containers in a Heterogeneous Cluster

Added by Ying Mao

Publication date 2018

Fields Informatics Engineering

Research language English

Authors Ying Mao, Jenna Oak, Anthony Pompili

Distributed Parallel and Cluster Computing

Virtualization is a promising technology that has helped cloud computing become the next wave of the Internet revolution. Adopted by data centers, millions of applications powered by various virtual machines improve the quality of services. Although virtual machines are well isolated from each other, they suffer from redundant boot volumes and slow provisioning times. To address these limitations, containers were introduced to deploy and run distributed applications without launching entire virtual machines. Docker, a dominant player, is an open-source implementation of container technology. When managing a cluster of Docker containers, the management tool, Swarmkit, does not take the heterogeneity of physical nodes and virtualized containers into consideration. This heterogeneity arises because nodes in the cluster may have different configurations with respect to resource types and availability, and because the demands generated by services vary: some are CPU-intensive (e.g., clustering services) and others are memory-intensive (e.g., web services). In this paper, we investigate the Docker container cluster and develop DRAPS, a resource-aware placement scheme that boosts system performance in a heterogeneous cluster.
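The abstract does not spell out how DRAPS chooses a node, so the following is only a minimal sketch of what a resource-aware placement decision could look like, not the authors' algorithm. The `Node` and `Container` types, the units, and the "smallest dominant share" rule are all assumptions made for illustration: each container goes to the feasible node where its most demanding resource consumes the smallest fraction of what is still free.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    cpu_free: float  # free CPU cores (hypothetical unit)
    mem_free: float  # free memory in GiB (hypothetical unit)


@dataclass
class Container:
    name: str
    cpu: float  # requested CPU cores
    mem: float  # requested memory in GiB


def _share(demand: float, free: float) -> float:
    # Fraction of the node's free capacity this demand would consume.
    return 0.0 if demand == 0 else demand / free


def place(container: Container, nodes: List[Node]) -> Optional[Node]:
    """Assumed heuristic: choose the feasible node where the container's
    dominant (largest) resource share is smallest, then reserve capacity."""
    feasible = [
        n for n in nodes
        if n.cpu_free >= container.cpu and n.mem_free >= container.mem
    ]
    if not feasible:
        return None  # no node can host this container right now
    best = min(
        feasible,
        key=lambda n: max(_share(container.cpu, n.cpu_free),
                          _share(container.mem, n.mem_free)),
    )
    best.cpu_free -= container.cpu
    best.mem_free -= container.mem
    return best


nodes = [Node("cpu-rich", cpu_free=16, mem_free=8),
         Node("mem-rich", cpu_free=4, mem_free=64)]
clustering = place(Container("clustering", cpu=4, mem=2), nodes)
web = place(Container("web", cpu=1, mem=6), nodes)
too_big = place(Container("too-big", cpu=100, mem=1), nodes)
```

In this toy cluster the CPU-heavy container lands on the CPU-rich node and the memory-heavy one on the memory-rich node, which is the kind of outcome a heterogeneity-aware scheduler aims for.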

Resource Management Schemes for Cloud-Native Platforms with Computing Containers of Docker and Kubernetes

Ying Mao, Yuqi Fu, Suwen Gu (2020)

Over the last decade, businesses have increasingly adopted cloud technology and incorporated it into internal processes. Cloud-based deployment provides on-demand availability without active management. More recently, the concept of the cloud-native application has been proposed; it represents an invaluable step toward helping organizations develop software faster and update it more frequently to achieve dramatic business outcomes. Cloud-native is an approach to building and running applications that exploits the advantages of the cloud computing delivery model. It is more about how applications are created and deployed than where. Container-based virtualization technology, such as Docker and Kubernetes, serves as the foundation for cloud-native applications. This paper investigates the performance of two popular computation-intensive applications, big data and deep learning, in a cloud-native environment. We analyze the system overhead and resource usage for these applications. Through extensive experiments, we show that the completion time is reduced by up to 79.4% by changing the default setting and increases by up to 96.7% due to different resource management schemes on the two platforms. Additionally, resource release is delayed by up to 116.7% across different systems. Our work can guide developers, administrators, and researchers in designing and deploying their applications by selecting and configuring a hosting platform.

Distributed Parallel and Cluster Computing Performance

Differentiate Quality of Experience Scheduling for Deep Learning Applications with Docker Containers in the Cloud

Ying Mao, Weifeng Yan, Yun Song (2020)

With the prevalence of big-data-driven applications, such as face recognition on smartphones and tailored recommendations from Google Ads, we are on the road to a lifestyle with significantly more intelligence than ever before. For example, Aipoly Vision [1] is an object and color recognizer that helps blind, visually impaired, and color-blind people understand their surroundings. At the back end, various neural-network-powered models run to enable quick responses to users. Supporting those models requires large amounts of cloud-based computational resources, e.g., CPUs and GPUs. Cloud providers charge their clients by the amount of resources they occupy. From the clients' perspective, they have to balance budget against quality of experience (e.g., response time). The budget depends on the individual business owner, and the required Quality of Experience (QoE) depends on the usage scenario of each application: for instance, an autonomous vehicle requires real-time response, whereas unlocking a smartphone can tolerate delays. However, cloud providers fail to offer a QoE-based option to their clients. In this paper, we propose DQoES, a differentiate quality of experience scheduler for deep learning applications. DQoES accepts clients' specifications of targeted QoEs and dynamically adjusts resources to approach those targets. Through extensive cloud-based experiments, DQoES demonstrates that it can schedule multiple concurrent jobs with respect to various QoEs and achieve up to 8x more satisfied models compared to the existing system.

Distributed Parallel and Cluster Computing Performance

Data Diffusion: Dynamic Resource Provision and Data-Aware Scheduling for Data Intensive Applications

Ioan Raicu, Yong Zhao, Ian Foster (2008)

Data-intensive applications often involve the analysis of large datasets that require large amounts of compute and storage resources. While dedicated compute and/or storage farms offer good task and data throughput, they suffer from low resource utilization under varying workload conditions. If we instead move such data to distributed computing resources, we incur expensive data transfer costs. In this paper, we propose a data diffusion approach that combines dynamic resource provisioning, on-demand data replication and caching, and data-locality-aware scheduling to achieve improved resource efficiency under varying workloads. We define an abstract data diffusion model that takes into consideration workload characteristics, data access cost, application throughput, and resource utilization; we validate the model using a real-world, large-scale astronomy application. Our results show that data diffusion can increase the performance index by as much as 34X and improve application response time by over 506X, while achieving near-optimal throughput and execution times.
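To make the idea of data-locality-aware scheduling with on-demand caching concrete, here is a minimal sketch. It is not the dispatcher described in the paper: the function name `pick_executor`, the cache and load dictionaries, and the tie-breaking rule are all hypothetical. A task is sent to the least-loaded executor that already caches its input file; on a cache miss it goes to the least-loaded executor overall, which then keeps a copy for later tasks.

```python
from typing import Dict, Set


def pick_executor(needed: str,
                  caches: Dict[str, Set[str]],
                  load: Dict[str, int]) -> str:
    """Assumed policy: prefer executors that already hold `needed`;
    fall back to any executor on a cache miss. Ties break by name."""
    holders = [e for e in caches if needed in caches[e]]
    candidates = holders or list(caches)
    chosen = min(candidates, key=lambda e: (load[e], e))
    caches[chosen].add(needed)  # on a miss, the file is fetched and cached
    load[chosen] += 1           # one more task queued on this executor
    return chosen


caches = {"a": set(), "b": set()}
load = {"a": 0, "b": 0}
first = pick_executor("f1", caches, load)   # miss: least-loaded executor
second = pick_executor("f1", caches, load)  # hit: reuse the cached copy
third = pick_executor("f2", caches, load)   # miss: goes to the idle executor
```

A real scheduler would also bound cache sizes and weigh waiting time against transfer cost; this sketch omits both.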

Distributed Parallel and Cluster Computing

Service Placement with Provable Guarantees in Heterogeneous Edge Computing Systems

Stephen Pasteris, Shiqiang Wang, Mark Herbster (2019)

Mobile edge computing (MEC) is a promising technique for providing low-latency access to services at the network edge. The services are hosted at various types of edge nodes with both computation and communication capabilities. Due to the heterogeneity of edge node characteristics and user locations, the performance of MEC varies depending on where the service is hosted. In this paper, we consider such a heterogeneous MEC system, and focus on the problem of placing multiple services in the system to maximize the total reward. We show that the problem is NP-hard via reduction from the set cover problem, and propose a deterministic approximation algorithm to solve the problem, which has an approximation ratio that is not worse than $\left(1-e^{-1}\right)/4$. The proposed algorithm is based on two sub-routines that are suitable for small and arbitrarily sized services, respectively. The algorithm is designed using a novel way of partitioning each edge node into multiple slots, where each slot contains one service. The approximation guarantee is obtained via a specialization of the method of conditional expectations, which uses a randomized procedure as an intermediate step. In addition to theoretical guarantees, simulation results also show that the proposed algorithm outperforms other state-of-the-art approaches.
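For a sense of scale, the stated bound can be evaluated numerically. Reading the ratio in the usual way for a maximization problem (an interpretation, with $\mathrm{ALG}$ and $\mathrm{OPT}$ introduced here as labels for the algorithm's reward and the optimal reward), the guarantee is:

```latex
\[
  \frac{1 - e^{-1}}{4} \approx \frac{1 - 0.3679}{4} \approx 0.158,
  \qquad\text{so}\qquad
  \mathrm{ALG} \;\ge\; 0.158 \cdot \mathrm{OPT}.
\]
```

That is, under this reading the placement is guaranteed to collect at least roughly 15.8% of the optimal total reward in the worst case.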

Distributed Parallel and Cluster Computing Optimization and Control

TerraWatt: Sustaining Sustainable Computing of Containers in Containers

Jennifer Switzer, Rob McGuinness, Pat Pannuto (2021)

Each day the world inches closer to a climate catastrophe and a sustainability revolution. To avoid the former and achieve the latter, we must transform our use of energy. Surprisingly, today's growing problem is that there is too much wind and solar power generation at the wrong times and in the wrong places. We argue for the construction of TerraWatt: a geographically distributed, large-scale, zero-carbon compute infrastructure using renewable energy and older hardware. Delivering zero-carbon compute for general cloud workloads is challenging due to spatiotemporal power variability. We describe the systems challenges in using intermittent renewable power at scale to fuel such an older, decentralized compute infrastructure.

Distributed Parallel and Cluster Computing Networking and Internet Architecture