Online Adaptive Learning for Runtime Resource Management of Heterogeneous SoCs

117 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Sumit Mandal

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Sumit K. Mandal - Umit Y. Ogras - Janardhan Rao Doppa

النظم الموزعة والتوازية والحوسبة العنقودية الذكاء الاصطناعي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Dynamic resource management has become one of the major areas of research in modern computer and communication system design due to lower power consumption and higher performance demands. The number of integrated cores, level of heterogeneity and amount of control knobs increase steadily. As a result, the system complexity is increasing faster than our ability to optimize and dynamically manage the resources. Moreover, offline approaches are sub-optimal due to workload variations and large volume of new applications unknown at design time. This paper first reviews recent online learning techniques for predicting system performance, power, and temperature. Then, we describe the use of predictive models for online control using two modern approaches: imitation learning (IL) and an explicit nonlinear model predictive control (NMPC). Evaluations on a commercial mobile platform with 16 benchmarks show that the IL approach successfully adapts the control policy to unknown applications. The explicit NMPC provides 25% energy savings compared to a state-of-the-art algorithm for multi-variable power management of modern GPU sub-systems.

قيم البحث

171 - Aryan Deshwal , Syrine Belakaria , Ganapati Bhat 2021

Mobile system-on-chips (SoCs) are growing in their complexity and heterogeneity (e.g., Arms Big-Little architecture) to meet the needs of emerging applications, including games and artificial intelligence. This makes it very challenging to optimally manage the resources (e.g., controlling the number and frequency of different types of cores) at runtime to meet the desired trade-offs among multiple objectives such as performance and energy. This paper proposes a novel information-theoretic framework referred to as PaRMIS to create Pareto-optimal resource management policies for given target applications and design objectives. PaRMIS specifies parametric policies to manage resources and learns statistical models from candidate policy evaluation data in the form of target design objective values. The key idea is to select a candidate policy for evaluation in each iteration guided by statistical models that maximize the information gain about the true Pareto front. Experiments on a commercial heterogeneous SoC show that PaRMIS achieves better Pareto fronts and is easily usable to optimize complex objectives (e.g., performance per Watt) when compared to prior methods.

هندسة العتاد النظم الموزعة والتوازية والحوسبة العنقودية أنظمة وتحكم

OL4EL: Online Learning for Edge-cloud Collaborative Learning on Heterogeneous Edges with Resource Constraints

131 - Qing Han , Shusen Yang , Xuebin Ren 2020

Distributed machine learning (ML) at network edge is a promising paradigm that can preserve both network bandwidth and privacy of data providers. However, heterogeneous and limited computation and communication resources on edge servers (or edges) po se great challenges on distributed ML and formulate a new paradigm of Edge Learning (i.e. edge-cloud collaborative machine learning). In this article, we propose a novel framework of learning to learn for effective Edge Learning (EL) on heterogeneous edges with resource constraints. We first model the dynamic determination of collaboration strategy (i.e. the allocation of local iterations at edge servers and global aggregations on the Cloud during collaborative learning process) as an online optimization problem to achieve the tradeoff between the performance of EL and the resource consumption of edge servers. Then, we propose an Online Learning for EL (OL4EL) framework based on the budget-limited multi-armed bandit model. OL4EL supports both synchronous and asynchronous learning patterns, and can be used for both supervised and unsupervised learning tasks. To evaluate the performance of OL4EL, we conducted both real-world testbed experiments and extensive simulations based on docker containers, where both Support Vector Machine and K-means were considered as use cases. Experimental results demonstrate that OL4EL significantly outperforms state-of-the-art EL and other collaborative ML approaches in terms of the trade-off between learning performance and resource consumption.

النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي التعلم الالي

Intelligent Replication Management for HDFS Using Reinforcement Learning

104 - Hyunsung Lee 2020

Storage systems for cloud computing merge a large number of commodity computers into a single large storage pool. It provides high-performance storage over an unreliable, and dynamic network at a lower cost than purchasing and maintaining large mainf rame. In this paper, we examine whether it is feasible to apply Reinforcement Learning(RL) to system domain problems. Our experiments show that the RL model is comparable, even outperform other heuristics for block management problem. However, our experiments are limited in terms of scalability and fidelity. Even though our formulation is not very practical,applying Reinforcement Learning to system domain could offer good alternatives to existing heuristics.

النظم الموزعة والتوازية والحوسبة العنقودية الذكاء الاصطناعي التعلم الآلي

REOH: Using Probabilistic Network for Runtime Energy Optimization of Heterogeneous Systems

64 - Vi Ngoc-Nha Tran , Tommy Oines , Alexander Horsch 2018

Significant efforts have been devoted to choosing the best configuration of a computing system to run an application energy efficiently. However, available tuning approaches mainly focus on homogeneous systems and are inextensible for heterogeneous s ystems which include several components (e.g., CPUs, GPUs) with different architectures. This study proposes a holistic tuning approach called REOH using probabilistic network to predict the most energy-efficient configuration (i.e., which platform and its setting) of a heterogeneous system for running a given application. Based on the computation and communication patterns from Berkeley dwarfs, we conduct experiments to devise the training set including 7074 data samples covering varying application patterns and characteristics. Validating the REOH approach on heterogeneous systems including CPUs and GPUs shows that the energy consumption by the REOH approach is close to the optimal energy consumption by the Brute Force approach while saving 17% of sampling runs compared to the previous (homogeneous) approach using probabilistic network. Based on the REOH approach, we develop an open-source energy-optimizing runtime framework for selecting an energy efficient configuration of a heterogeneous system for a given application at runtime.

النظم الموزعة والتوازية والحوسبة العنقودية

Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters

94 - Zhengda Bian , Shenggui Li , Wei Wang 2021

Efficient GPU resource scheduling is essential to maximize resource utilization and save training costs for the increasing amount of deep learning workloads in shared GPU clusters. Existing GPU schedulers largely rely on static policies to leverage t he performance characteristics of deep learning jobs. However, they can hardly reach optimal efficiency due to the lack of elasticity. To address the problem, we propose ONES, an ONline Evolutionary Scheduler for elastic batch size orchestration. ONES automatically manages the elasticity of each job based on the training batch size, so as to maximize GPU utilization and improve scheduling efficiency. It determines the batch size for each job through an online evolutionary search that can continuously optimize the scheduling decisions. We evaluate the effectiveness of ONES with 64 GPUs on TACCs Longhorn supercomputers. The results show that ONES can outperform the prior deep learning schedulers with a significantly shorter average job completion time.

النظم الموزعة والتوازية والحوسبة العنقودية الذكاء الاصطناعي التعلم الآلي