
AIPerf: Automated machine learning as an AI-HPC benchmark

Posted by Zhixiang Ren
Publication date: 2020
Research field: Informatics engineering
Paper language: English





The plethora of complex artificial intelligence (AI) algorithms and the available high performance computing (HPC) power stimulate the rapid development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems is emerging rapidly. The de facto HPC benchmark, LINPACK, cannot reflect AI computing power and I/O performance without a representative workload, while current popular AI benchmarks such as MLPerf have a fixed problem size and therefore limited scalability. To address these issues, we propose an end-to-end benchmark suite built on automated machine learning (AutoML), which not only represents real AI scenarios but is also auto-adaptively scalable to machines of various sizes. We implement the algorithms in a highly parallel and flexible way to ensure efficiency and optimization potential on diverse systems with customizable configurations. We use operations per second (OPS), measured in an analytical and systematic way, as the main metric to quantify AI performance. We evaluate the benchmark on various systems to verify its stability and scalability, from 4 nodes with 32 NVIDIA Tesla T4 GPUs (56.1 Tera-OPS measured) up to 512 nodes with 4096 Huawei Ascend 910 processors (194.53 Peta-OPS measured), and the results show near-linear weak scalability. With a flexible workload and a single metric, our benchmark can easily scale to and rank AI-HPC systems.
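The abstract reports OPS as an analytically measured, cluster-wide throughput figure. Below is a minimal sketch of how such a score could be aggregated across nodes; the class and function names and the toy numbers are hypothetical and not taken from AIPerf's implementation.

```python
# Hypothetical sketch: operations counted analytically per node, divided by
# wall-clock time and summed over the cluster to obtain a single OPS figure.
from dataclasses import dataclass


@dataclass
class NodeResult:
    analytic_ops: float   # operations counted analytically from the model graphs
    wall_seconds: float   # elapsed training/evaluation time on this node


def cluster_ops(results: list[NodeResult]) -> float:
    """Aggregate per-node throughput into one cluster-wide OPS value."""
    return sum(r.analytic_ops / r.wall_seconds for r in results)


if __name__ == "__main__":
    # e.g. 4 nodes, each counting ~1e18 operations over ~1 hour
    nodes = [NodeResult(analytic_ops=1.0e18, wall_seconds=3600.0) for _ in range(4)]
    print(f"{cluster_ops(nodes):.3e} OPS")  # ~1.1e15 OPS for this toy input
```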




Read also

With the growing complexity of computational and experimental facilities, many scientific researchers are turning to machine learning (ML) techniques to analyze large-scale ensemble data. With complexities such as multi-component workflows, heterogeneous machine architectures, parallel file systems, and batch scheduling, care must be taken to facilitate this analysis in a high performance computing (HPC) environment. In this paper, we present Merlin, a workflow framework that enables large, ML-friendly ensembles of scientific HPC simulations. By augmenting traditional HPC with distributed compute technologies, Merlin aims to lower the barrier for scientific subject matter experts to incorporate ML into their analysis. In addition to its design, we describe several example applications that Merlin has enabled on leadership-class HPC resources, such as the ML-augmented optimization of nuclear fusion experiments and the calibration of infectious disease models to study the progression of, and possible mitigation strategies for, COVID-19.
Di Zhang, Dong Dai, Youbiao He (2019)
Today, high-performance computing (HPC) platforms are still dominated by batch jobs, so effective batch job scheduling is crucial for high system efficiency. Existing HPC batch job schedulers typically rely on heuristic priority functions to prioritize and schedule jobs, but once configured and deployed by experts, such priority functions can hardly adapt to changes in job loads, optimization goals, or system settings, potentially degrading system efficiency when changes occur. To address this fundamental issue, we present RLScheduler, an automated HPC batch job scheduler built on reinforcement learning. RLScheduler requires minimal manual intervention or expert knowledge and can learn high-quality scheduling policies through its own continuous trial and error. We introduce a new kernel-based neural network structure and a trajectory filtering mechanism in RLScheduler to improve and stabilize the learning process. Through extensive evaluations, we confirm that RLScheduler can learn high-quality scheduling policies for various workloads and optimization goals at relatively low computational cost. Moreover, we show that the learned models perform stably even when applied to unseen workloads, making them practical for production use.
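To make the core idea concrete, here is a minimal sketch of replacing a hand-written priority function with a learned scoring model over waiting jobs. The feature set, the linear scorer, and the weights are illustrative placeholders, not RLScheduler's actual kernel-based network or training loop.

```python
# Illustrative sketch: score waiting jobs with a learned model instead of a
# fixed heuristic priority function; the scheduler dispatches the argmax.
import numpy as np


def job_features(job):
    """Encode a waiting job as a feature vector (wait time, size, walltime)."""
    return np.array([job["wait_s"], job["nodes"], job["walltime_s"]], dtype=float)


def score_jobs(jobs, weights):
    """Score every waiting job; higher score means dispatch sooner."""
    feats = np.stack([job_features(j) for j in jobs])   # (n_jobs, n_features)
    return feats @ weights                               # (n_jobs,)


if __name__ == "__main__":
    queue = [
        {"id": "a", "wait_s": 1200, "nodes": 16, "walltime_s": 7200},
        {"id": "b", "wait_s": 300,  "nodes": 2,  "walltime_s": 600},
    ]
    # In an RL setting these weights would be learned from scheduling rewards
    # (e.g. lower average wait time) via trial and error.
    w = np.array([1e-3, -5e-2, -1e-4])
    scores = score_jobs(queue, w)
    print(queue[int(np.argmax(scores))]["id"])  # id of the job chosen to run next
```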
COVID-19 has claimed more than 1 million lives and resulted in over 40 million infections, so there is an urgent need to identify drugs that can inhibit SARS-CoV-2. In response, the DOE recently established the Medical Therapeutics project as part of the National Virtual Biotechnology Laboratory and tasked it with creating the computational infrastructure and methods necessary to advance therapeutics development. We discuss innovations in computational infrastructure and methods that are accelerating and advancing drug design. Specifically, we describe several methods that integrate artificial intelligence with simulation-based approaches, and the design of computational infrastructure to support these methods at scale. We discuss their implementation, characterize their performance, and highlight the science advances that these capabilities have enabled.
The exponential growth in the use of large deep neural networks has accelerated the need to train these networks in hours or even minutes. This can only be achieved through scalable and efficient distributed training, since a single node or card cannot satisfy the compute, memory, and I/O requirements of today's state-of-the-art deep neural networks. However, scaling synchronous Stochastic Gradient Descent (SGD) remains a challenging problem and requires continued research and development, with innovations spanning algorithms, frameworks, communication libraries, and system design. In this paper, we describe the philosophy, design, and implementation of the Intel Machine Learning Scalability Library (MLSL) and present proof points demonstrating the scaling of DL training on hundreds to thousands of nodes across cloud and HPC systems.
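For reference, the synchronous data-parallel SGD pattern that scaling libraries such as MLSL accelerate looks roughly as sketched below, here using mpi4py for illustration: each rank computes a local gradient, an allreduce averages the gradients, and every rank applies the same update. This shows the communication pattern only and does not use MLSL's API.

```python
# Minimal synchronous data-parallel SGD step over MPI (run with mpirun -n N).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD


def sync_sgd_step(params, local_grad, lr=0.01):
    """Average gradients across all ranks, then take one SGD step."""
    avg_grad = np.empty_like(local_grad)
    comm.Allreduce(local_grad, avg_grad, op=MPI.SUM)   # sum over ranks
    avg_grad /= comm.Get_size()                        # mean gradient
    return params - lr * avg_grad


if __name__ == "__main__":
    rng = np.random.default_rng(comm.Get_rank())
    params = np.zeros(4)
    grad = rng.normal(size=4)        # stand-in for a real backward pass
    params = sync_sgd_step(params, grad)
    if comm.Get_rank() == 0:
        print(params)                # identical on every rank after the step
```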
The growth of data, the need for scalability, and the complexity of models used in modern machine learning call for distributed implementations. Yet, as of today, distributed machine learning frameworks have largely ignored the possibility of arbitrary (i.e., Byzantine) failures. In this paper, we study robustness to Byzantine failures at the fundamental level of stochastic gradient descent (SGD), the heart of most machine learning algorithms. Assuming a set of $n$ workers, up to $f$ of them Byzantine, we ask how robust SGD can be without limiting the dimension or the size of the parameter space. We first show that no gradient descent update rule based on a linear combination of the vectors proposed by the workers (i.e., current approaches) tolerates a single Byzantine failure. We then formulate a resilience property of the update rule that captures the basic requirements to guarantee convergence despite $f$ Byzantine workers. We finally propose Krum, an update rule that satisfies this resilience property. For a $d$-dimensional learning problem, the time complexity of Krum is $O(n^2 \cdot (d + \log n))$.
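As a concrete illustration of the selection rule described in this abstract, here is a simplified Krum sketch: each proposed gradient is scored by the summed squared distances to its n - f - 2 closest peers, and the lowest-scoring gradient is used for the update. This version favors readability over the stated $O(n^2 \cdot (d + \log n))$ complexity and is a sketch, not the authors' reference implementation.

```python
# Simplified Krum aggregation: pick the gradient whose n - f - 2 nearest
# neighbors are closest to it, so outlier (Byzantine) proposals are ignored.
import numpy as np


def krum(gradients, f):
    """Return the gradient chosen by Krum among n workers, f of them Byzantine."""
    n = len(gradients)
    assert n > 2 * f + 2, "Krum requires n > 2f + 2"
    grads = np.stack(gradients)                                 # (n, d)
    # Pairwise squared Euclidean distances between proposed gradients.
    dists = ((grads[:, None, :] - grads[None, :, :]) ** 2).sum(axis=-1)
    scores = []
    for i in range(n):
        others = np.delete(dists[i], i)          # distances to the other workers
        closest = np.sort(others)[: n - f - 2]   # n - f - 2 nearest neighbors
        scores.append(closest.sum())
    return grads[int(np.argmin(scores))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    honest = [rng.normal(0.0, 0.1, size=10) for _ in range(8)]
    byzantine = [rng.normal(100.0, 1.0, size=10) for _ in range(2)]
    g = krum(honest + byzantine, f=2)
    print(np.linalg.norm(g))   # small norm: a Byzantine gradient was not selected
```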
