End-to-End Predictions-Based Resource Management Framework for Supercomputer Jobs

120 0 0.0 ( 0 )

Download Cite

Added by Sathish Vadhiyar

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Swetha Hariharan - Prakash Murali - Abhishek Pasari

Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Job submissions of parallel applications to production supercomputer systems will have to be carefully tuned in terms of the job submission parameters to obtain minimum response times. In this work, we have developed an end-to-end resource management framework that uses predictions of queue waiting and execution times to minimize response times of user jobs submitted to supercomputer systems. Our method for predicting queue waiting times adaptively chooses a prediction method based on the cluster structure of similar jobs. Our strategy for execution time predictions dynamically learns the impact of load on execution times and uses this to predict a set of execution time ranges for the target job. We have developed two resource management techniques that employ these predictions, one that selects the number of processors for execution and the other that also dynamically changes the job submission time. Using workload simulations of large supercomputer traces, we show large-scale improvements in predictions and reductions in response times over existing techniques and baseline strategies.

rate research

End-to-End Service Level Agreement Specification for IoT Applications

182 - Awatif Alqahtani , Yinhao Li , Pankesh Patel 2018

The Internet of Things (IoT) promises to help solve a wide range of issues that relate to our wellbeing within application domains that include smart cities, healthcare monitoring, and environmental monitoring. IoT is bringing new wireless sensor use cases by taking advantage of the computing power and flexibility provided by Edge and Cloud Computing. However, the software and hardware resources used within such applications must perform correctly and optimally. Especially in applications where a failure of resources can be critical. Service Level Agreements (SLA) where the performance requirements of such applications are defined, need to be specified in a standard way that reflects the end-to-end nature of IoT application domains, accounting for the Quality of Service (QoS) metrics within every layer including the Edge, Network Gateways, and Cloud. In this paper, we propose a conceptual model that captures the key entities of an SLA and their relationships, as a prior step for end-to-end SLA specification and composition. Service level objective (SLO) terms are also considered to express the QoS constraints. Moreover, we propose a new SLA grammar which considers workflow activities and the multi-layered nature of IoT applications. Accordingly, we develop a tool for SLA specification and composition that can be used as a template to generate SLAs in a machine-readable format. We demonstrate the effectiveness of the proposed specification language through a literature survey that includes an SLA language comparison analysis, and via reflecting the user satisfaction results of a usability study.

Distributed Parallel and Cluster Computing

A distributed service for on demand end to end optical circuits

167 - Ramiro Voicu , Iosif Legrand , Harvey Newman 2011

In this paper we present a system for monitoring and controlling dynamic network circuits inside the USLHCNet network. This distributed service system provides in near real-time complete topological information for all the circuits, resource allocation and usage, accounting, detects automatically failures in the links and network equipment, generate alarms and has the functionality to take automatic actions. The system is developed based on the MonALISA framework, which provides a robust monitoring and controlling service oriented architecture, with no single points of failure.

Distributed Parallel and Cluster Computing

Pretraining by Backtranslation for End-to-end ASR in Low-Resource Settings

87 - Matthew Wiesner , Adithya Renduchintala , Shinji Watanabe 2018

We explore training attention-based encoder-decoder ASR in low-resource settings. These models perform poorly when trained on small amounts of transcribed speech, in part because they depend on having sufficient target-side text to train the attention and decoder networks. In this paper we address this shortcoming by pretraining our network parameters using only text-based data and transcribed speech from other languages. We analyze the relative contributions of both sources of data. Across 3 test languages, our text-based approach resulted in a 20% average relative improvement over a text-based augmentation technique without pretraining. Using transcribed speech from nearby languages gives a further 20-30% relative reduction in character error rate.

Audio and Speech Processing Computation and Language Sound

AutoTune: Improving End-to-end Performance and Resource Efficiency for Microservice Applications

79 - Michael Alan Chang , Aurojit Panda , Hantao Wang 2021

Most large web-scale applications are now built by composing collections (from a few up to 100s or 1000s) of microservices. Operators need to decide how many resources are allocated to each microservice, and these allocations can have a large impact on application performance. Manually determining allocations that are both cost-efficient and meet performance requirements is challenging, even for experienced operators. In this paper we present AutoTune, an end-to-end tool that automatically minimizes resource utilization while maintaining good application performance.

Performance Distributed Parallel and Cluster Computing

DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference

91 - Udit Gupta , Samuel Hsia , Vikram Saraph 2020

Neural personalized recommendation is the corner-stone of a wide collection of cloud services and products, constituting significant compute demand of the cloud infrastructure. Thus, improving the execution efficiency of neural recommendation directly translates into infrastructure capacity saving. In this paper, we devise a novel end-to-end modeling infrastructure, DeepRecInfra, that adopts an algorithm and system co-design methodology to custom-design systems for recommendation use cases. Leveraging the insights from the recommendation characterization, a new dynamic scheduler, DeepRecSched, is proposed to maximize latency-bounded throughput by taking into account characteristics of inference query size and arrival patterns, recommendation model architectures, and underlying hardware systems. By doing so, system throughput is doubled across the eight industry-representative recommendation models. Finally, design, deployment, and evaluation in at-scale production datacenter shows over 30% latency reduction across a wide variety of recommendation models running on hundreds of machines.

Distributed Parallel and Cluster Computing