Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

In-Situ Assessment of Device-Side Compute Work for Dynamic Load Balancing in a GPU-Accelerated PIC Code

222 0 0.0 ( 0 )

Download Cite

Added by Michael Rowan

Publication date 2021

fields Informatics Engineering Physics

and research's language is English

Authors Michael E. Rowan - Axel Huebl - Kevin N. Gott

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Maintaining computational load balance is important to the performant behavior of codes which operate under a distributed computing model. This is especially true for GPU architectures, which can suffer from memory oversubscription if improperly load balanced. We present enhancements to traditional load balancing approaches and explicitly target GPU architectures, exploring the resulting performance. A key component of our enhancements is the introduction of several GPU-amenable strategies for assessing compute work. These strategies are implemented and benchmarked to find the most optimal data collection methodology for in-situ assessment of GPU compute work. For the fully kinetic particle-in-cell code WarpX, which supports MPI+CUDA parallelism, we investigate the performance of the improved dynamic load balancing via a strong scaling-based performance model and show that, for a laser-ion acceleration test problem run with up to 6144 GPUs on Summit, the enhanced dynamic load balancing achieves from 62%--74% (88% when running on 6 GPUs) of the theoretically predicted maximum speedup; for the 96-GPU case, we find that dynamic load balancing improves performance relative to baselines without load balancing (3.8x speedup) and with static load balancing (1.2x speedup). Our results provide important insights into dynamic load balancing and performance assessment, and are particularly relevant in the context of distributed memory applications ran on GPUs.

rate research

On the Complexity of Load Balancing in Dynamic Networks

83 - Seth Gilbert , Uri Meir , Ami Paz 2021

In the load balancing problem, each node in a network is assigned a load, and the goal is to equally distribute the loads among the nodes, by preforming local load exchanges. While load balancing was extensively studied in static networks, only recently a load balancing algorithm for dynamic networks with a bounded convergence time was presented. In this paper, we further study the time complexity of load balancing in the context of dynamic networks. First, we show that randomness is not necessary, and present a deterministic algorithm which slightly improves the running time of the previous algorithm, at the price of not being matching-based. Then, we consider integral loads, i.e., loads that cannot be split indefinitely, and prove that no matching-based algorithm can have a bounded convergence time for this case. To circumvent both this impossibility result, and a known one for the non-integral case, we apply the method of smoothed analysis, where random perturbations are made over the worst-case choices of network topologies. We show both impossibility results do not hold under this kind of analysis, suggesting that load-balancing in real world systems might be faster than the lower bounds suggest.

Distributed Parallel and Cluster Computing Networking and Internet Architecture

Load balancing policies with server-side cancellation of replicas

146 - Rooji Jinan , Ajay Badita , Tejas Bodas 2020

Popular dispatching policies such as the join shortest queue (JSQ), join smallest work (JSW) and their power of two variants are used in load balancing systems where the instantaneous queue length or workload information at all queues or a subset of them can be queried. In situations where the dispatcher has an associated memory, one can minimize this query overhead by maintaining a list of idle servers to which jobs can be dispatched. Recent alternative approaches that do not require querying such information include the cancel on start and cancel on complete based replication policies. The downside of such policies however is that the servers must communicate the start or completion of each service to the dispatcher and must allow cancellation of redundant copies. In this work, we consider a load balancing environment where the dispatcher cannot query load information, does not have a memory, and cannot cancel any replica that it may have created. In such a rigid environment, we allow the dispatcher to possibly append a server side cancellation criteria to each job or its replica. A job or a replica is served only if it satisfies the predefined criteria at the time of service. We focus on a criteria that is based on the waiting time experienced by a job or its replica and analyze several variants of this policy based on the assumption of asymptotic independence of queues. The proposed policies are novel and perform remarkably well in spite of the rigid operating constraints.

Distributed Parallel and Cluster Computing

Dynamic load balancing strategies for hierarchical p-FEM solvers

109 - Ralf-Peter Mundani 2018

Equation systems resulting from a p-version FEM discretisation typically require a special treatment as iterative solvers are not very efficient here. Applying hierarchical concepts based on a nested dissection approach allow for both the design of sophisticated solvers as well as for advanced parallelisation strategies. To fully exploit the underlying computing power of parallel systems, dynamic load balancing strategies become an essential component.

Distributed Parallel and Cluster Computing

Load Balancing in a Networked Environment through Homogenization

386 - M. Shahriar Hossain , M. Muztaba Fuad , Debzani Deb 2011

Distributed processing across a networked environment suffers from unpredictable behavior of speedup due to heterogeneous nature of the hardware and software in the remote machines. It is challenging to get a better performance from a distributed system by distributing task in an intelligent manner such that the heterogeneous nature of the system do not have any effect on the speedup ratio. This paper introduces homogenization, a technique that distributes and balances the workload in such a manner that the user gets the highest speedup possible from a distributed environment. Along with providing better performance, homogenization is totally transparent to the user and requires no interaction with the system.

Distributed Parallel and Cluster Computing

A Survey on Load Balancing Algorithms for VM Placement in Cloud Computing

145 - Minxian Xu , Wenhong Tian , Rajkumar Buyya 2016

The emergence of cloud computing based on virtualization technologies brings huge opportunities to host virtual resource at low cost without the need of owning any infrastructure. Virtualization technologies enable users to acquire, configure and be charged on pay-per-use basis. However, Cloud data centers mostly comprise heterogeneous commodity servers hosting multiple virtual machines (VMs) with potential various specifications and fluctuating resource usages, which may cause imbalanced resource utilization within servers that may lead to performance degradation and service level agreements (SLAs) violations. To achieve efficient scheduling, these challenges should be addressed and solved by using load balancing strategies, which have been proved to be NP-hard problem. From multiple perspectives, this work identifies the challenges and analyzes existing algorithms for allocating VMs to PMs in infrastructure Clouds, especially focuses on load balancing. A detailed classification targeting load balancing algorithms for VM placement in cloud data centers is investigated and the surveyed algorithms are classified according to the classification. The goal of this paper is to provide a comprehensive and comparative understanding of existing literature and aid researchers by providing an insight for potential future enhancements.

Distributed Parallel and Cluster Computing

comments

Fetching comments

Syrian Virtual University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

In-Situ Assessment of Device-Side Compute Work for Dynamic Load Balancing in a GPU-Accelerated PIC Code

Ask ChatGPT about the research

No Arabic abstract

Read More