Dynamic load balancing strategies for hierarchical p-FEM solvers

Added by Ralf-Peter Mundani

Publication date 2018

Fields Informatics Engineering

Language English

Authors Ralf-Peter Mundani

Distributed Parallel and Cluster Computing

Equation systems resulting from a p-version FEM discretisation typically require special treatment, as iterative solvers are not very efficient here. Applying hierarchical concepts based on a nested dissection approach allows for both the design of sophisticated solvers and advanced parallelisation strategies. To fully exploit the underlying computing power of parallel systems, dynamic load balancing strategies become an essential component.

On the Complexity of Load Balancing in Dynamic Networks

Seth Gilbert, Uri Meir, Ami Paz 2021

In the load balancing problem, each node in a network is assigned a load, and the goal is to distribute the loads equally among the nodes by performing local load exchanges. While load balancing has been extensively studied in static networks, only recently was a load balancing algorithm for dynamic networks with a bounded convergence time presented. In this paper, we further study the time complexity of load balancing in the context of dynamic networks. First, we show that randomness is not necessary, and present a deterministic algorithm that slightly improves the running time of the previous algorithm, at the price of not being matching-based. Then, we consider integral loads, i.e., loads that cannot be split indefinitely, and prove that no matching-based algorithm can have a bounded convergence time for this case. To circumvent both this impossibility result and a known one for the non-integral case, we apply the method of smoothed analysis, where random perturbations are made over the worst-case choices of network topologies. We show that both impossibility results do not hold under this kind of analysis, suggesting that load balancing in real-world systems might be faster than the lower bounds suggest.
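The idea of matching-based local load exchanges on a changing topology can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: the greedy random matching, the exact averaging of divisible loads, and the names `matching_round` and `balance` are all assumptions made for illustration.

```python
import random

def matching_round(loads, edges, rng):
    """One round: pick a random maximal matching greedily (an assumed,
    illustrative choice), then let each matched pair average its loads."""
    shuffled = list(edges)
    rng.shuffle(shuffled)
    matched = set()
    new_loads = list(loads)
    for u, v in shuffled:
        if u in matched or v in matched:
            continue  # each node takes part in at most one exchange per round
        matched.update((u, v))
        avg = (loads[u] + loads[v]) / 2.0
        new_loads[u] = new_loads[v] = avg
    return new_loads

def balance(loads, edge_sets, seed=0):
    """Run one round per entry of edge_sets; the edge set may differ from
    round to round, which models a dynamic network."""
    rng = random.Random(seed)
    for edges in edge_sets:
        loads = matching_round(loads, edges, rng)
    return loads

# Hypothetical example: a 4-node ring whose topology happens to stay fixed.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
final_loads = balance([8.0, 0.0, 0.0, 0.0], [ring] * 50)
```

Averaging over a matching preserves the total load and never widens the gap between the largest and smallest load, which is why such exchanges are a natural building block. The sketch assumes divisible loads; with integral loads, exact averaging is not always possible.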

Distributed Parallel and Cluster Computing Networking and Internet Architecture

Efficient Load-Balancing through Distributed Token Dropping

Sebastian Brandt, Barbara Keller, Joel Rybicki 2020

We introduce a new graph problem, the token dropping game, and we show how to solve it efficiently in a distributed setting. We use the token dropping game as a tool to design an efficient distributed algorithm for stable orientations and more generally for locally optimal semi-matchings. The prior work by Czygrinow et al. (DISC 2012) finds a stable orientation in $O(\Delta^5)$ rounds in graphs of maximum degree $\Delta$, while we improve it to $O(\Delta^4)$ and also prove a lower bound of $\Omega(\Delta)$.

Distributed Parallel and Cluster Computing

In-Situ Assessment of Device-Side Compute Work for Dynamic Load Balancing in a GPU-Accelerated PIC Code

Michael E. Rowan, Axel Huebl, Kevin N. Gott 2021

Maintaining computational load balance is important to the performance of codes that operate under a distributed computing model. This is especially true for GPU architectures, which can suffer from memory oversubscription if improperly load balanced. We present enhancements to traditional load balancing approaches and explicitly target GPU architectures, exploring the resulting performance. A key component of our enhancements is the introduction of several GPU-amenable strategies for assessing compute work. These strategies are implemented and benchmarked to find the best data collection methodology for in-situ assessment of GPU compute work. For the fully kinetic particle-in-cell code WarpX, which supports MPI+CUDA parallelism, we investigate the performance of the improved dynamic load balancing via a strong scaling-based performance model and show that, for a laser-ion acceleration test problem run with up to 6144 GPUs on Summit, the enhanced dynamic load balancing achieves 62% to 74% (88% when running on 6 GPUs) of the theoretically predicted maximum speedup; for the 96-GPU case, we find that dynamic load balancing improves performance relative to baselines without load balancing (3.8x speedup) and with static load balancing (1.2x speedup). Our results provide important insights into dynamic load balancing and performance assessment, and are particularly relevant in the context of distributed memory applications run on GPUs.

Distributed Parallel and Cluster Computing Accelerator Physics Computational Physics

Load Balancing in a Networked Environment through Homogenization

M. Shahriar Hossain, M. Muztaba Fuad, Debzani Deb 2011

Distributed processing across a networked environment suffers from unpredictable speedup due to the heterogeneous nature of the hardware and software in the remote machines. It is challenging to get better performance from a distributed system by distributing tasks in an intelligent manner such that the heterogeneous nature of the system does not have any effect on the speedup ratio. This paper introduces homogenization, a technique that distributes and balances the workload in such a manner that the user gets the highest speedup possible from a distributed environment. Along with providing better performance, homogenization is totally transparent to the user and requires no interaction with the system.

Distributed Parallel and Cluster Computing

Distributed Selfish Load Balancing with Weights and Speeds

C. P. J. Adolphs, P. Berenbrink 2011

In this paper we consider neighborhood load balancing in the context of selfish clients. We assume that a network of n processors and m tasks is given. The processors may have different speeds and the tasks may have different weights. Every task is controlled by a selfish user. The objective of the user is to allocate his/her task to a processor with minimum load. We revisit the concurrent probabilistic protocol introduced in [6], which works in sequential rounds. In each round, every task is allowed to query the load of one randomly chosen neighboring processor. If that load is smaller, the task will migrate to that processor with a suitably chosen probability. Using techniques from spectral graph theory, we obtain upper bounds on the expected convergence time towards approximate and exact Nash equilibria that are significantly better than the previous results in [6]. We show results for uniform tasks on non-uniform processors and for the general case where the tasks have different weights and the machines have speeds. To the best of our knowledge, these are the first results for this general setting.
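One round of such a protocol can be sketched in Python for the simplest setting of uniform tasks on unit-speed processors. The migration probability used here, (load_p - load_q) / (2 * load_p), is an illustrative assumption and not the "suitably chosen probability" of [6]; the function name `selfish_round` and the data layout are hypothetical as well.

```python
import random

def selfish_round(assignment, neighbors, rng):
    """assignment[t] is the processor currently hosting task t;
    neighbors[p] lists the processors adjacent to p.
    All tasks decide concurrently, based on the loads at the start of the round."""
    loads = [0] * len(neighbors)
    for p in assignment:
        loads[p] += 1  # uniform tasks: load = number of tasks
    new_assignment = list(assignment)
    for t, p in enumerate(assignment):
        q = rng.choice(neighbors[p])  # query one random neighboring processor
        if loads[q] < loads[p]:
            # Assumed migration probability, chosen only for illustration.
            prob = (loads[p] - loads[q]) / (2.0 * loads[p])
            if rng.random() < prob:
                new_assignment[t] = q
    return new_assignment

# Hypothetical example: 4 processors in a ring, 8 tasks all starting on processor 0.
ring_neighbors = [[1, 3], [0, 2], [1, 3], [2, 0]]
rng = random.Random(0)
assignment = [0] * 8
for _ in range(100):
    assignment = selfish_round(assignment, ring_neighbors, rng)
```

Damping the migration probability is a common way to keep many tasks from moving to the same lightly loaded neighbor in the same round; how strongly to damp is exactly the kind of choice the analysis in the paper addresses.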

Distributed Parallel and Cluster Computing Data Structures and Algorithms