Dynamic load balancing with enhanced shared-memory parallelism for particle-in-cell codes

Added by Kyle Miller

Publication date: 2020

Field: Physics

Language: English

Authors: Kyle G. Miller, Roman P. Lee, Adam Tableman

Computational Physics Plasma Physics

Abstract

Furthering our understanding of many of today's interesting problems in plasma physics (including plasma-based acceleration and magnetic reconnection with pair production due to quantum electrodynamic effects) requires large-scale kinetic simulations using particle-in-cell (PIC) codes. However, these simulations are extremely demanding, so contemporary PIC codes must be designed to use a new fleet of exascale computing architectures efficiently. To this end, the key issue of parallel load balance across computational nodes must be addressed. We discuss the implementation of dynamic load balancing by dividing the simulation space into many small, self-contained regions or tiles, along with shared-memory (e.g., OpenMP) parallelism both over many tiles and within single tiles. The load-balancing algorithm can be used with three different topologies, including two space-filling curves. We tested this implementation in the code OSIRIS and show low overhead and improved scalability with the number of OpenMP threads on simulations with both uniform load and severe load imbalance. Compared to other load-balancing techniques, our algorithm gives an order-of-magnitude improvement in parallel scalability for simulations with severe load imbalance.
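
As a rough illustration of tile-based load balancing along a space-filling curve, the sketch below orders a 2D grid of tiles along a Morton (Z-order) curve and cuts that 1D ordering into contiguous segments of roughly equal load, one per rank. This is not the OSIRIS implementation. The abstract does not say which two space-filling curves are used, so the choice of a Morton curve, the per-tile load values (hypothetically, particle counts), and the names `morton_key`, `partition_tiles`, and `imbalance` are all assumptions made for illustration.

```python
from typing import List, Tuple

Tile = Tuple[int, int]


def morton_key(ix: int, iy: int, bits: int = 16) -> int:
    """Interleave the bits of (ix, iy): the tile's index along a Z-order (Morton) curve."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)      # x bits go to even positions
        key |= ((iy >> b) & 1) << (2 * b + 1)  # y bits go to odd positions
    return key


def partition_tiles(loads: List[List[float]], n_ranks: int) -> List[List[Tile]]:
    """Split the Morton-ordered tiles into n_ranks contiguous segments of roughly equal load.

    loads[ix][iy] is a per-tile cost estimate (hypothetically, the particle count).
    """
    nx, ny = len(loads), len(loads[0])
    order = sorted(((ix, iy) for ix in range(nx) for iy in range(ny)),
                   key=lambda t: morton_key(t[0], t[1]))
    total = sum(loads[ix][iy] for ix, iy in order)
    if total <= 0:
        raise ValueError("total load must be positive")
    parts: List[List[Tile]] = [[] for _ in range(n_ranks)]
    acc = 0.0
    for ix, iy in order:
        w = loads[ix][iy]
        # Give the tile to the rank whose share of the cumulative load contains the
        # tile's midpoint; acc only grows, so each rank gets a contiguous curve segment.
        r = min(int((acc + 0.5 * w) / total * n_ranks), n_ranks - 1)
        parts[r].append((ix, iy))
        acc += w
    return parts


def imbalance(loads: List[List[float]], parts: List[List[Tile]]) -> float:
    """Max-over-mean load per rank; 1.0 means perfectly balanced."""
    per_rank = [sum(loads[ix][iy] for ix, iy in p) for p in parts]
    return max(per_rank) / (sum(per_rank) / len(per_rank))
```

For a uniform 4×4 grid split over 4 ranks, each rank receives one 2×2 block of tiles, because consecutive Morton indices stay spatially close. If one tile is much heavier, the cut points shift so that the rank holding it owns fewer tiles. In a dynamic scheme the loads would be re-measured periodically and the partition recomputed, with OpenMP threads then working over the tiles a rank owns; those details are assumptions here, not taken from the paper.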

Load management strategy for Particle-In-Cell simulations in high energy particle acceleration

Arnaud Beck, Jacob Trier Frederiksen, Julien Derouillat (2015)

In the wake of the intense effort made for the experimental CILEX project, numerical simulation campaigns have been carried out in order to finalize the design of the facility and to identify optimal laser and plasma parameters. These simulations bring, of course, important insight into the fundamental physics at play. As a by-product, they also characterize the quality of our theoretical and numerical models. In this paper, we compare the results given by different codes and point out algorithmic limitations both in terms of physical accuracy and computational performance. These limitations are illustrated in the context of electron laser wakefield acceleration (LWFA). The main limitation we identify in state-of-the-art Particle-In-Cell (PIC) codes is computational load imbalance. We propose an innovative algorithm to deal with this specific issue as well as milestones towards a modern, accurate, high-performance PIC code for high-energy particle acceleration.

Computational Physics

A new field solver for modeling of relativistic particle-laser interactions using the particle-in-cell algorithm

Fei Li, Kyle G. Miller, Xinlu Xu (2020)

A customized finite-difference field solver for the particle-in-cell (PIC) algorithm that provides higher fidelity for wave-particle interactions in intense electromagnetic waves is presented. In many problems of interest, particles with relativistic energies interact with intense electromagnetic fields that have phase velocities near the speed of light. Numerical errors can arise due to (1) dispersion errors in the phase velocity of the wave, (2) the staggering in time between the electric and magnetic fields and between particle velocity and position, and (3) errors in the time derivative in the momentum advance. Errors of the first two kinds are analyzed in detail. It is shown that by using field solvers with different $\mathbf{k}$-space operators in Faraday's and Ampère's laws, the dispersion errors and magnetic-field time-staggering errors in the particle pusher can be simultaneously removed for electromagnetic waves moving primarily in a specific direction. The new algorithm was implemented in OSIRIS using customized higher-order finite-difference operators. Schemes using the proposed solver in combination with different particle pushers are compared through PIC simulation. It is shown that the new algorithm, together with an analytic particle pusher (assuming constant fields over a time step), can accurately model the motion of a single electron in an intense laser field with normalized vector potentials, $eA/mc^2$, exceeding $10^4$ for typical cell sizes and time steps.
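
For a sense of what error (1) means, the sketch below evaluates the numerical phase velocity of the standard second-order Yee solver in one dimension, whose dispersion relation is $\sin(\omega\Delta t/2)/(c\Delta t) = \sin(k\Delta x/2)/\Delta x$. This is the conventional baseline, not the customized solver described in the abstract, and the function name `yee_phase_velocity_1d` is hypothetical.

```python
import math


def yee_phase_velocity_1d(k_dx: float, courant: float) -> float:
    """Numerical phase velocity v_ph/c of the standard 1D Yee scheme.

    Dispersion relation: sin(w*dt/2)/(c*dt) = sin(k*dx/2)/dx
    k_dx    = k*dx, the wavenumber in radians per cell (assumed 0 < k_dx <= pi)
    courant = c*dt/dx, which must lie in (0, 1] for stability
    """
    if not 0.0 < courant <= 1.0:
        raise ValueError("courant number must be in (0, 1]")
    # Solve the dispersion relation for w*dt.
    w_dt = 2.0 * math.asin(courant * math.sin(0.5 * k_dx))
    # v_ph/c = (w/k)/c = (w*dt) / (k*dx * c*dt/dx)
    return w_dt / (k_dx * courant)


# 20 cells per wavelength at half the Courant limit
print(yee_phase_velocity_1d(2 * math.pi / 20, 0.5))
```

With 20 cells per wavelength and a Courant number of 0.5, the returned value is slightly below 1, so the numerical wave travels slower than $c$. Coarser resolution makes the lag larger, while in 1D a Courant number of exactly 1 removes the error.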

Computational Physics Plasma Physics

ECsim-CYL: Energy Conserving Semi-Implicit particle in cell simulation in axially symmetric cylindrical coordinates

Diego Gonzalez-Herrero, Alfredo Micera, Elisabetta Boella (2018)

Based on the previously developed Energy Conserving Semi-Implicit Method (ECsim) code, we present its cylindrical implementation, called ECsim-CYL, to be used for axially symmetric problems. The main motivation for the development of the cylindrical version is to greatly improve the computational speed by utilizing cylindrical symmetry. ECsim-CYL discretizes the field equations in two-dimensional cylindrical coordinates using the finite-volume method. For the particle mover, it uses a modification of ECsim's mover for cylindrical coordinates that keeps track of all three components of the velocity vectors while keeping only the radial and axial coordinates of particle positions. In this paper, we describe the details of the algorithm used in ECsim-CYL and present a series of tests to validate the accuracy of the code, including a wave spectrum in a homogeneous plasma inside a cylindrical waveguide and the free expansion of a spherical plasma ball in vacuum. ECsim-CYL retains the stability properties of ECsim and conserves energy to within machine precision, while accurately describing the plasma behavior in the test cases.

Computational Physics Plasma Physics

Dynamic load balancing strategies for hierarchical p-FEM solvers

Ralf-Peter Mundani (2018)

Equation systems resulting from a p-version FEM discretisation typically require special treatment, as iterative solvers are not very efficient here. Applying hierarchical concepts based on a nested dissection approach allows for both the design of sophisticated solvers and advanced parallelisation strategies. To fully exploit the underlying computing power of parallel systems, dynamic load balancing strategies become an essential component.

Distributed Parallel and Cluster Computing

A Vertex Cut based Framework for Load Balancing and Parallelism Optimization in Multi-core Systems

Guixiang Ma, Yao Xiao, Theodore L. Willke (2020)

High-level applications, such as machine learning, are evolving from simple models based on multilayer perceptrons for simple image recognition to much deeper and more complex neural networks for self-driving vehicle control systems. The rapid increase in the consumption of memory and computational resources by these models demands the use of multi-core parallel systems to scale the execution of the complex emerging applications that depend on them. However, parallel programs running on high-performance computers often suffer from data communication bottlenecks, limited memory bandwidth, and synchronization overhead due to irregular critical sections. In this paper, we propose a framework to reduce data communication and improve the scalability and performance of these applications in multi-core systems. We design a vertex cut framework for partitioning LLVM IR graphs into clusters while taking into consideration the data communication and workload balance among clusters. First, we construct LLVM graphs by compiling high-level programs into LLVM IR, instrumenting code to obtain the execution order of basic blocks and the execution time for each memory operation, and analyzing data dependencies in dynamic LLVM traces. Next, we formulate the problem as Weight Balanced $p$-way Vertex Cut and propose a generic and flexible framework, wherein four different greedy algorithms are proposed for solving this problem. Lastly, we propose a memory-centric run-time mapping with linear time complexity to map the clusters generated by the vertex cut algorithms onto a multi-core platform. We conclude that our best algorithm, WB-Libra, provides performance improvements of 1.56x and 1.86x over existing state-of-the-art approaches for 8 and 1024 clusters, respectively, running on a multi-core platform.

Distributed Parallel and Cluster Computing Machine Learning
