
Dynamic load balancing with enhanced shared-memory parallelism for particle-in-cell codes

Published by Kyle Miller
Publication date: 2020
Research field: Physics
Research language: English





Furthering our understanding of many of today's interesting problems in plasma physics, including plasma-based acceleration and magnetic reconnection with pair production due to quantum electrodynamic effects, requires large-scale kinetic simulations using particle-in-cell (PIC) codes. However, these simulations are extremely demanding, requiring that contemporary PIC codes be designed to efficiently use a new fleet of exascale computing architectures. To this end, the key issue of parallel load balance across computational nodes must be addressed. We discuss the implementation of dynamic load balancing by dividing the simulation space into many small, self-contained regions or tiles, along with shared-memory (e.g., OpenMP) parallelism both over many tiles and within single tiles. The load balancing algorithm can be used with three different topologies, including two space-filling curves. We tested this implementation in the code OSIRIS and show low overhead and improved scalability with OpenMP thread number on simulations with both uniform load and severe load imbalance. Compared to other load-balancing techniques, our algorithm gives an order-of-magnitude improvement in parallel scalability for simulations with severe load imbalance.
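The tile-based idea can be illustrated with a minimal sketch. It is not the OSIRIS implementation: the abstract does not say which space-filling curves are used, so the Z-order (Morton) curve here is an assumption chosen for brevity, and the names `morton_index` and `partition_tiles` are hypothetical. Tiles are ordered along the curve, and the ordered list is cut into contiguous segments of roughly equal load, one per rank.

```python
def morton_index(ix, iy, bits=16):
    """Interleave the bits of (ix, iy) into a Z-order (Morton) key.

    ix occupies the even bit positions and iy the odd ones, so tiles
    that are close in 2D tend to be close along the curve.
    """
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


def partition_tiles(loads, n_ranks):
    """Assign tiles to ranks as contiguous segments of the Morton curve.

    loads   : dict mapping (ix, iy) tile coordinates to a non-negative
              cost estimate (for example, the particle count in the tile).
    n_ranks : number of ranks (hypothetical stand-in for MPI processes).

    Returns a dict mapping each tile to a rank in [0, n_ranks).
    """
    order = sorted(loads, key=lambda t: morton_index(*t))
    total = sum(loads[t] for t in order)
    if total <= 0:
        # No load information: fall back to balancing by tile count.
        loads = {t: 1.0 for t in order}
        total = float(len(order))

    assignment = {}
    running = 0.0
    for t in order:
        # Place the tile according to the midpoint of the load interval
        # it occupies along the curve.
        mid = running + 0.5 * loads[t]
        assignment[t] = min(int(mid * n_ranks / total), n_ranks - 1)
        running += loads[t]
    return assignment


if __name__ == "__main__":
    # One heavily loaded tile and three light ones, split over two ranks.
    loads = {(0, 0): 13, (1, 0): 1, (0, 1): 1, (1, 1): 1}
    print(partition_tiles(loads, 2))
```

In a dynamic scheme, a partition like this would be recomputed periodically as particles move between tiles. Because each rank owns a contiguous stretch of the curve, its tiles stay spatially clustered, which helps limit communication between ranks.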




Read also

In the wake of the intense effort made for the experimental CILEX project, numerical simulation campaigns have been carried out in order to finalize the design of the facility and to identify optimal laser and plasma parameters. These simulations bring, of course, important insight into the fundamental physics at play. As a by-product, they also characterize the quality of our theoretical and numerical models. In this paper, we compare the results given by different codes and point out algorithmic limitations both in terms of physical accuracy and computational performance. These limitations are illustrated in the context of electron laser wakefield acceleration (LWFA). The main limitation we identify in state-of-the-art Particle-In-Cell (PIC) codes is computational load imbalance. We propose an innovative algorithm to deal with this specific issue as well as milestones towards a modern, accurate high-performance PIC code for high-energy particle acceleration.
A customized finite-difference field solver for the particle-in-cell (PIC) algorithm that provides higher fidelity for wave-particle interactions in intense electromagnetic waves is presented. In many problems of interest, particles with relativistic energies interact with intense electromagnetic fields that have phase velocities near the speed of light. Numerical errors can arise due to (1) dispersion errors in the phase velocity of the wave, (2) the staggering in time between the electric and magnetic fields and between particle velocity and position, and (3) errors in the time derivative in the momentum advance. Errors of the first two kinds are analyzed in detail. It is shown that by using field solvers with different $\mathbf{k}$-space operators in Faraday's and Ampère's laws, the dispersion errors and magnetic field time-staggering errors in the particle pusher can be simultaneously removed for electromagnetic waves moving primarily in a specific direction. The new algorithm was implemented in OSIRIS by using customized higher-order finite-difference operators. Schemes using the proposed solver in combination with different particle pushers are compared through PIC simulation. It is shown that the use of the new algorithm, together with an analytic particle pusher (assuming constant fields over a time step), can lead to accurate modeling of the motion of a single electron in an intense laser field with normalized vector potentials, $eA/mc^2$, exceeding $10^4$ for typical cell sizes and time steps.
Based on the previously developed Energy Conserving Semi Implicit Method (ECsim) code, we present its cylindrical implementation, called ECsim-CYL, to be used for axially symmetric problems. The main motivation for the development of the cylindrical version is to greatly improve the computational speed by utilizing cylindrical symmetry. ECsim-CYL discretizes the field equations in two-dimensional cylindrical coordinates using the finite volume method. For the particle mover, it uses a modification of ECsim's mover for cylindrical coordinates, keeping track of all three components of the velocity vectors while keeping only the radial and axial coordinates of particle positions. In this paper, we describe the details of the algorithm used in ECsim-CYL and present a series of tests to validate the accuracy of the code, including a wave spectrum in a homogeneous plasma inside a cylindrical waveguide and free expansion of a spherical plasma ball in vacuum. ECsim-CYL retains the stability properties of ECsim and conserves energy to within machine precision, while accurately describing the plasma behavior in the test cases.
Equation systems resulting from a p-version FEM discretisation typically require special treatment, as iterative solvers are not very efficient here. Applying hierarchical concepts based on a nested dissection approach allows for both the design of sophisticated solvers and advanced parallelisation strategies. To fully exploit the underlying computing power of parallel systems, dynamic load balancing strategies become an essential component.
High-level applications, such as machine learning, are evolving from simple models based on multilayer perceptrons for simple image recognition to much deeper and more complex neural networks for self-driving vehicle control systems. The rapid increase in the consumption of memory and computational resources by these models demands the use of multi-core parallel systems to scale the execution of the complex emerging applications that depend on them. However, parallel programs running on high-performance computers often suffer from data communication bottlenecks, limited memory bandwidth, and synchronization overhead due to irregular critical sections. In this paper, we propose a framework to reduce data communication and improve the scalability and performance of these applications in multi-core systems. We design a vertex cut framework for partitioning LLVM IR graphs into clusters while taking into consideration the data communication and workload balance among clusters. First, we construct LLVM graphs by compiling high-level programs into LLVM IR, instrumenting code to obtain the execution order of basic blocks and the execution time for each memory operation, and analyzing data dependencies in dynamic LLVM traces. Next, we formulate the problem as Weight Balanced $p$-way Vertex Cut and propose a generic and flexible framework, within which four different greedy algorithms are proposed for solving this problem. Lastly, we propose a memory-centric run-time mapping with linear time complexity to map clusters generated by the vertex cut algorithms onto a multi-core platform. We conclude that our best algorithm, WB-Libra, provides performance improvements of 1.56x and 1.86x over existing state-of-the-art approaches for 8 and 1024 clusters running on a multi-core platform, respectively.