We present an efficient open-source implementation of the multiparticle collision dynamics (MPCD) algorithm that scales to run on hundreds of graphics processing units (GPUs). We especially focus on optimizations for modern GPU architectures and communication patterns between multiple GPUs. We show that a mixed-precision computing model can improve performance compared to a fully double-precision model while still providing good numerical accuracy. We report weak and strong scaling benchmarks of a reference MPCD solvent and a benchmark of a polymer solution with research-relevant interactions and system size. Our MPCD software enables simulations of mesoscale hydrodynamics at length and time scales that would be otherwise challenging or impossible to access.
This paper describes a massively parallel code for a state-of-the-art thermal lattice-Boltzmann method. Our code has been carefully optimized for performance on one GPU and to have a good scaling behavior extending to a large number of GPUs. Versions of this code have been already used for large-scale studies of convective turbulence. GPUs are becoming increasingly popular in HPC applications, as they are able to deliver higher performance than traditional processors. Writing efficient programs for large clusters is not an easy task as codes must adapt to increasingly parallel architectures, and the overheads of node-to-node communications must be properly handled. We describe the structure of our code, discussing several key design choices that were guided by theoretical models of performance and experimental benchmarks. We present an extensive set of performance measurements and identify the corresponding main bottlenecks; finally we compare the results of our GPU code with those measured on other currently available high performance processors. Our results are a production-grade code able to deliver a sustained performance of several tens of Tflops as well as a design and optimization methodology that can be used for the development of other high performance applications for computational physics.
A revised version of the massively parallel simulator of a universal quantum computer, described in this journal eleven years ago, is used to benchmark various gate-based quantum algorithms on some of the most powerful supercomputers that exist today. Adaptive encoding of the wave function reduces the memory requirement by a factor of eight, making it possible to simulate universal quantum computers with up to 48 qubits on the Sunway TaihuLight and on the K computer. The simulator exhibits close-to-ideal weak-scaling behavior on the Sunway TaihuLight, on the K computer, on an IBM Blue Gene/Q, and on Intel Xeon-based clusters, implying that the combination of parallelization and hardware can track the exponential scaling due to the increasing number of qubits. Results of executing simple quantum circuits and Shor's factorization algorithm on quantum computers containing up to 48 qubits are presented.
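To make the memory claim concrete, the following sketch works through the state-vector arithmetic. It assumes a baseline of 16 bytes per amplitude (one double-precision complex number), which the abstract does not state. The function name and constants are hypothetical and only illustrate the scaling; they are not taken from the simulator.

```python
# Back-of-the-envelope memory estimate for a full state-vector simulation.
# ASSUMPTION: the uncompressed baseline stores each complex amplitude as two
# 64-bit floats (16 bytes). The factor of eight is the reduction quoted in
# the abstract; everything else here is illustrative.

BASELINE_BYTES_PER_AMPLITUDE = 16  # assumed: complex128
REDUCTION_FACTOR = 8               # from the abstract
PIB = 2 ** 50                      # bytes in one pebibyte


def state_vector_bytes(n_qubits: int, bytes_per_amplitude: int) -> int:
    """Memory needed to store all 2**n_qubits amplitudes."""
    return (2 ** n_qubits) * bytes_per_amplitude


full = state_vector_bytes(48, BASELINE_BYTES_PER_AMPLITUDE)
compressed = full // REDUCTION_FACTOR

print(f"48 qubits, baseline:   {full / PIB:.1f} PiB")
print(f"48 qubits, compressed: {compressed / PIB:.1f} PiB")

# Each additional qubit doubles the requirement, which is the exponential
# growth that weak scaling has to keep up with.
for n in (45, 46, 47, 48):
    size = state_vector_bytes(n, BASELINE_BYTES_PER_AMPLITUDE) // REDUCTION_FACTOR
    print(n, size / PIB)
```

Under this assumption, a factor-of-eight reduction corresponds to about 2 bytes per amplitude, and it buys three additional qubits for the same memory budget, since 2^3 = 8.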
A fully parallel version of the contact dynamics (CD) method is presented in this paper. For large enough systems, 100% efficiency has been demonstrated for up to 256 processors using a hierarchical domain decomposition with dynamic load balancing. The iterative scheme to calculate the contact forces is left domain-wise sequential, with data exchange after each iteration step, which ensures its stability. The number of additional iterations required for convergence by the partially parallel updates at the domain boundaries becomes negligible with increasing number of particles, which allows for an effective parallelization. Compared to the sequential implementation, we found no influence of the parallelization on simulation results.
The special computational challenges of simulating 3-D hydrodynamics in deep stellar interiors are discussed, and numerical algorithmic responses described. Results of recent simulations carried out at scale on the NSF's Blue Waters machine at the University of Illinois are presented, with a special focus on the computational challenges they address. Prospects for future work using GPU-accelerated nodes such as those on the DoE's new Summit machine at Oak Ridge National Laboratory are described, with a focus on numerical algorithmic accommodations that we believe will be necessary.
Particle-in-cell methods couple mesh-based methods for the solution of continuum mechanics problems, with the ability to advect and evolve particles. They have a long history and many applications in scientific computing. However, they have most often only been implemented for either sequential codes, or parallel codes with static meshes that are statically partitioned. In contrast, many mesh-based codes today use adaptively changing, dynamically partitioned meshes, and can scale to thousands or tens of thousands of processors. Consequently, there is a need to revisit the data structures and algorithms necessary to use particle methods with modern, mesh-based methods. Here we review commonly encountered requirements of particle-in-cell methods, and describe efficient ways to implement them in the context of large-scale parallel finite-element codes that use dynamically changing meshes. We also provide practical experience for how to address bottlenecks that impede the efficient implementation of these algorithms and demonstrate with numerical tests both that our algorithms can be implemented with optimal complexity and that they are suitable for very large-scale, practical applications. We provide a reference implementation in ASPECT, an open source code for geodynamic mantle-convection simulations built on the deal.II library.
Michael P. Howard, Athanassios Z. Panagiotopoulos, Arash Nikoubashman. (2018). "Efficient mesoscale hydrodynamics: multiparticle collision dynamics with massively parallel GPU acceleration".