The molecular dynamics simulation code ls1 mardyn is presented. It is a highly scalable code, optimized for massively parallel execution on supercomputing architectures, and currently holds the world record for the largest molecular simulation with over four trillion particles. It enables the application of pair potentials to length and time scales which were previously out of scope for molecular dynamics simulation. With an efficient dynamic load balancing scheme, it delivers high scalability even for challenging heterogeneous configurations. Presently, multi-center rigid potential models based on Lennard-Jones sites, point charges and higher-order polarities are supported. Due to its modular design, ls1 mardyn can be extended to new physical models, methods, and algorithms, allowing future users to tailor it to suit their respective needs. Possible applications include scenarios with complex geometries, e.g. for fluids at interfaces, as well as non-equilibrium molecular dynamics simulation of heat and mass transfer.
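As an illustration of the kind of pairwise interaction evaluated in such a simulation, the sketch below computes the 12-6 Lennard-Jones energy and force between two sites with a simple cutoff; the parameter values and the cutoff are placeholders and are not taken from ls1 mardyn.

```python
import numpy as np

def lj_pair(r_vec, epsilon=1.0, sigma=1.0, r_cut=2.5):
    """12-6 Lennard-Jones energy and force between two sites.

    r_vec   : displacement vector between the two sites
    epsilon : well depth (placeholder value)
    sigma   : zero-crossing distance (placeholder value)
    r_cut   : cutoff radius beyond which the pair is ignored
    """
    r2 = np.dot(r_vec, r_vec)
    if r2 > r_cut * r_cut:
        return 0.0, np.zeros(3)
    sr6 = (sigma * sigma / r2) ** 3          # (sigma/r)^6
    u = 4.0 * epsilon * (sr6 * sr6 - sr6)    # U(r) = 4*eps*((s/r)^12 - (s/r)^6)
    # F = -dU/dr * r_hat = 24*eps*(2*(s/r)^12 - (s/r)^6) / r^2 * r_vec
    f = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2 * r_vec
    return u, f

# Example: two sites separated by 1.2 sigma along x
u, f = lj_pair(np.array([1.2, 0.0, 0.0]))
```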
The last ten years have witnessed the rapid spread of massively parallel computing clusters, from leading supercomputing facilities down to the average university computing center. Many companies in the private sector have undergone a similar evolution. In this scenario, the seamless integration of software and middleware libraries is a key ingredient for ensuring the portability of scientific codes and guaranteeing them an extended lifetime. In this work, we describe the integration of the ChASE library, a modern parallel eigensolver, into an existing legacy code for the first-principles computation of optical properties of materials via solution of the Bethe-Salpeter equation for the optical polarization function. Our numerical tests show that, as a result of integrating ChASE and parallelizing the input reading routine, the code achieves a remarkable speedup and greatly improved scaling behavior on both multi- and many-core architectures. We demonstrate that such a modernized BSE code will, by fully exploiting parallel computing architectures and file systems, enable domain scientists to accurately study complex material systems that were not accessible before.
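For orientation, within the Tamm-Dancoff approximation the Bethe-Salpeter problem reduces to a dense Hermitian eigenproblem, which is the class of problem a parallel eigensolver such as ChASE targets. The sketch below only illustrates that structure with a random Hermitian matrix and NumPy's LAPACK-backed solver; it does not use the ChASE API or actual BSE matrix elements.

```python
import numpy as np

# Within the Tamm-Dancoff approximation, the BSE for the polarization function
# reduces to a dense Hermitian eigenproblem  H x = E x  for excitation energies.
# The matrix below is a random Hermitian stand-in; a parallel eigensolver such
# as ChASE would replace np.linalg.eigh for large, distributed problems.
n = 500                                            # placeholder matrix dimension
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
h_bse = 0.5 * (a + a.conj().T)                     # Hermitize

energies, excitons = np.linalg.eigh(h_bse)         # eigenvalues in ascending order
lowest_ten = energies[:10]                         # lowest "excitation energies"
```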
A parallel implementation of coupled spin-lattice dynamics in the LAMMPS molecular dynamics package is presented. The equations of motion for both spin-only and coupled spin-lattice dynamics are first reviewed, including a detailed account of how magneto-mechanical potentials can be used to couple the spin and lattice degrees of freedom properly. A symplectic numerical integration algorithm is then presented which builds on the Suzuki-Trotter decomposition for non-commuting variables and conserves the geometric properties of the equations of motion. The numerical accuracy of the serial implementation was assessed by verifying that it conserves the total energy and the norm of the total magnetization up to second order in the timestep size. Finally, a very general parallel algorithm is proposed that allows large spin-lattice systems to be simulated efficiently on large numbers of processors without degrading the accuracy of the integration. Its correctness as well as its scaling efficiency were tested for realistic coupled spin-lattice systems, confirming that the new parallel algorithm is both accurate and efficient.
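As a hedged sketch of the elementary building block that a Suzuki-Trotter factorization chains together, the snippet below advances a single spin by exact, norm-preserving precession about a momentarily frozen effective field; the field and timestep values are placeholders, and the full LAMMPS implementation interleaves such spin rotations with lattice updates.

```python
import numpy as np

def rotate_spin(s, omega, dt):
    """Advance a unit spin s by exact precession about a fixed axis.

    s     : unit spin vector, shape (3,)
    omega : precession vector (direction = effective field, |omega| = frequency)
    dt    : time increment (e.g. a half step in a Suzuki-Trotter splitting)
    """
    w = np.linalg.norm(omega)
    if w == 0.0:
        return s
    axis = omega / w
    theta = w * dt
    # Rodrigues rotation: exact, norm-preserving solution of ds/dt = omega x s
    return (s * np.cos(theta)
            + np.cross(axis, s) * np.sin(theta)
            + axis * np.dot(axis, s) * (1.0 - np.cos(theta)))

# Example: half-step precession of a spin about a placeholder effective field
s_new = rotate_spin(np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 2.0]), 0.5e-3)
```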
Simulations of systems with quenched disorder are extremely demanding, suffering from the combined effect of slow relaxation and the need to perform the disorder average. As a consequence, new algorithms, improved implementations, and alternative and even purpose-built hardware are often instrumental for conducting meaningful studies of such systems. The ensuing demands regarding hardware availability and code complexity are substantial and sometimes prohibitive. We demonstrate how, with a moderate coding effort that leaves the overall structure of the simulation code unaltered compared to a CPU implementation, very significant speed-ups can be achieved with a parallel GPU code, mainly by exploiting the trivial parallelism over disorder samples and the near-trivial parallelism of the parallel tempering replicas. A combination of this massively parallel implementation with a careful choice of the temperature protocol for parallel tempering as well as efficient cluster updates allows us to equilibrate comparatively large systems with moderate computational resources.
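The replica-exchange step that provides the near-trivial parallelism mentioned above can be sketched as follows; the Metropolis-type swap criterion is standard, while the temperature ladder and energies are placeholders, and a GPU code would additionally run many disorder samples and replicas concurrently.

```python
import numpy as np

def attempt_swap(beta, energy, i, j, rng):
    """Metropolis acceptance test for exchanging replicas i and j.

    The swap is accepted with probability min(1, exp(dbeta * denergy)),
    which preserves detailed balance across the temperature ladder.
    """
    d = (beta[i] - beta[j]) * (energy[i] - energy[j])
    if d >= 0.0 or rng.random() < np.exp(d):
        beta[i], beta[j] = beta[j], beta[i]   # exchange temperatures in place
        return True
    return False

# Example: a small temperature ladder and placeholder replica energies
rng = np.random.default_rng(42)
beta = np.linspace(0.2, 1.0, 8)               # inverse temperatures
energy = -rng.random(8) * 100.0               # placeholder configuration energies
for k in range(len(beta) - 1):                # sweep over neighboring pairs
    attempt_swap(beta, energy, k, k + 1, rng)
```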
We provide a detailed description of the Chimera code, a code developed to model core collapse supernovae in multiple spatial dimensions. The core collapse supernova explosion mechanism remains the subject of intense research. Progress to date demonstrates that it involves a complex interplay of neutrino production, transport, and interaction in the stellar core, three-dimensional stellar core fluid dynamics and its associated instabilities, nuclear burning, and the foundational physics of the neutrino-stellar core weak interactions and the equations of state of all stellar core constituents, particularly the nuclear equation of state associated with nucleons, both free and bound in nuclei. Chimera, by incorporating detailed neutrino transport, realistic neutrino-matter interactions, three-dimensional hydrodynamics, realistic nuclear, leptonic, and photonic equations of state, and a nuclear reaction network, along with other refinements, can be used to study the role of neutrino radiation, hydrodynamic instabilities, and a variety of input physics in the explosion mechanism itself. It can also be used to compute observables such as neutrino signatures, gravitational radiation, and the products of nucleosynthesis associated with core collapse supernovae. The code contains modules for neutrino transport, multidimensional compressible hydrodynamics, nuclear reactions, a variety of neutrino interactions, and equations of state, as well as modules that provide data for post-processing observables such as the products of nucleosynthesis and gravitational radiation. Chimera is an evolving code, updated periodically with improved input physics and numerical refinements. We detail here the current version of the code, from which future improvements will stem and be described as needed in future publications.
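Purely as a schematic of how such physics modules can be chained within a time step, the sketch below shows a generic operator-split driver loop; the function names are hypothetical placeholders and do not reflect Chimera's actual interfaces or coupling strategy.

```python
# Schematic operator-split driver loop illustrating how modular physics
# components can be chained per time step. The module names and call
# signatures below are hypothetical placeholders, not Chimera's interfaces.

def evolve(state, t_end, dt):
    t = 0.0
    while t < t_end:
        state = hydro_step(state, dt)       # compressible hydrodynamics update
        state = burn_step(state, dt)        # nuclear reaction network update
        state = transport_step(state, dt)   # neutrino transport / source terms
        t += dt
    return state

# Trivial stand-ins so the sketch runs; real modules would update fluid,
# composition, and neutrino fields respectively.
def hydro_step(state, dt):
    return state

def burn_step(state, dt):
    return state

def transport_step(state, dt):
    return state

final = evolve({"t": 0.0}, t_end=1.0, dt=0.1)
```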
We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N~10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes the assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing the data structures and introducing a parallel random number generation scheme as well as a parallel sorting algorithm, which is required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce, along with our choice of decomposition scheme, minimize communication costs and ensure an optimal distribution of data and workload among the processing units. The implementation uses the Message Passing Interface (MPI) library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse, with the number of stars, N, spanning three orders of magnitude, from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core collapse times generally agree with expectations from the literature. We also observe good total energy conservation, to within less than 0.04% throughout all simulations. We analyze the performance of the code and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N=10^5, 128 for N=10^6, and 256 for N=10^7. Beyond these limits the runtime saturates as more processors are added, a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60x, 100x, and 220x, respectively.
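As a sketch of why radially sorted data are needed, the snippet below evaluates a common form of the spherically symmetric gravitational potential used in Hénon-type Monte Carlo codes, with G=1 and placeholder masses and radii; the handling of a star's own mass is a convention detail glossed over here.

```python
import numpy as np

def potential_from_sorted(r, m, G=1.0):
    """Potential at each star of a spherical cluster, given radius-sorted arrays.

    Phi_k = -G * ( M(<= r_k) / r_k  +  sum_{i>k} m_i / r_i ):
    interior mass acts as a point mass, exterior shells contribute constants.
    This is the step for which radially sorted data (hence the parallel sort)
    are needed.
    """
    m_enclosed = np.cumsum(m)                       # mass interior to (and at) r_k
    suffix = np.cumsum((m / r)[::-1])[::-1]         # sum_{i>=k} m_i / r_i
    exterior = suffix - m / r                       # sum_{i>k}  m_i / r_i
    return -G * (m_enclosed / r + exterior)

# Example with placeholder masses and radii (already sorted by radius)
rng = np.random.default_rng(1)
r = np.sort(rng.random(1000))
m = np.full(1000, 1.0 / 1000)
phi = potential_from_sorted(r, m)
```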