Direct $N$-body simulations of star clusters are accurate but expensive, largely due to the numerous $\mathcal{O}(N^2)$ pairwise force calculations. To solve the post-million-body problem, it will be necessary to use approximate force solvers, such as tree codes. In this work, we adapt a tree-based, optimized Fast Multipole Method (FMM) to the collisional $N$-body problem. The use of a rotation-accelerated translation operator and an error-controlled cell opening criterion leads to a code that can be tuned to arbitrary accuracy. We demonstrate that our code, Taichi, can be as accurate as direct summation when $N > 10^4$. This opens up the possibility of performing large-$N$, star-by-star simulations of massive stellar clusters, and would permit large parameter space studies that would require years with the current generation of direct summation codes. Using a series of tests and idealized models, we show that Taichi can accurately model collisional effects, such as dynamical friction and the core-collapse time of idealized clusters, producing results in strong agreement with benchmarks from other collisional codes such as NBODY6++GPU or PeTar. Parallelized using OpenMP and AVX, Taichi is demonstrated to be more efficient than other CPU-based direct $N$-body codes for simulating large systems. With future improvements to the handling of close encounters and binary evolution, we clearly demonstrate the potential of an optimized FMM for the modeling of collisional stellar systems, opening the door to accurate simulations of massive globular clusters, super star clusters, and even galactic nuclei.
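To make the $\mathcal{O}(N^2)$ cost of direct summation concrete, the following is a minimal sketch of a pairwise force loop in pure Python. It is an illustration only and is not taken from Taichi or any of the codes named above; the function names `direct_accelerations` and `pair_count`, the softening parameter `eps`, and the choice of units with `G = 1` are assumptions made for this example.

```python
import math

def direct_accelerations(masses, positions, G=1.0, eps=0.0):
    """Accelerations from direct pairwise summation (hypothetical helper).

    Visits each of the N(N-1)/2 pairs once; eps is an optional softening
    length (an assumption of this sketch, zero by default).
    """
    n = len(masses)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Separation vector pointing from particle i to particle j.
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                # Equal and opposite forces, so each pair is evaluated once.
                acc[i][k] += G * masses[j] * dx[k] * inv_r3
                acc[j][k] -= G * masses[i] * dx[k] * inv_r3
    return acc

def pair_count(n):
    """Number of distinct pairs visited per force evaluation."""
    return n * (n - 1) // 2

# Two unit masses one length unit apart attract each other symmetrically.
acc = direct_accelerations([1.0, 1.0], [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
```

For a million bodies, `pair_count(10**6)` is about $5 \times 10^{11}$ pair evaluations per force calculation, which is the cost that tree and FMM solvers are designed to avoid.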
We describe a major upgrade of a Monte Carlo code which has previously been used for many studies of dense star clusters. We outline the steps needed in order to calibrate the results of the new Monte Carlo code against $N$-body simulations for large $N$ systems, up to $N=200000$. The new version of the Monte Carlo code (called MOCCA), in addition to the features of the old version, incorporates the direct Fewbody integrator (Fregeau et al. 2004) for three- and four-body interactions, and a new treatment of the escape process based on Fukushige & Heggie (2000). Now stars which fulfil the escape criterion are not removed immediately, but can stay in the system for a certain time which depends on the excess of the energy of a star above the escape energy. They are called potential escapers. With the addition of the Fewbody integrator the code can follow all interaction channels which are important for the rate of creation of various types of objects observed in star clusters, and ensures that the energy generation by binaries is treated in a manner similar to the $N$-body model. There are at most three new parameters which have to be adjusted against $N$-body simulations for large $N$: two (or one, depending on the chosen approach) connected with the escape process, and one responsible for the determination of the interaction probabilities. The values adopted for the free parameters have at most a weak dependence on $N$. They allow MOCCA to reproduce $N$-body results with reasonable precision, not only for the rate of cluster evolution and the cluster mass distribution, but also for the detailed distributions of mass and binding energy of binaries. Additionally, the code can follow the rate of formation of blue stragglers and black hole - black hole binaries.
We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N~10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures, and the introduction of a parallel random number generation scheme, as well as a parallel sorting algorithm, required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce, along with our choice of decomposition scheme, minimize communication costs and ensure optimal distribution of data and workload among the processing units. The implementation uses the Message Passing Interface (MPI) library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude, from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, to within 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N=10^5, 128 for N=10^6 and 256 for N=10^7. The runtime saturates when more processors are added beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60x, 100x, and 220x, respectively.
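The quoted speedups can be turned into parallel efficiencies (speedup divided by processor count). The sketch below does that arithmetic, assuming that each maximum speedup is reached at the processor count quoted for the same N; the abstract gives the speedups only approximately, so the results are rough. The function name `parallel_efficiency` and the dictionary layout are choices made for this example.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors

# (approximate maximum speedup, processor count) as quoted in the abstract;
# pairing each speedup with that processor count is an assumption.
reported = {
    "N=1e5": (60.0, 64),
    "N=1e6": (100.0, 128),
    "N=1e7": (220.0, 256),
}

efficiencies = {
    label: parallel_efficiency(speedup, procs)
    for label, (speedup, procs) in reported.items()
}

for label, eff in efficiencies.items():
    print(f"{label}: {eff:.1%} of ideal scaling")
```

Under that assumption the efficiencies come out at roughly 94%, 78%, and 86%, consistent with the description of the scaling as near-linear up to those processor counts.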
Numerical simulations of massive collisional stellar systems, such as globular clusters (GCs), are very time-consuming. Until now, only a few realistic million-body simulations of GCs with a small fraction of binaries (5%) have been performed, using the NBODY6++GPU code. Such models took half a year of computational time on a GPU-based supercomputer. In this work, we develop a new N-body code, PeTar, by combining the Barnes-Hut tree, the Hermite integrator, and slow-down algorithmic regularization (SDAR). The code can accurately handle an arbitrary fraction of multiple systems (e.g. binaries, triples) while maintaining high performance through hybrid parallelization with MPI, OpenMP, SIMD instructions, and GPU. A few benchmarks indicate that PeTar and NBODY6++GPU agree very well on the long-term evolution of the global structure, binary orbits, and escapers. On a highly configured GPU desktop computer, a million-body simulation with all stars in binaries runs 11 times faster with PeTar than with NBODY6++GPU. Moreover, on the Cray XC50 supercomputer, PeTar scales well as the number of cores increases. The ten-million-body problem, which covers the region of ultra-compact dwarfs and nuclear star clusters, thus becomes tractable.
We present Particle-Particle-Particle-Mesh (PPPM) and Tree Particle-Mesh (TreePM) implementations on GRAPE-5 and GRAPE-6A systems, special-purpose hardware accelerators for gravitational many-body simulations. In our PPPM and TreePM implementations on GRAPE, the computational time is significantly reduced compared with conventional implementations without GRAPE, especially under strong particle clustering, and is almost constant irrespective of the degree of particle clustering. We survey two simulation parameters, the PM grid spacing and the opening parameter, to find the optimal combination of force accuracy and computational speed. We also describe the parallelization of these implementations on a PC-GRAPE cluster, in which each node has one GRAPE board, and present the optimal configuration of simulation parameters for good parallel scalability.
We present a new symplectic integrator designed for collisional gravitational $N$-body problems which makes use of Kepler solvers. The integrator is also reversible and conserves 9 integrals of motion of the $N$-body problem to machine precision. The integrator is second order, but the order can easily be increased by the method of \citeauthor{yos90}. We use a fixed time step in all tests studied in this paper to ensure preservation of symplecticity. We study small-$N$ collisional problems and perform comparisons with commonly used integrators. In particular, we find comparable or better performance when compared to the 4th-order Hermite method and much better performance than the adaptive-time-step symplectic integrators introduced previously. We find better performance compared to SAKURA, a non-symplectic, non-time-reversible integrator based on a different two-body decomposition of the $N$-body problem. The integrator is a promising tool in collisional gravitational dynamics.
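The idea of raising the order of a second-order, time-reversible symplectic scheme by composition can be sketched with a toy example. The code below is not the Kepler-solver-based integrator of the abstract: it swaps in the standard kick-drift-kick leapfrog as the second-order base method, applies the well-known three-stage composition with Yoshida's coefficients to obtain a fourth-order step, and tests both on a one-dimensional unit harmonic oscillator with a fixed time step. All function names and the test problem are assumptions made for illustration.

```python
def leapfrog_step(x, v, dt, accel):
    """One kick-drift-kick step: second order, symplectic, time-reversible."""
    v_half = v + 0.5 * dt * accel(x)          # half kick
    x_new = x + dt * v_half                   # full drift
    v_new = v_half + 0.5 * dt * accel(x_new)  # half kick
    return x_new, v_new

# Three-stage composition weights (Yoshida 1990); they satisfy
# 2*W1 + W0 = 1 and 2*W1**3 + W0**3 = 0.
W1 = 1.0 / (2.0 - 2.0 ** (1.0 / 3.0))
W0 = 1.0 - 2.0 * W1

def yoshida4_step(x, v, dt, accel):
    """Fourth-order step built from three rescaled leapfrog steps."""
    for w in (W1, W0, W1):
        x, v = leapfrog_step(x, v, w * dt, accel)
    return x, v

def max_energy_error(step, dt, n_steps):
    """Largest energy drift on a unit harmonic oscillator (toy problem)."""
    accel = lambda x: -x
    x, v = 1.0, 0.0
    e0 = 0.5 * (v * v + x * x)
    worst = 0.0
    for _ in range(n_steps):
        x, v = step(x, v, dt, accel)
        worst = max(worst, abs(0.5 * (v * v + x * x) - e0))
    return worst

err2 = max_energy_error(leapfrog_step, 0.1, 1000)
err4 = max_energy_error(yoshida4_step, 0.1, 1000)
```

With the same fixed step, the composed scheme shows a smaller bounded energy error than the base leapfrog, at the cost of three force evaluations per step. Both schemes retrace their trajectory when the sign of the time step is reversed.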