
CUBE -- Towards an Optimal Scaling of Cosmological N-body Simulations

Published by: Hao-Ran Yu
Publication date: 2020
Research field: Physics
Paper language: English





N-body simulations are essential tools in physical cosmology for understanding the formation of the large-scale structure (LSS) of the Universe. Large, high-resolution simulations are important for exploring the substructure of the Universe and for determining fundamental physical parameters such as the neutrino mass. However, traditional particle-mesh (PM) based algorithms use considerable amounts of memory, which limits the scalability of simulations. We therefore designed CUBE, a two-level PM algorithm aimed at minimizing memory consumption. By using a fixed-point compression technique, CUBE reduces the memory consumption per N-body particle toward 6 bytes, an order of magnitude lower than traditional PM-based algorithms. We scaled CUBE to 512 nodes (20,480 cores) on an Intel Cascade Lake based supercomputer with $\simeq$95% weak-scaling efficiency. This scaling test was performed in Cosmo-$\pi$ -- a cosmological LSS simulation using $\simeq$4.4 trillion particles, tracing the evolution of the Universe over $\simeq$13.7 billion years. To the best of our knowledge, Cosmo-$\pi$ is the largest completed cosmological N-body simulation. We believe CUBE has great potential to scale on exascale supercomputers for larger simulations.
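To put the quoted figures in perspective, the following is a rough back-of-the-envelope sketch of the particle memory implied by the abstract. The particle count, the 6 bytes per particle, and the 512 nodes come from the text; the 60-byte figure for a traditional code is an assumption that simply reads "an order of magnitude" as a factor of 10, and the function name is illustrative only. Mesh arrays, buffers, and other overheads are not included.

```python
def particle_memory_tb(n_particles: float, bytes_per_particle: float) -> float:
    """Total particle storage in terabytes (1 TB = 1e12 bytes)."""
    return n_particles * bytes_per_particle / 1e12


N_PARTICLES = 4.4e12       # ~4.4 trillion particles (from the abstract)
N_NODES = 512              # nodes used in the scaling test (from the abstract)
CUBE_BYTES = 6             # bytes per particle quoted for CUBE
TRADITIONAL_BYTES = 60     # ASSUMPTION: "order of magnitude" taken as 10x

cube_tb = particle_memory_tb(N_PARTICLES, CUBE_BYTES)
traditional_tb = particle_memory_tb(N_PARTICLES, TRADITIONAL_BYTES)
cube_gb_per_node = cube_tb * 1000 / N_NODES  # evenly divided across nodes

print(f"CUBE particles:        {cube_tb:.1f} TB total, "
      f"{cube_gb_per_node:.1f} GB per node")
print(f"Traditional (assumed): {traditional_tb:.1f} TB total")
```

Under these assumptions the particle data come to a few tens of terabytes in total, which is why the per-particle footprint, rather than the arithmetic cost, sets the feasible problem size.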




Read also

Hao-Ran Yu, Ue-Li Pen, Xin Wang, 2017
Cosmological large-scale structure $N$-body simulations are computation-light, memory-heavy problems in supercomputing. The memory footprint is usually dominated by an inefficient way of storing more phase-space information about the particles than is needed. We present a new parallel, information-optimized, particle-mesh-based $N$-body code, CUBE, in which information efficiency and memory efficiency are increased by nearly an order of magnitude. This is accomplished by storing the particles' relative phase-space coordinates instead of global values, in a fixed-point format as light as 1 byte. The remaining information is given by complementary density and velocity fields (negligible in memory space) and by proper ordering of the particles (no extra memory). Our numerical experiments show that this information-optimized $N$-body algorithm provides accurate results within the error of the particle-mesh algorithm. This significant lowering of the memory-to-computation ratio breaks the bottleneck of scaling up and speeding up large cosmological $N$-body simulations on multi-core and heterogeneous computing systems.
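The storage idea described in this abstract can be illustrated with a minimal one-dimensional sketch. This is not the CUBE implementation: the function names, the 1D setting, the 256-level quantization, and the mid-point decoding are all assumptions made for illustration. Each position is reduced to a 1-byte offset inside its coarse cell, and the cell index is not stored per particle but recovered from the per-cell particle counts (a density field) together with the ordering of the particles.

```python
import random


def compress(positions, box_size, n_cells):
    """Encode 1D positions as per-cell counts plus 1-byte in-cell offsets."""
    cell_size = box_size / n_cells
    counts = [0] * n_cells
    keyed = []
    for x in positions:
        cell = min(int(x / cell_size), n_cells - 1)
        frac = x / cell_size - cell        # relative coordinate within the cell
        q = min(int(frac * 256), 255)      # 8-bit fixed-point value
        counts[cell] += 1
        keyed.append((cell, q))
    keyed.sort(key=lambda cq: cq[0])       # order particles by cell (stable sort)
    offsets = bytes(q for _, q in keyed)   # exactly one byte per particle
    return counts, offsets


def decompress(counts, offsets, box_size):
    """Rebuild positions from cell counts and ordered 1-byte offsets."""
    cell_size = box_size / len(counts)
    positions = []
    i = 0
    for cell, n_in_cell in enumerate(counts):
        for _ in range(n_in_cell):
            # Decode to the middle of the quantization bin.
            positions.append((cell + (offsets[i] + 0.5) / 256) * cell_size)
            i += 1
    return positions


BOX, N_CELLS = 100.0, 8                    # hypothetical box size and grid
random.seed(0)
positions = [random.uniform(0, BOX) for _ in range(1000)]

counts, offsets = compress(positions, BOX, N_CELLS)
decoded = decompress(counts, offsets, BOX)
print(len(offsets), "bytes for", len(positions), "particles")
```

In this sketch the reconstruction error is bounded by half a quantization step, that is, the cell size divided by 512. At 1 byte per coordinate, three position and three velocity components would total 6 bytes per particle.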
The gravitational softening length is one of the key parameters for properly setting up a cosmological $N$-body simulation. In this paper, we perform a large suite of high-resolution $N$-body simulations to revise the optimal softening scheme proposed by Power et al. (P03). We find that the P03 optimal scheme works well but is overly conservative. Using softening lengths smaller than those of P03 achieves higher spatial resolution and numerically convergent results for both circular velocity and density profiles. However, using an overly small softening length overpredicts the matter density in the innermost regions of dark matter haloes. We empirically explore a better optimal softening scheme based on the P03 form and find that a small modification works well. This work will be useful for setting up cosmological simulations.
We use gauge-invariant cosmological perturbation theory to calculate the displacement field that sets the initial conditions for $N$-body simulations. Using first- and second-order fully relativistic perturbation theory in the synchronous-comoving gauge allows us to go beyond the Newtonian predictions and to calculate relativistic corrections to them. We use an Einstein--de Sitter model, including both growing and decaying modes in our solutions. The impact of our results should be assessed through the implementation of the featured displacement in cosmological $N$-body simulations.
(Abridged) We use high-resolution cosmological N-body simulations to study the growth of intermediate to supermassive black holes from redshift 49 to zero. We track the growth of black holes from the seeds of population III stars to black holes in the range of 10^3 < M < 10^7 Msun -- not quasars, but rather IMBHs to low-mass SMBHs. These lower-mass black holes are the primary observable for the Laser Interferometer Space Antenna (LISA). The large-scale dynamics of the black holes are followed accurately within the simulation down to scales of 1 kpc; thereafter, we follow the merger analytically from the last dynamical friction phase to black hole coalescence. We find that the merger rate of these black holes is R ~ 25 per year for 8 < z < 11 and R = 10 per year at z = 3. Before the merger occurs, the incoming IMBH may be observed with a next generation of X-ray telescopes as a ULX source at a rate of about 3 - 7 per year for 1 < z < 5. We develop an analytic prescription that captures the most important black hole growth mechanisms: galaxy merger-driven gas accretion and black hole coalescence. Using this, we find that the SMBH at the center of a Milky Way-type galaxy was in place with most of its mass by z = 4.7, and that most of the growth was driven by gas accretion excited by major mergers. Hundreds of black holes have failed to coalesce with the SMBH by z = 0, some with masses of 10000 Msun, orbiting within the dark matter halo with luminosities up to ~ 30000 Lsun. These X-ray sources can easily be observed with Chandra at ~ 100 kpc.
Tom Theuns, 2015
Simulations of galaxy formation follow the gravitational and hydrodynamical interactions between gas, stars and dark matter through cosmic time. The huge dynamic range of such calculations severely limits the strong-scaling behaviour of the community codes in use, with load imbalance, cache inefficiencies and poor vectorisation limiting performance. The new SWIFT code exploits task-based parallelism designed for many-core compute nodes that interact via MPI using asynchronous communication to improve speed and scaling. A graph-based domain decomposition schedules interdependent tasks over the available resources. Strong-scaling tests on realistic particle distributions yield excellent parallel efficiency, and efficient cache usage provides a large speed-up compared to current codes even on a single core. SWIFT is designed to be easy to use by shielding the astronomer from computational details such as the construction of the tasks or MPI communication. The techniques and algorithms used in SWIFT may benefit other areas of computational physics as well, for example compressible hydrodynamics. For details of this open-source project, see www.swiftsim.com