CUBE: An Information-optimized parallel Cosmological $N$-body Algorithm

Added by: Hao-Ran Yu

Publication date: 2017

Field: Physics

Language: English

Authors: Hao-Ran Yu, Ue-Li Pen, Xin Wang

Cosmology and Nongalactic Astrophysics; Instrumentation and Methods for Astrophysics

Abstract

Cosmological large-scale structure $N$-body simulations are computation-light, memory-heavy problems in supercomputing. Their considerable memory footprint is usually dominated by an inefficient way of storing more phase-space information about the particles than is necessary. We present CUBE, a new parallel, information-optimized, particle-mesh-based $N$-body code in which information efficiency and memory efficiency are increased by nearly an order of magnitude. This is accomplished by storing the particles' relative phase-space coordinates instead of global values, in a fixed-point format as light as 1 byte. The remaining information is supplied by complementary density and velocity fields (negligible in memory) and by the proper ordering of particles (no extra memory). Our numerical experiments show that this information-optimized $N$-body algorithm gives results accurate to within the error of the particle-mesh algorithm. This significant reduction of the memory-to-computation ratio removes a bottleneck to scaling up and speeding up large cosmological $N$-body simulations on multi-core and heterogeneous computing systems.
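The abstract's central idea is that a particle's position need not be stored globally: its coarse cell can be recovered from per-cell particle counts plus the order in which particles are stored, leaving only a small in-cell offset to keep in 1-byte fixed point. The NumPy sketch below illustrates that idea for positions only. The grid size `NC`, the helpers `encode`/`decode`, and the mid-bin reconstruction are hypothetical choices for illustration, not CUBE's actual data layout or API; CUBE also compresses velocities against a coarse velocity field, which is not shown.

```python
import numpy as np

# Illustrative parameters (hypothetical; not taken from CUBE).
NC = 8            # coarse cells per box side
BOX = 1.0         # box side length, arbitrary units
CELL = BOX / NC   # coarse cell width
NBITS = 8         # one unsigned byte per coordinate


def encode(pos):
    """Compress positions in [0, BOX)^3 into per-cell counts plus 1-byte offsets.

    Returns (counts, offsets, order). `order` is the permutation that puts
    particles in cell order; a real code would simply store particles in that
    order, so it is returned here only to compare against the input.
    """
    scaled = pos / CELL
    cell = np.floor(scaled)
    frac = scaled - cell                                  # offset inside the cell, in [0, 1)
    cell_idx = cell.astype(np.int64)
    flat = np.ravel_multi_index(tuple(cell_idx.T), (NC, NC, NC))
    order = np.argsort(flat, kind="stable")               # group particles cell by cell
    counts = np.bincount(flat, minlength=NC**3)           # coarse density field
    offsets = np.floor(frac[order] * 2**NBITS).astype(np.uint8)  # 256 bins per axis
    return counts, offsets, order


def decode(counts, offsets):
    """Rebuild approximate positions from counts, storage order and offsets."""
    flat = np.repeat(np.arange(NC**3), counts)            # cell of each stored particle
    cell_idx = np.stack(np.unravel_index(flat, (NC, NC, NC)), axis=1)
    frac = (offsets.astype(np.float64) + 0.5) / 2**NBITS  # centre of each sub-bin
    return (cell_idx + frac) * CELL


rng = np.random.default_rng(0)
pos = rng.random((1000, 3)) * BOX
counts, offsets, order = encode(pos)
rec = decode(counts, offsets)

# Positions cost 3 bytes per particle; the error is at most half a sub-bin.
max_err = np.abs(rec - pos[order]).max()
print(offsets.nbytes, max_err <= CELL / 2**(NBITS + 1))  # 3000 True
```

The count array only becomes negligible when each coarse cell holds many particles; in this toy example with 512 cells and 1000 particles it does not, so the sketch shows the mechanism rather than the memory saving.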

Related research

CUBE -- Towards an Optimal Scaling of Cosmological N-body Simulations

Shenggan Cheng, Hao-Ran Yu, Derek Inman (2020)

N-body simulations are essential tools in physical cosmology for understanding the formation of the large-scale structure (LSS) of the Universe. Large-scale, high-resolution simulations are important for exploring the substructure of the Universe and for determining fundamental physical parameters such as the neutrino mass. However, traditional particle-mesh (PM) based algorithms use considerable amounts of memory, which limits the scalability of simulations. We therefore designed CUBE, a two-level PM algorithm aimed at optimal performance in memory consumption. Using a fixed-point compression technique, CUBE reduces the memory consumption per N-body particle toward 6 bytes, an order of magnitude lower than traditional PM-based algorithms. We scaled CUBE to 512 nodes (20,480 cores) on an Intel Cascade Lake based supercomputer with $\simeq$95% weak-scaling efficiency. This scaling test was performed in Cosmo-$\pi$, a cosmological LSS simulation using $\simeq$4.4 trillion particles that traces the evolution of the Universe over $\simeq$13.7 billion years. To the best of our knowledge, Cosmo-$\pi$ is the largest completed cosmological N-body simulation. We believe CUBE has great potential to scale on exascale supercomputers for larger simulations.

Computational Physics; Cosmology and Nongalactic Astrophysics; Distributed, Parallel, and Cluster Computing
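To make "toward 6 bytes" per particle concrete at the Cosmo-π scale, the arithmetic below multiplies it out using the particle and node counts quoted above. The 24-byte comparison baseline (single-precision positions and velocities only) is an assumption chosen for illustration; it ignores particle IDs, double precision and mesh buffers, so it understates what a traditional PM code actually needs.

```python
# Figures quoted in the abstract.
N_PARTICLES = 4.4e12      # about 4.4 trillion particles
NODES = 512               # nodes in the scaling test
BYTES_CUBE = 6            # "toward 6 bytes" per particle

# Assumed baseline for comparison only: float32 positions and velocities,
# 3 * 4 B + 3 * 4 B = 24 B per particle, with no IDs or mesh buffers.
BYTES_FLOAT32_XV = 6 * 4


def total_tb(bytes_per_particle):
    """Particle storage in decimal terabytes (1 TB = 10**12 bytes)."""
    return N_PARTICLES * bytes_per_particle / 1e12


cube_tb = total_tb(BYTES_CUBE)
baseline_tb = total_tb(BYTES_FLOAT32_XV)
per_node_gb = N_PARTICLES * BYTES_CUBE / NODES / 1e9

print(f"CUBE: {cube_tb:.1f} TB total, {per_node_gb:.1f} GB per node")
print(f"float32 x,v baseline: {baseline_tb:.1f} TB total")
# CUBE: 26.4 TB total, 51.6 GB per node
# float32 x,v baseline: 105.6 TB total
```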

FlowPM: Distributed TensorFlow Implementation of the FastPM Cosmological N-body Solver

Chirag Modi, Francois Lanusse, Uros Seljak (2020)

We present FlowPM, a particle-mesh (PM) cosmological N-body code implemented in Mesh-TensorFlow for GPU-accelerated, distributed, and differentiable simulations. We implement and validate the accuracy of a novel multi-grid scheme based on multiresolution pyramids to compute large-scale forces efficiently on distributed platforms. We explore the scaling of the simulation on large-scale supercomputers and compare it with a corresponding Python-based PM code, finding an average 10x speed-up in wall-clock time. We also demonstrate how this novel tool can be used to efficiently solve large-scale cosmological inference problems, in particular the reconstruction of cosmological fields in a forward-model Bayesian framework with a hybrid PM and neural network forward model. We provide skeleton code for these examples, and the entire code is publicly available at https://github.com/modichirag/flowpm.

Cosmology and Nongalactic Astrophysics; Instrumentation and Methods for Astrophysics

The optimal gravitational softening length for cosmological N-body simulations

Tianchi Zhang, Shihong Liao, Ming Li (2018)

Gravitational softening length is one of the key parameters for properly setting up a cosmological $N$-body simulation. In this paper, we perform a large suite of high-resolution $N$-body simulations to revise the optimal softening scheme proposed by Power et al. (P03). We find that the P03 optimal scheme works well but is overly conservative. Using softening lengths smaller than that of P03 can achieve higher spatial resolution and numerically convergent results for both circular velocity and density profiles. However, using an overly small softening length overpredicts the matter density in the innermost region of dark matter haloes. We empirically explore a better optimal softening scheme based on the P03 form and find that a small modification works well. This work will be useful for setting up cosmological simulations.

Cosmology and Nongalactic Astrophysics; Astrophysics of Galaxies
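For reference, the P03 criterion is commonly quoted as $\epsilon_{\rm opt} = 4 r_{200}/\sqrt{N_{200}}$, where $r_{200}$ is the halo radius and $N_{200}$ the number of particles inside it. The small helper below evaluates that form; the function name and the halo numbers are hypothetical and chosen only for illustration, and the modified scheme proposed in the paper is not reproduced here.

```python
import math


def p03_softening(r200, n200):
    """Commonly quoted Power et al. (2003) optimal softening:
    eps_opt = 4 * r200 / sqrt(N200). Returns eps in the units of r200."""
    if n200 <= 0:
        raise ValueError("n200 must be positive")
    return 4.0 * r200 / math.sqrt(n200)


# Hypothetical halo: r200 = 200 kpc resolved by one million particles.
eps = p03_softening(200.0, 1_000_000)
print(f"{eps:.2f} kpc")  # 0.80 kpc
# Quadrupling the particle count halves the P03 softening length.
print(p03_softening(200.0, 4_000_000) / eps)  # 0.5
```

Since the abstract finds P03 overly conservative, values from this form act as an upper reference rather than a recommended setting.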

Parallel HOP: A Scalable Halo Finder for Massive Cosmological Data Sets

Stephen Skory (2010)

Modern cosmological N-body simulations contain billions ($10^9$) of dark matter particles. These simulations require hundreds to thousands of gigabytes of memory and employ hundreds to tens of thousands of processing cores on many compute nodes. To study the distribution of dark matter in a cosmological simulation, the dark matter halos must be identified using a halo finder, which establishes the halo membership of every particle in the simulation. The resources required for halo finding are similar to those required for the simulation itself. In particular, simulations have become too large for commonly employed halo finders, so the computation needed to identify halos must now be spread across multiple nodes and cores. Here we present a scalable parallel halo-finding method called Parallel HOP for large-scale cosmological simulation data. Based on the halo finder HOP, it uses MPI and domain decomposition to distribute the halo-finding workload across multiple compute nodes, enabling analysis of much larger datasets than is possible with the strictly serial or previous parallel implementations of HOP. We provide a reference implementation of this method as part of yt, an analysis toolkit for Adaptive Mesh Refinement (AMR) data that includes complementary analysis modules. Additionally, we discuss a suite of benchmarks demonstrating that this method scales well up to several hundred tasks and datasets in excess of $2000^3$ particles. The Parallel HOP method and our implementation can be readily applied to any kind of N-body simulation data and are therefore widely applicable.

Cosmology and Nongalactic Astrophysics; Instrumentation and Methods for Astrophysics

4.45 Pflops Astrophysical N-Body Simulation on K computer -- The Gravitational Trillion-Body Problem

Tomoaki Ishiyama, Keigo Nitadori, Junichiro Makino (2012)

As an entry for the 2012 Gordon Bell performance prize, we report performance results of astrophysical N-body simulations of one trillion particles performed on the full system of the K computer. This is the first gravitational trillion-body simulation in the world. We describe the scientific motivation, the numerical algorithm, the parallelization strategy, and the performance analysis. Unlike many previous Gordon Bell prize winners, which used the tree algorithm for astrophysical N-body simulations, we used the hybrid TreePM method, which, for a similar level of accuracy, calculates the short-range force with the tree algorithm and solves the long-range force with the particle-mesh algorithm. We developed a highly tuned gravity kernel for short-range forces and a novel communication algorithm for long-range forces. The average performance on 24576 and 82944 nodes of the K computer is 1.53 and 4.45 Pflops, respectively, corresponding to 49% and 42% of the peak speed.

Cosmology and Nongalactic Astrophysics; Instrumentation and Methods for Astrophysics; Computational Physics
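The quoted efficiencies can be cross-checked against the machine's theoretical peak. The sketch below assumes 128 Gflops peak per K computer node (one 8-core SPARC64 VIIIfx processor at 16 Gflops per core); that per-node figure is not stated in the abstract and is supplied here as an assumption. With it, the measured Pflops values reproduce the 49% and 42% figures.

```python
# Assumed peak per K computer node: one 8-core SPARC64 VIIIfx at 16 Gflops/core.
PEAK_PER_NODE_GFLOPS = 8 * 16.0

# Sustained performance quoted in the abstract: nodes -> Pflops.
runs = {24576: 1.53, 82944: 4.45}

efficiency = {}
for nodes, pflops in runs.items():
    peak_pflops = nodes * PEAK_PER_NODE_GFLOPS / 1e6   # Gflops -> Pflops
    efficiency[nodes] = pflops / peak_pflops
    print(f"{nodes} nodes: peak {peak_pflops:.2f} Pflops, "
          f"efficiency {efficiency[nodes]:.0%}")
# 24576 nodes: peak 3.15 Pflops, efficiency 49%
# 82944 nodes: peak 10.62 Pflops, efficiency 42%
```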
