GAMER is a GPU-accelerated Adaptive-MEsh-Refinement code for astrophysical simulations. In this work, two further extensions of the code are reported. First, we have implemented the MUSCL-Hancock method with Roe's Riemann solver for the hydrodynamic evolution, which improves the accuracy, the overall performance, and the GPU versus CPU speed-up factor. Second, we have implemented out-of-core computation, which utilizes the large storage space of multiple hard disks as additional run-time virtual memory and permits an extremely large problem to be solved on a relatively small GPU cluster. The communication overhead associated with the data transfer between the parallel hard disks and the main memory is carefully reduced by overlapping it with the CPU/GPU computations.
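The idea of overlapping disk transfers with computation can be illustrated with a minimal Python sketch. It is not GAMER's implementation: the helper names (`write_chunks`, `load_chunk`, `compute`, `out_of_core_sum_of_squares`), the one-file-per-chunk layout, and the sum-of-squares workload are all assumptions chosen for illustration. A single background thread prefetches the next chunk from disk while the current chunk is being processed.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_chunks(directory, chunks):
    """Store each chunk of floats in its own file (stand-in for data kept on disk)."""
    paths = []
    for i, chunk in enumerate(chunks):
        path = os.path.join(directory, f"chunk_{i}.bin")
        with open(path, "wb") as f:
            f.write(struct.pack(f"{len(chunk)}d", *chunk))
        paths.append(path)
    return paths


def load_chunk(path):
    """Read one chunk from disk back into main memory."""
    with open(path, "rb") as f:
        raw = f.read()
    return list(struct.unpack(f"{len(raw) // 8}d", raw))


def compute(chunk):
    """Hypothetical placeholder for the CPU/GPU work done on an in-memory chunk."""
    return sum(v * v for v in chunk)


def out_of_core_sum_of_squares(paths):
    """Process chunks one at a time, prefetching chunk i+1 while computing on chunk i."""
    if not paths:
        return 0.0
    total = 0.0
    with ThreadPoolExecutor(max_workers=1) as io_pool:
        pending = io_pool.submit(load_chunk, paths[0])
        for i in range(len(paths)):
            current = pending.result()  # wait for the prefetched chunk
            if i + 1 < len(paths):
                # Start reading the next chunk before computing on this one.
                pending = io_pool.submit(load_chunk, paths[i + 1])
            total += compute(current)  # runs while the next read is in flight
    return total
```

Only two chunks are resident in memory at any time (the one being computed and the one being prefetched), which is the property that lets the working set exceed main memory.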
Stencil computation is an important class of scientific applications that can be efficiently executed by graphics processing units (GPUs). An out-of-core approach helps run large-scale stencil codes that process data larger than the limited capacity of GPU memory. However, the performance of GPU-based out-of-core stencil computation is always limited by the data transfer between the CPU and GPU. Many optimizations have been explored to reduce such data transfer, but the use of on-the-fly compression techniques has not been studied sufficiently. In this study, we propose a method that accelerates GPU-based out-of-core stencil computation with on-the-fly compression. We introduce a novel data compression approach that resolves the data dependency between two contiguous decomposed data blocks. We also modify a widely used GPU-based compression library to support pipelining that overlaps CPU/GPU data transfer with GPU computation. Experimental results show that the proposed method achieved a speedup of 1.2x compared with the method without compression. Moreover, although the precision loss introduced by compression increased with the number of time steps, the precision loss was trivial up to 4,320 time steps, demonstrating the usefulness of the proposed method.
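The data dependency between contiguous blocks mentioned above comes from the stencil itself: updating a cell at a block edge needs a neighbour that lives in the adjacent block. The following Python sketch shows only that decomposition for a 1D three-point stencil; it does not implement the compression scheme or the pipelining. The function names and the choice of stencil are assumptions made for illustration.

```python
def step(u):
    """One step of a 3-point averaging stencil; the two end points are held fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return new


def step_blocked(u, block_size):
    """Same update, computed block by block as an out-of-core code would do it.

    Each block is processed from a 'slab' holding the block plus one halo
    cell on each side, standing in for the data copied to the GPU.
    """
    n = len(u)
    out = u[:]
    for lo in range(0, n, block_size):
        hi = min(lo + block_size, n)
        s_lo = max(lo - 1, 0)      # left halo cell (if any)
        s_hi = min(hi + 1, n)      # right halo cell (if any)
        slab = u[s_lo:s_hi]        # the only data this block needs
        for i in range(max(lo, 1), min(hi, n - 1)):
            j = i - s_lo           # position of cell i inside the slab
            out[i] = (slab[j - 1] + slab[j] + slab[j + 1]) / 3.0
    return out
```

Because each slab carries its halo cells, the blocked update reproduces the unblocked one exactly, at the cost of transferring the shared cells twice. That redundancy is the kind of inter-block dependency any per-block compression scheme has to account for.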
We present the newly developed code, GAMER (GPU-accelerated Adaptive MEsh Refinement code), which adopts a novel approach to improve the performance of adaptive mesh refinement (AMR) astrophysical simulations by a large factor with the use of the graphics processing unit (GPU). The AMR implementation is based on a hierarchy of grid patches with an oct-tree data structure. We adopt a three-dimensional relaxing TVD scheme for the hydrodynamic solver, and a multi-level relaxation scheme for the Poisson solver. Both solvers have been implemented on the GPU, so that hundreds of patches can be advanced in parallel. The computational overhead associated with the data transfer between CPU and GPU is carefully reduced by utilizing the capability of asynchronous memory copies in GPU, and the computing time of the ghost-zone values for each patch is hidden by overlapping it with the GPU computations. We demonstrate the accuracy of the code by performing several standard test problems in astrophysics. GAMER is a parallel code that can be run on a multi-GPU cluster system. We measure the performance of the code by performing purely baryonic cosmological simulations on different hardware configurations, in which detailed timing analyses provide a comparison between the computations with and without GPU acceleration. Maximum speed-up factors of 12.19 and 10.47 are demonstrated using 1 GPU with 4096^3 effective resolution and 16 GPUs with 8192^3 effective resolution, respectively.
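A patch hierarchy organized as an oct-tree can be sketched in a few lines of Python. This is an illustrative toy, not GAMER's data structure: the `Patch` class, its fields, and the use of integer widths are assumptions. Refining a patch creates eight children, one per octant, each at the next refinement level.

```python
class Patch:
    """A cubic region of the domain; hypothetical stand-in for an AMR grid patch."""

    def __init__(self, level, corner, width):
        self.level = level      # refinement level (0 = root)
        self.corner = corner    # (x, y, z) of the lower corner
        self.width = width      # edge length, in arbitrary integer units
        self.children = []      # empty for a leaf, eight entries once refined

    def refine(self):
        """Split this patch into eight children, one per octant."""
        half = self.width // 2
        x0, y0, z0 = self.corner
        self.children = [
            Patch(self.level + 1,
                  (x0 + dx * half, y0 + dy * half, z0 + dz * half),
                  half)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
        ]
        return self.children

    def leaves(self):
        """Return all unrefined patches below (and including) this one."""
        if not self.children:
            return [self]
        result = []
        for child in self.children:
            result.extend(child.leaves())
        return result
```

For example, refining a root patch and then one of its children leaves 15 leaf patches that still tile the root volume exactly. In a GPU code the leaves at a given level form the natural batch of independent work items.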
We present GAMER-2, a GPU-accelerated adaptive mesh refinement (AMR) code for astrophysics. It provides a rich set of features, including adaptive time-stepping, several hydrodynamic schemes, magnetohydrodynamics, self-gravity, particles, star formation, chemistry and radiative processes with GRACKLE, data analysis with yt, and a memory pool for efficient object allocation. GAMER-2 is fully bitwise reproducible. For performance optimization, it adopts hybrid OpenMP/MPI/GPU parallelization and overlaps CPU computation, GPU computation, and CPU-GPU communication. Load balancing is achieved using a Hilbert space-filling curve on a level-by-level basis without the need to duplicate the entire AMR hierarchy on each MPI process. To provide convincing demonstrations of the accuracy and performance of GAMER-2, we directly compare with Enzo on isolated disk galaxy simulations and with FLASH on galaxy cluster merger simulations. We show that the physical results obtained by different codes are in very good agreement, and GAMER-2 outperforms Enzo and FLASH by nearly one and two orders of magnitude, respectively, on the Blue Waters supercomputer using 1 to 256 nodes. More importantly, GAMER-2 exhibits similar or even better parallel scalability compared to the other two codes. We also demonstrate good weak and strong scaling using up to 4096 GPUs and 65,536 CPU cores, and achieve a uniform resolution as high as $10{,}240^3$ cells. Furthermore, GAMER-2 can be adopted as an AMR+GPUs framework and has been extensively used for wave dark matter ($\psi$DM) simulations. GAMER-2 is open source (available at https://github.com/gamer-project/gamer) and new contributions are welcome.
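Load balancing with a space-filling curve amounts to sorting patches by their position along the curve and cutting the sorted list into contiguous pieces, one per process. The Python sketch below illustrates this under simplifying assumptions that are not taken from GAMER-2: it works in 2D rather than 3D, treats a single level, gives every patch equal weight, and uses hypothetical function names. The index computation is the standard bitwise algorithm for a 2D Hilbert curve on an n-by-n grid with n a power of two.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a 2D Hilbert curve on an n-by-n grid.

    n must be a power of two.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the sub-quadrant has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_by_hilbert(n, patches, num_ranks):
    """Sort patches along the curve and split them into contiguous, near-equal parts."""
    ordered = sorted(patches, key=lambda p: hilbert_index(n, p[0], p[1]))
    base, extra = divmod(len(ordered), num_ranks)
    parts = []
    start = 0
    for rank in range(num_ranks):
        count = base + (1 if rank < extra else 0)
        parts.append(ordered[start:start + count])
        start += count
    return parts
```

Because consecutive cells along a Hilbert curve are spatial neighbours, each contiguous piece tends to be spatially compact, which keeps the boundary (and hence the communication) between processes small. A production code would weight patches by estimated cost rather than by count.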
How do massive stars explode? Progress toward the answer is driven by increases in compute power. Petascale supercomputers are enabling detailed three-dimensional simulations of core-collapse supernovae. These are elucidating the role of fluid instabilities, turbulence, and magnetic field amplification in supernova engines.
We describe a space-borne, multi-band, multi-beam polarimeter aiming at a precise and accurate measurement of the polarization of the Cosmic Microwave Background. The instrument is optimized to be compatible with the strict budget requirements of a medium-size space mission within the Cosmic Vision Programme of the European Space Agency. The instrument has no moving parts, and uses arrays of diffraction-limited Kinetic Inductance Detectors to cover the frequency range from 60 GHz to 600 GHz in 19 wide bands, in the focal plane of a 1.2 m aperture telescope cooled to 40 K, allowing for an accurate extraction of the CMB signal from polarized foreground emission. The projected CMB polarization survey sensitivity of this instrument, after foreground removal, is 1.7 $\mu$K$\cdot$arcmin. The design is robust enough to allow, if needed, a downscoped version of the instrument covering the 100 GHz to 600 GHz range with a 0.8 m aperture telescope cooled to 85 K, with a projected CMB polarization survey sensitivity of 3.2 $\mu$K$\cdot$arcmin.