We present the implementation and performance of a class of directionally unsplit, Riemann-solver-based hydrodynamic schemes on Graphics Processing Units (GPUs). These schemes, including the MUSCL-Hancock method, a variant of the MUSCL-Hancock method, and the corner-transport-upwind method, are embedded into the adaptive-mesh-refinement (AMR) code GAMER. Furthermore, a hybrid MPI/OpenMP model is investigated, which enables the full exploitation of the computing power in a heterogeneous CPU/GPU cluster and significantly improves the overall performance. Performance benchmarks are conducted on the Dirac GPU cluster at NERSC/LBNL using up to 32 Tesla C2050 GPUs. A single GPU achieves speed-ups of 101 (25) and 84 (22) for uniform-mesh and AMR simulations, respectively, compared with the performance using one (four) CPU core(s), and this excellent performance persists in multi-GPU tests. In addition, we make a direct comparison between GAMER and the widely adopted CPU code Athena (Stone et al. 2008) in adiabatic hydrodynamic tests and demonstrate that, at the same accuracy, GAMER achieves a performance speed-up of two orders of magnitude.
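As a rough illustration of the kind of per-cell work such schemes offload to the GPU, the sketch below implements one MUSCL-Hancock step for 1D linear advection (a deliberately simplified stand-in for the Euler solvers actually used in GAMER) as a CUDA kernel. The kernel name, the minmod limiter choice, and the periodic boundary handling are our own illustrative assumptions, not GAMER's implementation.

```cuda
#include <cuda_runtime.h>

// Minmod slope limiter: returns 0 when the neighbour differences disagree in sign.
__device__ double minmod(double a, double b)
{
    if (a * b <= 0.0) return 0.0;
    return (fabs(a) < fabs(b)) ? a : b;
}

// One MUSCL-Hancock step for 1D linear advection u_t + a u_x = 0 (a > 0),
// with periodic boundaries. Each thread updates one cell.
__global__ void muscl_hancock_step(const double* u_old, double* u_new,
                                   int n, double a, double dt_over_dx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Periodic neighbour indices (cell i needs i-2 .. i+1 for its two interface fluxes).
    int im2 = (i - 2 + n) % n;
    int im1 = (i - 1 + n) % n;
    int ip1 = (i + 1) % n;

    // Limited slopes in cells i-1 and i.
    double s_im1 = minmod(u_old[im1] - u_old[im2], u_old[i]   - u_old[im1]);
    double s_i   = minmod(u_old[i]   - u_old[im1], u_old[ip1] - u_old[i]);

    // Half-step evolved right-face states of cells i-1 and i.
    // For F(u) = a*u the half-step correction reduces to -0.5*a*(dt/dx)*slope.
    double uR_im1 = u_old[im1] + 0.5 * s_im1 * (1.0 - a * dt_over_dx);
    double uR_i   = u_old[i]   + 0.5 * s_i   * (1.0 - a * dt_over_dx);

    // Upwind Riemann fluxes at i-1/2 and i+1/2 (a > 0: take the left cell's face state).
    double flux_left  = a * uR_im1;
    double flux_right = a * uR_i;

    u_new[i] = u_old[i] - dt_over_dx * (flux_right - flux_left);
}
```

A launch such as muscl_hancock_step<<<(n+255)/256, 256>>>(d_u, d_u_new, n, a, dt/dx) advances the solution by one step; in an AMR code the analogous kernel would instead operate patch by patch on the refined grid hierarchy.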
Radio astronomical imaging arrays comprising large numbers of antennas, O(10^2-10^3), pose a signal-processing challenge because of the required O(N^2) cross-correlation of the signals from every antenna pair and the requisite signal routing. This motivated the implementation of a Packetized Correlator architecture that applies Field Programmable Gate Arrays (FPGAs) to the O(N) F-stage, which transforms time-domain data to the frequency domain, and Graphics Processing Units (GPUs) to the O(N^2) X-stage, which performs an outer product among the per-antenna spectra. The design is readily scalable to at least O(10^3) antennas. Fringes, visibility amplitudes, and sky-image results obtained during field testing are presented.
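For concreteness, the X-stage reduces to a cross-multiply-and-accumulate: for every frequency channel, the visibility matrix $V_{ij}(f) = \sum_t x_i(f,t)\,x_j^*(f,t)$ is the outer product of the per-antenna spectra accumulated over time. The CUDA sketch below is an illustrative, unoptimized rendering of that operation; the data layout, names, and thread mapping are assumptions, not the production X-engine code.

```cuda
#include <cuComplex.h>

// X-stage cross-multiply/accumulate: one thread per (antenna i, antenna j) pair
// of one frequency channel. Assumed input layout: spectra[t*nchan*nant + f*nant + a].
__global__ void xengine(const cuFloatComplex* spectra,  // F-stage output
                        cuFloatComplex* vis,            // visibilities [nchan][nant][nant]
                        int nant, int nchan, int ntime)
{
    int f    = blockIdx.y;                               // frequency channel
    int pair = blockIdx.x * blockDim.x + threadIdx.x;    // flattened (i, j) index
    if (f >= nchan || pair >= nant * nant) return;

    int i = pair / nant;
    int j = pair % nant;

    cuFloatComplex acc = make_cuFloatComplex(0.0f, 0.0f);
    for (int t = 0; t < ntime; ++t) {
        cuFloatComplex xi = spectra[(size_t)t * nchan * nant + (size_t)f * nant + i];
        cuFloatComplex xj = spectra[(size_t)t * nchan * nant + (size_t)f * nant + j];
        acc = cuCaddf(acc, cuCmulf(xi, cuConjf(xj)));    // V_ij(f) += x_i * conj(x_j)
    }
    vis[(size_t)f * nant * nant + pair] = acc;
}

// Example launch: xengine<<<dim3((nant*nant + 255)/256, nchan), 256>>>(d_spec, d_vis, nant, nchan, ntime);
```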
To study the resolution required for simulating gravitational fragmentation with the newly developed Lagrangian hydrodynamic schemes, the Meshless Finite Volume (MFV) and Meshless Finite Mass (MFM) methods, we have performed a number of simulations of the Jeans test and compared the results with both the expected analytic solution and results from the more standard Lagrangian approach, Smoothed Particle Hydrodynamics (SPH). We find that the different schemes converge to the analytic solution when the diameter of a fluid element is smaller than a quarter of the Jeans wavelength, $\lambda_\mathrm{J}$. Among the three schemes, SPH/MFV shows the fastest/slowest convergence to the analytic solution. Unlike the well-known behaviour of Eulerian schemes, none of the Lagrangian schemes investigated displays artificial fragmentation when the perturbation wavelength, $\lambda$, is shorter than $\lambda_\mathrm{J}$, even at low numerical resolution. For larger wavelengths ($\lambda > \lambda_\mathrm{J}$), the growth of the perturbation is delayed when it is not well resolved. Furthermore, at poor resolution, the fragmentation seen with the MFV scheme proceeds very differently from the converged solution. All these results suggest that, when unresolved, the ratio of the magnitude of the hydrodynamic force to that of self-gravity at the sub-resolution scale is largest/smallest in MFV/SPH, the reasons for which we discuss in detail. These tests are repeated to investigate the effect of kernels of higher order than the fiducial cubic spline. Our results indicate that the standard deviation of the kernel is a more appropriate definition of the size of a fluid element than its compact support radius.
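For reference, the Jeans wavelength referred to above takes its standard form, $\lambda_\mathrm{J} = c_s \sqrt{\pi/(G\rho)}$, where $c_s$ is the sound speed, $G$ the gravitational constant, and $\rho$ the density; the convergence criterion quoted is that the fluid-element diameter $d$ satisfy $d \lesssim \lambda_\mathrm{J}/4$.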
As an increasing number of leadership-class systems embrace GPU accelerators in the race towards exascale, efficient communication of GPU data is becoming one of the most critical components of high-performance computing. For developers of parallel programming models, implementing support for GPU-aware communication using native APIs for GPUs such as CUDA can be a daunting task as it requires considerable effort with little guarantee of performance. In this work, we demonstrate the capability of the Unified Communication X (UCX) framework to compose a GPU-aware communication layer that serves multiple parallel programming models of the Charm++ ecosystem: Charm++, Adaptive MPI (AMPI), and Charm4py. We demonstrate the performance impact of our designs with microbenchmarks adapted from the OSU benchmark suite, obtaining improvements in latency of up to 10.2x, 11.7x, and 17.4x in Charm++, AMPI, and Charm4py, respectively. We also observe increases in bandwidth of up to 9.6x in Charm++, 10x in AMPI, and 10.5x in Charm4py. We show the potential impact of our designs on real-world applications by evaluating a proxy application for the Jacobi iterative method, improving the communication performance by up to 12.4x in Charm++, 12.8x in AMPI, and 19.7x in Charm4py.
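As a generic illustration of what "GPU-aware" means here (handing device pointers directly to the communication layer rather than staging them through host memory), the sketch below uses CUDA-aware MPI rather than the UCX-backed Charm++/AMPI/Charm4py machinery evaluated in the paper; the buffer size and tag are arbitrary assumptions.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

// Ping between ranks 0 and 1 using device buffers directly (requires a
// CUDA-aware MPI build, e.g. one backed by UCX).
int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1 << 20;                       // number of doubles (arbitrary)
    double* d_buf = nullptr;
    cudaMalloc(&d_buf, count * sizeof(double));
    cudaMemset(d_buf, 0, count * sizeof(double));

    if (rank == 0) {
        // The device pointer is passed straight to MPI; no explicit cudaMemcpy to host.
        MPI_Send(d_buf, count, MPI_DOUBLE, 1, /*tag=*/0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, count, MPI_DOUBLE, 0, /*tag=*/0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```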
With the increasing number of Quad-Core-based clusters and the introduction of compute nodes designed with a large memory capacity shared by multiple cores, new scalability problems arise. In this paper, we analyze the overall performance of a cluster built with nodes that each contain two Quad-Core processors. Benchmark results are presented, along with observations on handling such processors in a benchmark test. The complexity of a Quad-Core-based cluster arises from the fact that both local communication and network communication between the running processes need to be addressed. The potential of a hybrid MPI-OpenMP approach is highlighted because of its reduced communication overhead. We conclude that an MPI-OpenMP solution should be considered in such clusters, since optimizing network communication between nodes is as important as optimizing local communication between processors in a multi-core cluster.
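A minimal sketch of the hybrid model being advocated, under the assumption of one MPI rank per node with OpenMP threads doing the intra-node work: threads share memory and exchange no messages, while a single network reduction per node replaces per-process communication. The work loop and names below are purely illustrative.

```cuda
#include <mpi.h>
#include <omp.h>
#include <cstdio>

// One MPI rank per node, OpenMP threads within the node: threads do the local
// work, the rank issues a single network reduction.
int main(int argc, char** argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const long n_local = 1 << 24;                  // per-rank work size (arbitrary)
    double local_sum = 0.0;

    // Intra-node parallelism: shared memory, no message passing.
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < n_local; ++i)
        local_sum += 1.0 / (double)(rank * n_local + i + 1);

    // Inter-node communication: one message per rank instead of one per thread.
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("global sum = %.12f (%d ranks x %d threads)\n",
                    global_sum, nranks, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}
```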
Radiation controls the dynamics and energetics of many astrophysical environments. Capturing the coupling between radiation and matter, however, is often a physically complex and computationally expensive endeavour. We develop a numerical tool to perform radiation-hydrodynamics simulations in various configurations at an affordable cost. We build upon the finite-volume code MPI-AMRVAC to solve the equations of hydrodynamics on multi-dimensional adaptive meshes and introduce a new module to handle the coupling with radiation. A non-equilibrium, flux-limited diffusion approximation is used to close the radiation momentum and energy equations. The time-dependent radiation energy equation is then solved within a flexible framework, accounting fully for radiation forces and work terms and further allowing the user to adopt a variety of descriptions for the radiation-matter interaction terms (the opacities). We validate the radiation module on a set of standard test cases in which different terms of the radiative energy equation predominate. As a preliminary application to a scientific case, we calculate spherically symmetric models of the radiation-driven, optically thick, supersonic outflows from massive Wolf-Rayet stars. This also demonstrates our code's flexibility, as the illustrated simulation combines opacities typically used in static stellar-structure models with a parametrised form for the enhanced line opacity expected in supersonic flows. This new module provides a convenient and versatile tool for performing multi-dimensional, high-resolution radiation-hydrodynamics simulations in optically thick environments with the MPI-AMRVAC code. The code is ready to be used for a variety of astrophysical applications; a first target for us will be multi-dimensional simulations of stellar outflows from Wolf-Rayet stars.
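For context, a standard form of the flux-limited diffusion closure reads $\mathbf{F} = -\frac{c\,\lambda}{\kappa\rho}\nabla E$, with, for example, the Levermore & Pomraning limiter $\lambda(R) = \frac{2+R}{6+3R+R^2}$ and $R = \frac{|\nabla E|}{\kappa\rho E}$; this recovers the diffusion flux $-\frac{c}{3\kappa\rho}\nabla E$ in the optically thick limit ($R \to 0$) and caps the flux at $|\mathbf{F}| \le cE$ in the free-streaming limit ($R \to \infty$). The specific limiter and opacity prescriptions implemented in the module may differ; this form is quoted only to fix the notation.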