We focus on implementing and optimizing a sixth-order finite-difference solver for simulating compressible fluids on a GPU using third-order Runge-Kutta integration. Graphics processing units perform well in data-parallel tasks, which makes them an attractive platform for fluid simulation. However, high-order stencil computation is memory-intensive with respect to both the main memory and the caches of the GPU. We present two approaches for simulating compressible fluids, using 55-point and 19-point stencils. We seek to reduce the memory-bandwidth and cache-size requirements of our methods by using cache blocking and by decomposing a latency-bound kernel into several bandwidth-bound kernels. Our fastest implementation is bandwidth-bound and integrates $343$ million grid points per second on a Tesla K40t GPU, achieving a $3.6\times$ speedup over a comparable hydrodynamics solver benchmarked on two Intel Xeon E5-2690v3 processors. Our alternative GPU implementation is latency-bound and achieves $168$ million updates per second.
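The two numerical ingredients named above are a sixth-order finite-difference derivative and a third-order Runge-Kutta step. A minimal 1D sketch of both follows, assuming periodic boundaries and the linear advection test problem $u_t + u_x = 0$. The abstract does not say which RK3 variant is used, so Williamson's low-storage (2N) coefficients are an assumption. All function names are hypothetical.

```python
import math

# Sixth-order central-difference weights for d/dx at offsets 1, 2, 3.
# The stencil is antisymmetric, so the centre point has zero weight.
C6 = (3.0 / 4.0, -3.0 / 20.0, 1.0 / 60.0)


def ddx6(f, dx):
    """Sixth-order first derivative of a periodic 1D grid function."""
    n = len(f)
    out = []
    for i in range(n):
        s = 0.0
        for k, c in enumerate(C6, start=1):
            s += c * (f[(i + k) % n] - f[(i - k) % n])
        out.append(s / dx)
    return out


# Williamson's low-storage RK3 coefficients (an assumption: the abstract
# only says "third-order Runge-Kutta").
ALPHA = (0.0, -5.0 / 9.0, -153.0 / 128.0)
BETA = (1.0 / 3.0, 15.0 / 16.0, 8.0 / 15.0)


def rk3_step(u, rhs, dt):
    """Advance u by dt; conceptually only u and one register w are stored."""
    w = [0.0] * len(u)
    for a, b in zip(ALPHA, BETA):
        r = rhs(u)
        w = [a * wi + dt * ri for wi, ri in zip(w, r)]
        u = [ui + b * wi for ui, wi in zip(u, w)]
    return u


def advection_error(n, t_end=1.0, nsteps=20):
    """Solve u_t + u_x = 0 on [0, 2*pi) from sin(x); return the max error."""
    dx = 2.0 * math.pi / n
    xs = [i * dx for i in range(n)]
    u = [math.sin(x) for x in xs]

    def rhs(v):
        return [-d for d in ddx6(v, dx)]

    for _ in range(nsteps):
        u = rk3_step(u, rhs, t_end / nsteps)
    return max(abs(ui - math.sin(x - t_end)) for ui, x in zip(u, xs))


def derivative_error(n):
    """Max error of ddx6 applied to sin(x) on n points, against cos(x)."""
    dx = 2.0 * math.pi / n
    xs = [i * dx for i in range(n)]
    d = ddx6([math.sin(x) for x in xs], dx)
    return max(abs(di - math.cos(x)) for di, x in zip(d, xs))
```

Halving the grid spacing should cut the derivative error by roughly $2^6 = 64$. In three dimensions, applying such a derivative along each axis touches the centre point plus three neighbours on each side of each axis, $1 + 3 \times 6 = 19$ points. This is consistent with the 19-point stencil mentioned in the abstract.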
In this paper, our goal is to efficiently solve the Vlasov equation on GPUs. A semi-Lagrangian discontinuous Galerkin scheme is used for the discretization. Such kinetic computations are extremely expensive due to the high-dimensional phase space. Th
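To see why the phase space is so costly, consider a 6D grid with 64 points per dimension. It already holds $64^6 = 2^{36} \approx 6.9 \times 10^{10}$ unknowns, which is 512 GiB in double precision. The abstract is cut off before it describes the method, so the following is only a minimal sketch of the semi-Lagrangian idea for one split advection step $f_t + v f_x = 0$. It traces each grid point back along its characteristic, one velocity row at a time. Linear interpolation is swapped in here for the discontinuous Galerkin projection the paper uses, and the function names are hypothetical.

```python
import math


def sl_advect_x(f, v, dt, dx):
    """One semi-Lagrangian step for f_t + v f_x = 0 on a periodic grid.

    The departure point of node i is x_i - v*dt = (i - s - a)*dx, with s the
    whole-cell and a the fractional part of the shift; the upstream value is
    found by linear interpolation between nodes i-s and i-s-1.
    """
    n = len(f)
    shift = v * dt / dx
    s = math.floor(shift)   # whole-cell displacement
    a = shift - s           # fractional displacement in [0, 1)
    return [(1.0 - a) * f[(i - s) % n] + a * f[(i - s - 1) % n]
            for i in range(n)]


def sl_advect_phase_space(f2d, velocities, dt, dx):
    """Apply the x-advection step to each velocity row f2d[j] = f(x, v_j)."""
    return [sl_advect_x(row, v, dt, dx) for row, v in zip(f2d, velocities)]
```

Each velocity row is independent, which is the kind of data parallelism a GPU can exploit. On a periodic grid, the interpolation weights sum to one, so the total mass of each row is conserved.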
High-fidelity flow simulation of complex geometries at high Reynolds numbers ($Re$) is still very challenging and requires HPC systems with greater computational capability. However, the development of HPC with traditional CPU architect
Restricted solid-on-solid surface growth models can be mapped onto binary lattice gases. We show that efficient simulation algorithms can be realized on GPUs using either CUDA or OpenCL programming. We consider a deposition/evaporation model followin
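The truncated abstract does not spell out the mapping. The sketch below assumes the simplest one-dimensional single-step case, where neighbouring heights differ by exactly $\pm 1$, on a periodic chain. Each bond slope is stored as one bit (1 for a $+1$ step, 0 for a $-1$ step). Deposition at a local minimum raises that site by 2 and becomes the bit move 01 → 10. Evaporation at a local maximum is the reverse move. The function names are hypothetical.

```python
def heights(bits, h0=0):
    """Heights at sites 0..n-1 from slope bits (1 -> +1 step, 0 -> -1 step)."""
    h = [h0]
    for b in bits[:-1]:
        h.append(h[-1] + (1 if b else -1))
    return h


def deposit(bits, i):
    """Deposit at site i if it is a local minimum: bond pair 01 -> 10.

    Site i lies between bond i-1 (left) and bond i (right); for i == 0,
    bits[-1] wraps to the last bond, matching the periodic chain.
    Returns True if the move was accepted.
    """
    if bits[i - 1] == 0 and bits[i] == 1:
        bits[i - 1], bits[i] = 1, 0
        return True
    return False


def evaporate(bits, i):
    """Evaporate from site i if it is a local maximum: bond pair 10 -> 01."""
    if bits[i - 1] == 1 and bits[i] == 0:
        bits[i - 1], bits[i] = 0, 1
        return True
    return False
```

Both moves conserve the number of 1 bits, so the surface dynamics is a particle-conserving hopping process on a binary lattice. That is what allows the growth model to be simulated as a lattice gas.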
Recently, a 4th-order asymptotic-preserving multiderivative implicit-explicit (IMEX) scheme was developed (Schutz and Seal 2020, arXiv:2001.08268). This scheme is based on a 4th-order Hermite interpolation in time and uses an approach based on opera
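One way to see where fourth order comes from is to integrate the cubic Hermite interpolant of $u'$, built from $u'$ and $u''$ at both ends of a step. This gives the two-derivative rule $u^{n+1} = u^n + \frac{\Delta t}{2}\left(u'^{\,n} + u'^{\,n+1}\right) + \frac{\Delta t^2}{12}\left(u''^{\,n} - u''^{\,n+1}\right)$. The sketch below applies this rule fully implicitly to the scalar test problem $u' = \lambda u$. It is not the IMEX or asymptotic-preserving splitting of the cited work, and the function names are hypothetical.

```python
import math


def hermite2_step_linear(u, lam, dt):
    """One implicit two-derivative (Hermite) step for u' = lam * u.

    Substituting u' = lam*u and u'' = lam**2 * u into
    u1 = u0 + dt/2*(u0' + u1') + dt**2/12*(u0'' - u1'')
    gives a linear equation, solved here in closed form for u1.
    """
    z = lam * dt
    return u * (1.0 + z / 2.0 + z * z / 12.0) / (1.0 - z / 2.0 + z * z / 12.0)


def global_error(nsteps, lam=-1.0, t_end=1.0):
    """Error at t_end against the exact solution exp(lam * t_end), u(0) = 1."""
    dt = t_end / nsteps
    u = 1.0
    for _ in range(nsteps):
        u = hermite2_step_linear(u, lam, dt)
    return abs(u - math.exp(lam * t_end))
```

The amplification factor is the (2,2) Padé approximant of $e^{z}$. Halving $\Delta t$ should therefore reduce the error by about $2^4 = 16$. The method is A-stable, but the factor tends to 1 for very stiff modes, so it is not L-stable.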
Stencil computations are widely used in HPC applications. Today, many HPC platforms use GPUs as accelerators. As a result, understanding how to perform stencil computations efficiently on GPUs is important. While implementation strategies for low-order sten
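The abstract is cut off before it describes its high-order strategies. The following is therefore only a generic sketch of one common technique, cache blocking (tiling), for a periodic 2D star stencil of radius $r$. Each tile and an $r$-wide halo are staged in a local buffer, the role shared memory plays for a GPU thread block. The halo cost grows with stencil order: for a 3D tile of $16^3$ points with $r = 3$ (sixth order), staging the full box loads $(22/16)^3 \approx 2.6$ points per point updated. Function names, tile size, and coefficients are illustrative assumptions.

```python
def apply_star_2d(f, coeffs):
    """Reference periodic star stencil: weight coeffs[k] at distance k."""
    ny, nx = len(f), len(f[0])
    r = len(coeffs) - 1
    out = [[0.0] * nx for _ in range(ny)]
    for i in range(ny):
        for j in range(nx):
            s = coeffs[0] * f[i][j]
            for k in range(1, r + 1):
                s += coeffs[k] * (f[(i + k) % ny][j] + f[(i - k) % ny][j]
                                  + f[i][(j + k) % nx] + f[i][(j - k) % nx])
            out[i][j] = s
    return out


def apply_star_2d_tiled(f, coeffs, tile):
    """Same stencil, computed tile by tile from a halo-padded local buffer."""
    ny, nx = len(f), len(f[0])
    r = len(coeffs) - 1
    h = tile + 2 * r
    out = [[0.0] * nx for _ in range(ny)]
    for ti in range(0, ny, tile):
        for tj in range(0, nx, tile):
            # Stage the tile plus its halo (corners included, although a
            # star stencil never reads them), wrapping periodically.
            buf = [[f[(ti - r + a) % ny][(tj - r + b) % nx] for b in range(h)]
                   for a in range(h)]
            # Edge tiles may be partial when the grid is not a multiple of tile.
            for a in range(min(tile, ny - ti)):
                for b in range(min(tile, nx - tj)):
                    ca, cb = a + r, b + r
                    s = coeffs[0] * buf[ca][cb]
                    for k in range(1, r + 1):
                        s += coeffs[k] * (buf[ca + k][cb] + buf[ca - k][cb]
                                          + buf[ca][cb + k] + buf[ca][cb - k])
                    out[ti + a][tj + b] = s
    return out


def halo_overhead(tile, r, dims):
    """Points loaded per point updated when staging a full halo-padded box."""
    return ((tile + 2 * r) / tile) ** dims
```

Both versions produce the same result. The tiled one trades repeated global reads for reuse of a staged buffer, and that saving shrinks as the radius grows relative to the tile size.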