Computational fluid dynamics (CFD) requires a vast number of compute cycles on contemporary large-scale parallel computers. Hence, performance optimization is a pivotal activity in this field of computational science. Not only does it reduce the time to solution, but it also makes it possible to minimize energy consumption. In this work we study performance optimizations for an MPI-parallel lattice Boltzmann-based flow solver that uses a sparse lattice representation with indirect addressing. First, we describe how this indirect addressing can be minimized in order to increase single-core and chip-level performance. Second, the communication overhead is reduced via appropriate partitioning while maintaining the single-core performance improvements. Together, both optimizations allow the solver to run at an operating point with minimal energy consumption.
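To illustrate the sparse lattice representation with indirect addressing that this abstract refers to, the sketch below (not the authors' implementation; the tiny D2Q5-style grid and all names are illustrative assumptions) stores only fluid cells in a compact array and precomputes a neighbor index table, so the streaming step becomes a simple gather:

```python
import numpy as np

# Hypothetical miniature D2Q5-like setup: only fluid cells are stored,
# and streaming reads through a precomputed index table (indirect addressing).

Q = 5  # rest population + 4 axis-aligned directions
c = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # discrete velocities

# 4x4 grid with a solid 2x2 block in the middle; True marks fluid cells
mask = np.ones((4, 4), dtype=bool)
mask[1:3, 1:3] = False

# Map (x, y) -> compact fluid-cell index; solid cells get -1
cell_id = -np.ones(mask.shape, dtype=int)
cell_id[mask] = np.arange(mask.sum())
n_fluid = int(mask.sum())

# neigh[i, q] is the fluid cell that direction q streams FROM (pull scheme),
# or i itself when the upstream cell is solid or out of bounds
# (a crude stand-in for bounce-back boundary handling)
neigh = np.empty((n_fluid, Q), dtype=int)
for (x, y), i in np.ndenumerate(cell_id):
    if i < 0:
        continue
    for q, (cx, cy) in enumerate(c):
        xs, ys = x - cx, y - cy  # upstream cell for direction q
        if 0 <= xs < mask.shape[0] and 0 <= ys < mask.shape[1] and mask[xs, ys]:
            neigh[i, q] = cell_id[xs, ys]
        else:
            neigh[i, q] = i

# Streaming is now a gather over the index table:
f = np.random.rand(n_fluid, Q)          # particle distribution functions
f_new = f[neigh, np.arange(Q)]          # f_new[i, q] = f[neigh[i, q], q]
print(f_new.shape)                      # -> (12, 5): 12 fluid cells, 5 directions
```

The memory saving comes from never allocating distributions for solid cells; the cost is the extra index load per access, which is exactly the indirection overhead the abstract describes minimizing.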
GPUs offer several times the floating-point performance and memory bandwidth of current standard two-socket CPU servers, e.g. NVIDIA C2070 vs. Intel Xeon Westmere X5650. The lattice Boltzmann method has been established as a flow solver in recent years.
An increasingly large number of HPC systems rely on heterogeneous architectures combining traditional multi-core CPUs with power efficient accelerators. Designing efficient applications for these systems has been troublesome in the past as accelerato
We revisit the classical stability versus accuracy dilemma for the lattice Boltzmann methods (LBM). Our goal is a stable method of second-order accuracy for fluid dynamics based on the lattice Bhatnagar--Gross--Krook method (LBGK). The LBGK scheme
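For context, the single-relaxation-time LBGK update that this abstract builds on takes the standard form (a textbook statement, not taken from the paper itself):

```latex
f_i(\mathbf{x} + \mathbf{c}_i \Delta t,\, t + \Delta t) - f_i(\mathbf{x}, t)
  = -\frac{1}{\tau}\left[\, f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \,\right]
```

Here $f_i$ are the particle distribution functions along discrete velocities $\mathbf{c}_i$, $f_i^{\mathrm{eq}}$ is the local equilibrium, and the relaxation time $\tau$ fixes the kinematic viscosity via $\nu = c_s^2 (\tau - \tfrac{1}{2})\,\Delta t$. As $\tau \to \tfrac{1}{2}$ (low viscosity), the scheme becomes increasingly prone to instability, which is precisely the stability-versus-accuracy tension the abstract addresses.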
In this paper, we first present a unified framework for the modelling of generalized lattice Boltzmann method (GLBM). We then conduct a comparison of the four popular analysis methods (Chapman-Enskog analysis, Maxwell iteration, direct Taylor expansi
High-performance computing systems are more and more often based on accelerators. Computing applications targeting those systems often follow a host-driven approach in which hosts offload almost all compute-intensive sections of the code onto acceler