Simple floating-point operations such as addition or multiplication on normalized floating-point values can be computed by current AMD and Intel processors in three to five cycles. This is different for denormalized numbers, which appear when an underflow occurs and the value can no longer be represented as a normalized floating-point value. For these, the costs are about two orders of magnitude higher.
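To make the distinction concrete, the following minimal Python sketch classifies IEEE 754 double-precision values as normal or subnormal ("subnormal" is the IEEE 754 term for denormalized). The helper name `is_subnormal` is illustrative and not from the source; it relies only on `sys.float_info.min`, the smallest positive normalized double. It does not try to measure the slowdown, which interpreter overhead would hide.

```python
import math
import sys


def is_subnormal(x: float) -> bool:
    """Return True if x is a nonzero, finite double whose magnitude is
    below the smallest normalized double (2**-1022), i.e. a subnormal."""
    return x != 0.0 and math.isfinite(x) and abs(x) < sys.float_info.min


smallest_normal = sys.float_info.min          # 2**-1022, about 2.2e-308
print(is_subnormal(smallest_normal))          # False: still normalized
print(is_subnormal(smallest_normal / 2.0))    # True: underflow yields a subnormal
print(is_subnormal(5e-324))                   # True: smallest positive subnormal
```

Halving the smallest normalized value cannot be represented with the implicit leading 1 bit, so the result is stored as a subnormal with reduced precision; on many CPUs such operands take the slow path described above.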
M. Wittmann, T. Zeiser, G. Hager (2014)
Computational fluid dynamics (CFD) requires a vast number of compute cycles on contemporary large-scale parallel computers. Hence, performance optimization is a pivotal activity in this field of computational science. It not only reduces the time to solution but also makes it possible to minimize energy consumption. In this work we study performance optimizations for an MPI-parallel lattice Boltzmann flow solver that uses a sparse lattice representation with indirect addressing. First, we describe how this indirect addressing can be minimized in order to increase single-core and chip-level performance. Second, we reduce the communication overhead through appropriate partitioning while maintaining the single-core performance improvements. Together, both optimizations allow the solver to run at an operating point with minimal energy consumption.
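As an illustration of what "sparse lattice representation with indirect addressing" means, here is a minimal Python sketch of the baseline that such optimizations start from; it does not reproduce the authors' reduction of indirect accesses. Assumptions (not from the source): a 2D D2Q5 lattice, pull-style streaming, halfway bounce-back at walls, and the hypothetical helpers `build_sparse_lattice` and `stream`. Only fluid cells are stored, and every population read goes through a precomputed index table `src`.

```python
# D2Q5 lattice (assumed for brevity): discrete velocities and opposite directions.
CX = (0, 1, 0, -1, 0)
CY = (0, 0, 1, 0, -1)
OPP = (0, 3, 4, 1, 2)
Q = len(CX)


def build_sparse_lattice(mask):
    """Store only fluid cells and build a pull-streaming index table.

    mask[y][x] is True for fluid. Populations use a structure-of-arrays
    layout: f[q * n + i] is direction q of fluid cell i. src[q * n + i]
    is the flat index that cell i reads for direction q during streaming.
    """
    cells = [(x, y) for y, row in enumerate(mask)
             for x, fluid in enumerate(row) if fluid]
    index = {c: i for i, c in enumerate(cells)}
    n = len(cells)
    src = [0] * (Q * n)
    for i, (x, y) in enumerate(cells):
        for q in range(Q):
            j = index.get((x - CX[q], y - CY[q]))
            if j is not None:
                src[q * n + i] = q * n + j        # pull from upstream fluid cell
            else:
                src[q * n + i] = OPP[q] * n + i   # halfway bounce-back at a wall
    return cells, src


def stream(f, src):
    """One streaming step via indirect addressing: f_new[k] = f[src[k]]."""
    return [f[k] for k in src]


# Usage: a 1x3 channel; push +x populations one cell to the right.
cells, src = build_sparse_lattice([[True, True, True]])
n = len(cells)
f = [0.0] * (Q * n)
f[1 * n:2 * n] = [10.0, 20.0, 30.0]
g = stream(f, src)
print(g[1 * n:2 * n])   # +x populations shifted right by one cell
```

Each access `f[src[k]]` costs an extra load from the index table, which is the overhead that minimizing indirect addressing targets. With bounce-back, `src` is a permutation, so streaming conserves the total population.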
We present a simple, parallel, and distributed algorithm for setting up and partitioning a sparse representation of a regularly discretized simulation domain. The method scales to a large number of processes, even for complex geometries, and ensures load balance between the domains, reasonable communication interfaces, and good data locality within each domain. Applied to a list-based lattice Boltzmann flow solver, this scheme can achieve similar or even higher solver performance than widely used graph-partitioning tools such as METIS and PT-SCOTCH.
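The abstract does not spell out the algorithm, so the following is only a hedged sketch of one common way to partition a sparse list of fluid cells with balanced load and spatial locality: sort the cells along a Z-order (Morton) space-filling curve and cut the ordered list into contiguous, nearly equal chunks. This is not claimed to be the authors' method, and unlike theirs it runs serially. The helpers `morton2d` and `partition` and the 16-bit coordinate limit are assumptions for illustration.

```python
def morton2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (x in even bits, y in odd bits)
    to get the cell's position along a Z-order curve."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)
        code |= ((y >> b) & 1) << (2 * b + 1)
    return code


def partition(cells, nparts):
    """Split fluid cells into nparts contiguous chunks along the Z-curve.

    Chunk sizes differ by at most one cell (load balance), and cells that
    are close on the curve tend to be close in space (locality).
    """
    order = sorted(cells, key=lambda c: morton2d(*c))
    n = len(order)
    return [order[p * n // nparts:(p + 1) * n // nparts] for p in range(nparts)]


# Usage: an L-shaped domain of fluid cells split across 3 processes.
mask = [
    [True, True, True, True],
    [True, True, False, False],
    [True, True, False, False],
]
cells = [(x, y) for y, row in enumerate(mask) for x, fluid in enumerate(row) if fluid]
parts = partition(cells, 3)
print([len(p) for p in parts])
```

Because only fluid cells are ordered and split, solid regions of a complex geometry cost nothing; a graph partitioner such as METIS would instead minimize the cut edges explicitly, at a higher setup cost.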