Research papers and master's and doctoral theses published by Markus Wittmann

Short Note on Costs of Floating Point Operations on current x86-64 Architectures: Denormals, Overflow, Underflow, and Division by Zero

65 - Markus Wittmann, Thomas Zeiser, Georg Hager 2015

Simple floating-point operations like addition or multiplication on normalized floating-point values can be computed by current AMD and Intel processors in three to five cycles. This is different for denormalized numbers, which appear when an underflow occurs and the value can no longer be represented as a normalized floating-point value. Here the costs are about two orders of magnitude higher.
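To make the notion of a denormalized value concrete, the following minimal sketch shows an underflow in IEEE 754 double precision, assuming Python's `float` is a binary64 value (true on common platforms). The helper name `is_subnormal` is an illustrative assumption and does not come from the paper. The sketch only shows when such values arise and makes no claim about their cost.

```python
import math
import sys

def is_subnormal(x: float) -> bool:
    """Return True if x is nonzero, finite, and smaller in magnitude than
    the smallest normalized double (hypothetical helper for illustration)."""
    return x != 0.0 and math.isfinite(x) and abs(x) < sys.float_info.min

# Smallest positive normalized double, 2**-1022 for IEEE 754 binary64.
smallest_normal = sys.float_info.min

# Halving it underflows: the result is still nonzero, but it can only be
# stored as a denormalized (subnormal) value.
underflowed = smallest_normal / 2.0

print(is_subnormal(smallest_normal))
print(is_subnormal(underflowed))
```

Any arithmetic whose result falls below `sys.float_info.min` in magnitude without reaching zero produces such a value, which is the case the abstract identifies as expensive.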

Performance

Modeling and analyzing performance for highly optimized propagation steps of the lattice Boltzmann method on sparse lattices

220 - M. Wittmann, T. Zeiser, G. Hager 2014

Computational fluid dynamics (CFD) requires a vast amount of compute cycles on contemporary large-scale parallel computers. Hence, performance optimization is a pivotal activity in this field of computational science. Not only does it reduce the time to solution, but it also allows the energy consumption to be minimized. In this work we study performance optimizations for an MPI-parallel lattice Boltzmann-based flow solver that uses a sparse lattice representation with indirect addressing. First, we describe how this indirect addressing can be minimized in order to increase the single-core and chip-level performance. Second, the communication overhead is reduced via appropriate partitioning while maintaining the single-core performance improvements. Both optimizations allow the solver to run at an operating point with minimal energy consumption.
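As a rough illustration of what a sparse lattice representation with indirect addressing means, the sketch below stores only fluid cells and moves values through a per-node neighbor index list. It is a generic toy example, not the solver or the optimization described in the paper: the one-dimensional three-velocity lattice, the names `geometry`, `OFFSETS`, `build_adjacency`, and `propagate`, and the treatment of missing neighbors are all assumptions made for illustration.

```python
# Toy 1D lattice with three velocities: rest, +x, -x (assumed for illustration).
OFFSETS = [0, 1, -1]

# 1 = fluid cell, 0 = solid cell. Solid cells receive no storage.
geometry = [1, 1, 0, 1, 1, 1]

def build_adjacency(geometry):
    """Assign each fluid cell a compact index and record, per direction,
    the compact index of its upstream neighbor (-1 if solid or outside)."""
    compact = {}
    for cell, flag in enumerate(geometry):
        if flag:
            compact[cell] = len(compact)
    adjacency = []
    for cell in compact:
        adjacency.append([compact.get(cell - off, -1) for off in OFFSETS])
    return compact, adjacency

def propagate(f_old, adjacency):
    """Pull-style propagation: every node gathers its values through the
    index list. A missing neighbor keeps the local value, which is only a
    crude placeholder for a real boundary condition."""
    f_new = []
    for i, row in enumerate(adjacency):
        f_new.append([
            f_old[src][d] if src >= 0 else f_old[i][d]
            for d, src in enumerate(row)
        ])
    return f_new

compact, adjacency = build_adjacency(geometry)
# Distinct values per node and direction so the data movement is visible.
f_old = [[10 * i + d for d in range(len(OFFSETS))] for i in range(len(compact))]
f_new = propagate(f_old, adjacency)
print(adjacency)
print(f_new)
```

Every read of `f_old[src][d]` goes through the extra lookup in `adjacency`, which is the indirect access the abstract aims to minimize.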

Distributed, Parallel, and Cluster Computing

Domain decomposition and locality optimization for large-scale lattice Boltzmann simulations

74 - Markus Wittmann, Thomas Zeiser, Georg Hager 2011

We present a simple, parallel and distributed algorithm for setting up and partitioning a sparse representation of a regular discretized simulation domain. This method is scalable for a large number of processes even for complex geometries and ensures load balance between the domains, reasonable communication interfaces, and good data locality within the domain. Applying this scheme to a list-based lattice Boltzmann flow solver can achieve similar or even higher flow solver performance than widely used standard graph-partitioning tools such as METIS and PT-SCOTCH.

Distributed, Parallel, and Cluster Computing
