Short Note on Costs of Floating Point Operations on current x86-64 Architectures: Denormals, Overflow, Underflow, and Division by Zero


Added by: Markus Wittmann

Publication date: 2015

Fields: Informatics Engineering

Language: English

Authors: Markus Wittmann, Thomas Zeiser, Georg Hager

Performance


Simple floating-point operations such as addition or multiplication on normalized floating-point values can be computed by current AMD and Intel processors in three to five cycles. This is different for denormalized numbers, which appear when an underflow occurs and the value can no longer be represented as a normalized floating-point value. Here the costs are about two orders of magnitude higher.
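The special cases named in the title can be illustrated with a minimal Python sketch, assuming IEEE 754 double precision (which CPython uses on x86-64). It only classifies values; it does not measure cycle counts, and the helper `is_subnormal` is a hypothetical name introduced here, not something taken from the paper.

```python
import math
import sys


def is_subnormal(x: float) -> bool:
    """Return True if x is a nonzero finite double below the smallest normalized magnitude."""
    return x != 0.0 and math.isfinite(x) and abs(x) < sys.float_info.min


# Smallest positive normalized double: 2**-1022.
smallest_normal = sys.float_info.min

# Gradual underflow: halving the smallest normalized value yields a denormal.
subnormal = smallest_normal / 2

# Smallest positive denormal: 2**-1074.
smallest_subnormal = math.ldexp(1.0, -1074)

# Below the denormal range the result rounds to zero.
flushed = smallest_subnormal / 2

# Overflow: the product exceeds the largest finite double and becomes infinity.
overflowed = sys.float_info.max * 2

# Division by zero: IEEE 754 hardware returns an infinity by default,
# but Python raises an exception instead of returning it.
try:
    1.0 / 0.0
    division_raises = False
except ZeroDivisionError:
    division_raises = True
```

Values for which `is_subnormal` returns True are the operands the abstract describes as costly. Timing them requires compiled code and hardware counters, which this sketch does not attempt.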


A Note on Disk Drag Dynamics

Neil J. Gunther, 2012

The electrical power consumed by typical magnetic hard disk drives (HDD) not only increases linearly with the number of spindles but, more significantly, it increases as very fast power-laws of speed (RPM) and diameter. Since the theoretical basis for this relationship is neither well-known nor readily accessible in the literature, we show how these exponents arise from aerodynamic disk drag and discuss their import for green storage capacity planning.

Performance Databases Classical Physics

Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels

Jan Laukemann, Julian Hammer, Georg Hager, 2019

Useful models of loop kernel runtimes on out-of-order architectures require an analysis of the in-core performance behavior of instructions and their dependencies. While an instruction throughput prediction sets a lower bound to the kernel runtime, the critical path defines an upper bound. Such predictions are an essential part of analytic (i.e., white-box) performance models like the Roofline and Execution-Cache-Memory (ECM) models. They enable a better understanding of the performance-relevant interactions between hardware architecture and loop code. The Open Source Architecture Code Analyzer (OSACA) is a static analysis tool for predicting the execution time of sequential loops. It previously supported only x86 (Intel and AMD) architectures and simple, optimistic full-throughput execution. We have heavily extended OSACA to support ARM instructions and critical path prediction including the detection of loop-carried dependencies, which turns it into a versatile cross-architecture modeling tool. We show runtime predictions for code on Intel Cascade Lake, AMD Zen, and Marvell ThunderX2 micro-architectures based on machine models from available documentation and semi-automatic benchmarking. The predictions are compared with actual measurements.

Performance

Direct N-body application on low-power and energy-efficient parallel architectures

D. Goz, G. Ieronymakis, V. Papaefstathiou, 2019

The aim of this work is to quantitatively evaluate the impact of computation on the energy consumption on ARM MPSoC platforms, exploiting CPUs, embedded GPUs and FPGAs. One of them possibly represents the future of High Performance Computing systems: a prototype of an Exascale supercomputer. Performance and energy measurements are made using a state-of-the-art direct $N$-body code from the astrophysical domain. We provide a comparison of the time-to-solution and energy delay product metrics for different software configurations. We have shown that FPGA technologies can be used for application kernel acceleration and are emerging as a promising alternative to traditional HPC technologies, which focus purely on peak performance rather than on power efficiency.

Performance Instrumentation and Methods for Astrophysics

A note on higher-order differential operations

Branko J. Malesevic, 2007

In this paper we consider successive iterations of the first-order differential operations in space $\mathbf{R}^3$.

Differential Geometry Classical Analysis and ODEs

Division by zero in common meadows

Jan A. Bergstra, Alban Ponse, 2014

Common meadows are fields expanded with a total inverse function. Division by zero produces an additional value, denoted by $\mathbf{a}$, that propagates through all operations of the meadow signature (this additional value can be interpreted as an error element). We provide a basis theorem for so-called common cancellation meadows of characteristic zero, that is, common meadows of characteristic zero that admit a certain cancellation law.

Rings and Algebras
