ترغب بنشر مسار تعليمي؟ اضغط هنا

A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic

240   0   0.0 ( 0 )
 نشر من قبل Pratik Nayak
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the server-line products are increasingly featuring low-precision special function units, such as the NVIDIA tensor cores in ORNLs Summit supercomputer providing more than an order of magnitude higher performance than what is available in IEEE double precision. At the same time, the gap between the compute power on the one hand and the memory bandwidth on the other hand keeps increasing, making data access and communication prohibitively expensive compared to arithmetic operations. To start the multiprecision focus effort, we survey the numerical linear algebra community and summarize all existing multiprecision knowledge, expertise, and software capabilities in this landscape analysis report. We also include current efforts and preliminary results that may not yet be considered mature technology, but have the potential to grow into production quality within the multiprecision focus effort. As we expect the reader to be familiar with the basics of numerical linear algebra, we refrain from providing a detailed background on the algorithms themselves but focus on how mixed- and multiprecision technology can help improving the performance of these methods and present highlights of application significantly outperforming the traditional fixed precision methods.



قيم البحث

اقرأ أيضاً

Nektar++ is an open-source framework that provides a flexible, high-performance and scalable platform for the development of solvers for partial differential equations using the high-order spectral/$hp$ element method. In particular, Nektar++ aims to overcome the complex implementation challenges that are often associated with high-order methods, thereby allowing them to be more readily used in a wide range of application areas. In this paper, we present the algorithmic, implementation and application developments associated with our Nektar++ version 5.0 release. We describe some of the key software and performance developments, including our strategies on parallel I/O, on in situ processing, the use of collective operations for exploiting current and emerging hardware, and interfaces to enable multi-solver coupling. Furthermore, we provide details on a newly developed Python interface that enables a more rapid introduction for new users unfamiliar with spectral/$hp$ element methods, C++ and/or Nektar++. This release also incorporates a number of numerical method developments - in particular: the method of moving frames, which provides an additional approach for the simulation of equations on embedded curvilinear manifolds and domains; a means of handling spatially variable polynomial order; and a novel technique for quasi-3D simulations to permit spatially-varying perturbations to the geometry in the homogeneous direction. Finally, we demonstrate the new application-level features provided in this release, namely: a facility for generating high-order curvilinear meshes called NekMesh; a novel new AcousticSolver for aeroacoustic problems; our development of a thick strip model for the modelling of fluid-structure interaction problems in the context of vortex-induced vibrations. We conclude by commenting some directions for future code development and expansion.
Additive Runge-Kutta methods designed for preserving highly accurate solutions in mixed-precision computation were proposed and analyzed in [8]. These specially designed methods use reduced precision or the implicit computations and full precision fo r the explicit computations. We develop a FORTRAN code to solve a nonlinear system of ordinary differential equations using the mixed precision additive Runge-Kutta (MP-ARK) methods on IBM POWER9 and Intel x86_64 chips. The convergence, accuracy, runtime, and energy consumption of these methods is explored. We show that these MP-ARK methods efficiently produce accurate solutions with significant reductions in runtime (and by extension energy consumption).
224 - Zachary J. Grant 2020
In this work we consider a mixed precision approach to accelerate the implemetation of multi-stage methods. We show that Runge-Kutta methods can be designed so that certain costly intermediate computations can be performed as a lower-precision comput ation without adversely impacting the accuracy of the overall solution. In particular, a properly designed Runge-Kutta method will damp out the errors committed in the initial stages. This is of particular interest when we consider implicit Runge-Kutta methods. In such cases, the implicit computation of the stage values can be considerably faster if the solution can be of lower precision (or, equivalently, have a lower tolerance). We provide a general theoretical additive framework for designing mixed precision Runge-Kutta methods, and use this framework to derive order conditions for such methods. Next, we show how using this approach allows us to leverage low precision computation of the implicit solver while retaining high precision in the overall method. We present the behavior of some mixed-precision implicit Runge-Kutta methods through numerical studies, and demonstrate how the numerical results match with the theoretical framework. This novel mixed-precision implicit Runge-Kutta framework opens the door to the design of many such methods.
This article presents the Moore library for interval arithmetic in C++20. It gives examples of how the library can be used, and explains the basic principles underlying its design.
On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse lin ear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا