ﻻ يوجد ملخص باللغة العربية
In this work, we collect data from runs of Krylov subspace methods and pipelined Krylov algorithms in an effort to understand and model the impact of machine noise and other sources of variability on performance. We find large variability of Krylov iterations between compute nodes for standard methods that is reduced in pipelined algorithms, directly supporting conjecture, as well as large variation between statistical distributions of runtimes across iterations. Based on these results, we improve upon a previously introduced nondeterministic performance model by allowing iterations to fluctuate over time. We present our data from runs of various Krylov algorithms across multiple platforms as well as our updated non-stationary model that provides good agreement with observations. We also suggest how it can be used as a predictive tool.
Exascale computing will feature novel and potentially disruptive hardware architectures. Exploiting these to their full potential is non-trivial. Numerical modelling frameworks involving finite difference methods are currently limited by the static n
Important computational physics problems are often large-scale in nature, and it is highly desirable to have robust and high performing computational frameworks that can quickly address these problems. However, it is no trivial task to determine whet
The parallel strong-scaling of Krylov iterative methods is largely determined by the number of global reductions required at each iteration. The GMRES and Krylov-Schur algorithms employ the Arnoldi algorithm for nonsymmetric matrices. The underlying
We present a simple mathematical framework and API for parallel mesh and data distribution, load balancing, and overlap generation. It relies on viewing the mesh as a Hasse diagram, abstracting away information such as cell shape, dimension, and coor
It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However, there are