ﻻ يوجد ملخص باللغة العربية
This paper is devoted to the development of highly efficient kernels performing vector operations relevant in linear system solvers. In particular, we focus on the low arithmetic intensity operations (i.e., streaming operations) performed within the conjugate gradient iterative method, using the parameters specified in the CEED benchmark problems for high-order hexahedral finite elements. We propose a suite of new Benchmark Streaming tests to focus on the distinct streaming operations which must be performed. We implemented these new tests using the OCCA abstraction framework to demonstrate portability of these streaming operations on different GPU architectures, and propose a simple performance model for such kernels which can accurately capture data movement rates as well as kernel launch costs.
Creating scalable, high performance PDE-based simulations requires a suitable combination of discretizations, differential operators, preconditioners and solvers. The required combination changes with the application and with the available hardware,
Efficient exploitation of exascale architectures requires rethinking of the numerical algorithms used in many large-scale applications. These architectures favor algorithms that expose ultra fine-grain parallelism and maximize the ratio of floating p
We introduce PyParSVDfootnote{https://github.com/Romit-Maulik/PyParSVD}, a Python library that implements a streaming, distributed and randomized algorithm for the singular value decomposition. To demonstrate its effectiveness, we extract coherent st
We present a novel, quadrature-based finite element integration method for low-order elements on GPUs, using a pattern we call textit{thread transposition} to avoid reductions while vectorizing aggressively. On the NVIDIA GTX580, which has a nominal
The numerical solution of partial differential equations using the finite element method is one of the key applications of high performance computing. Local assembly is its characteristic operation. This entails the execution of a problem-specific ke