ﻻ يوجد ملخص باللغة العربية
We present an algorithm for the optimization of a class of finite element integration loop nests. This algorithm, which exploits fundamental mathematical properties of finite element operators, is proven to achieve a locally optimal operation count. In specified circumstances the optimum achieved is global. Extensive numerical experiments demonstrate significant performance improvements over the state of the art in finite element code generation in almost all cases. This validates the effectiveness of the algorithm presented here, and illustrates its limitations.
We present a novel, quadrature-based finite element integration method for low-order elements on GPUs, using a pattern we call textit{thread transposition} to avoid reductions while vectorizing aggressively. On the NVIDIA GTX580, which has a nominal
The numerical solution of partial differential equations using the finite element method is one of the key applications of high performance computing. Local assembly is its characteristic operation. This entails the execution of a problem-specific ke
Creating scalable, high performance PDE-based simulations requires a suitable combination of discretizations, differential operators, preconditioners and solvers. The required combination changes with the application and with the available hardware,
The scaled boundary finite element method (SBFEM) is capable of generating polyhedral elements with an arbitrary number of surfaces. This salient feature significantly alleviates the meshing burden being a bottleneck in the analysis pipeline in the s
This paper is devoted to the development of highly efficient kernels performing vector operations relevant in linear system solvers. In particular, we focus on the low arithmetic intensity operations (i.e., streaming operations) performed within the