An algorithm for the optimization of finite element integration loops

62 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Fabio Luporini

تاريخ النشر 2016

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Fabio Luporini - David A. Ham - Paul H. J. Kelly

البرمجيات الرياضية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We present an algorithm for the optimization of a class of finite element integration loop nests. This algorithm, which exploits fundamental mathematical properties of finite element operators, is proven to achieve a locally optimal operation count. In specified circumstances the optimum achieved is global. Extensive numerical experiments demonstrate significant performance improvements over the state of the art in finite element code generation in almost all cases. This validates the effectiveness of the algorithm presented here, and illustrates its limitations.

قيم البحث

81 - Matthew G. Knepley , Karl Rupp , Andy R. Terrel 2016

We present a novel, quadrature-based finite element integration method for low-order elements on GPUs, using a pattern we call textit{thread transposition} to avoid reductions while vectorizing aggressively. On the NVIDIA GTX580, which has a nominal single precision peak flop rate of 1.5 TF/s and a memory bandwidth of 192 GB/s, we achieve close to 300 GF/s for element integration on first-order discretization of the Laplacian operator with variable coefficients in two dimensions, and over 400 GF/s in three dimensions. From our performance model we find that this corresponds to 90% of our measured achievable bandwidth peak of 310 GF/s. Further experimental results also match the predicted performance when used with double precision (120 GF/s in two dimensions, 150 GF/s in three dimensions). Results obtained for the linear elasticity equations (220 GF/s and 70 GF/s in two dimensions, 180 GF/s and 60 GF/s in three dimensions) also demonstrate the applicability of our method to vector-valued partial differential equations.

البرمجيات الرياضية

COFFEE: an Optimizing Compiler for Finite Element Local Assembly

214 - Fabio Luporini , Ana Lucia Varbanescu , Florian Rathgeber 2014

The numerical solution of partial differential equations using the finite element method is one of the key applications of high performance computing. Local assembly is its characteristic operation. This entails the execution of a problem-specific ke rnel to numerically evaluate an integral for each element in the discretized problem domain. Since the domain size can be huge, executing efficient kernels is fundamental. Their op- timization is, however, a challenging issue. Even though affine loop nests are generally present, the short trip counts and the complexity of mathematical expressions make it hard to determine a single or unique sequence of successful transformations. Therefore, we present the design and systematic evaluation of COF- FEE, a domain-specific compiler for local assembly kernels. COFFEE manipulates abstract syntax trees generated from a high-level domain-specific language for PDEs by introducing domain-aware composable optimizations aimed at improving instruction-level parallelism, especially SIMD vectorization, and register locality. It then generates C code including vector intrinsics. Experiments using a range of finite-element forms of increasing complexity show that significant performance improvement is achieved.

البرمجيات الرياضية الهندسة الحاسوبية، المالية،العلوم الأداء

Code generation for productive portable scalable finite element simulation in Firedrake

69 - Jack D. Betteridge , Patrick E. Farrell , David A. Ham 2021

Creating scalable, high performance PDE-based simulations requires a suitable combination of discretizations, differential operators, preconditioners and solvers. The required combination changes with the application and with the available hardware, yet software development time is a severely limited resource for most scientists and engineers. Here we demonstrate that generating simulation code from a high-level Python interface provides an effective mechanism for creating high performance simulations from very few lines of user code. We demonstrate that moving from one supercomputer to another can require significant algorithmic changes to achieve scalable performance, but that the code generation approach enables these algorithmic changes to be achieved with minimal development effort.

البرمجيات الرياضية

An open-source ABAQUS implementation of the scaled boundary finite element method to study interfacial problems using polyhedral meshes

94 - Shukai Ya , Sascha Eisentrager , Chongmin Song 2021

The scaled boundary finite element method (SBFEM) is capable of generating polyhedral elements with an arbitrary number of surfaces. This salient feature significantly alleviates the meshing burden being a bottleneck in the analysis pipeline in the s tandard finite element method (FEM). In this paper, we implement polyhedral elements based on the SBFEM into the commercial finite element software ABAQUS. To this end, user elements are provided through the user subroutine UEL. Detailed explanations regarding the data structures and implementational aspects of the procedures are given. The focus of the current implementation is on interfacial problems and therefore, element-based surfaces are created on polyhedral user elements to establish interactions. This is achieved by an overlay of standard finite elements with negligible stiffness, provided in the ABAQUS element library, with polyhedral user elements. By means of several numerical examples, the advantages of polyhedral elements regarding the treatment of non-matching interfaces and automatic mesh generation are clearly demonstrated. Thus, the performance of ABAQUS for problems involving interfaces is augmented based on the availability of polyhedral meshes. Due to the implementation of polyhedral user elements, ABAQUS can directly handle complex geometries given in the form of digital images or stereolithography (STL) files. In order to facilitate the use of the proposed approach, the code of the UEL is published open-source and can be downloaded from https://github.com/ShukaiYa/SBFEM-UEL.

البرمجيات الرياضية

Portable high-order finite element kernels I: Streaming Operations

83 - Noel Chalmers , Tim Warburton 2020

This paper is devoted to the development of highly efficient kernels performing vector operations relevant in linear system solvers. In particular, we focus on the low arithmetic intensity operations (i.e., streaming operations) performed within the conjugate gradient iterative method, using the parameters specified in the CEED benchmark problems for high-order hexahedral finite elements. We propose a suite of new Benchmark Streaming tests to focus on the distinct streaming operations which must be performed. We implemented these new tests using the OCCA abstraction framework to demonstrate portability of these streaming operations on different GPU architectures, and propose a simple performance model for such kernels which can accurately capture data movement rates as well as kernel launch costs.

البرمجيات الرياضية النظم الموزعة والتوازية والحوسبة العنقودية التحليل العددي

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة تشرين

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

An algorithm for the optimization of finite element integration loops

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً