A tutorial-driven introduction to the parallel finite element library FEMPAR v1.0.0

127 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Alberto F. Mart\\'in

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Santiago Badia - Alberto F. Martin

البرمجيات الرياضية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This work is a user guide to the FEMPAR scientific software library. FEMPAR is an open-source object-oriented framework for the simulation of partial differential equations (PDEs) using finite element methods on distributed-memory platforms. It provides a rich set of tools for numerical discretization and built-in scalable solvers for the resulting linear systems of equations. An application expert that wants to simulate a PDE-governed problem has to extend the framework with a description of the weak form of the PDE at hand (and additional perturbation terms for non-conforming approximations). We show how to use the library by going through three different tutorials. The first tutorial simulates a linear PDE (Poisson equation) in a serial environment for a structured mesh using both continuous and discontinuous Galerkin finite element methods. The second tutorial extends it with adaptive mesh refinement on octree meshes. The third tutorial is a distributed-memory version of the previous one that combines a scalable octree handler and a scalable domain decomposition solver. The exposition is restricted to linear PDEs and simple geometries to keep it concise. The interested user can dive into more tutorials available in the FEMPAR public repository to learn about further capabilities of the library, e.g., nonlinear PDEs and nonlinear solvers, time integration, multi-field PDEs, block preconditioning, or unstructured mesh handling.

قيم البحث

74 - Santiago Badia , Alberto F. Martin , Eric Neiva 2019

In this work we formally derive and prove the correctness of the algorithms and data structures in a parallel, distributed-memory, generic finite element framework that supports h-adaptivity on computational domains represented as forest-of-trees. Th e framework is grounded on a rich representation of the adaptive mesh suitable for generic finite elements that is built on top of a low-level, light-weight forest-of-trees data structure handled by a specialized, highly parallel adaptive meshing engine, for which we have identified the requirements it must fulfill to be coupled into our framework. Atop this two-layered mesh representation, we build the rest of data structures required for the numerical integration and assembly of the discrete system of linear equations. We consider algorithms that are suitable for both subassembled and fully-assembled distributed data layouts of linear system matrices. The proposed framework has been implemented within the FEMPAR scientific software library, using p4est as a practical forest-of-octrees demonstrator. A strong scaling study of this implementation when applied to Poisson and Maxwell problems reveals remarkable scalability up to 32.2K CPU cores and 482.2M degrees of freedom. Besides, a comparative performance study of FEMPAR and the state-of-the-art deal.ii finite element software shows at least comparative performance, and at most factor 2-3 improvements in the h-adaptive approximation of a Poisson problem with first- and second-order Lagrangian finite elements, respectively.

البرمجيات الرياضية التحليل العددي التحليل العددي

Finite Element Integration with Quadrature on the GPU

81 - Matthew G. Knepley , Karl Rupp , Andy R. Terrel 2016

We present a novel, quadrature-based finite element integration method for low-order elements on GPUs, using a pattern we call textit{thread transposition} to avoid reductions while vectorizing aggressively. On the NVIDIA GTX580, which has a nominal single precision peak flop rate of 1.5 TF/s and a memory bandwidth of 192 GB/s, we achieve close to 300 GF/s for element integration on first-order discretization of the Laplacian operator with variable coefficients in two dimensions, and over 400 GF/s in three dimensions. From our performance model we find that this corresponds to 90% of our measured achievable bandwidth peak of 310 GF/s. Further experimental results also match the predicted performance when used with double precision (120 GF/s in two dimensions, 150 GF/s in three dimensions). Results obtained for the linear elasticity equations (220 GF/s and 70 GF/s in two dimensions, 180 GF/s and 60 GF/s in three dimensions) also demonstrate the applicability of our method to vector-valued partial differential equations.

البرمجيات الرياضية

An algorithm for the optimization of finite element integration loops

61 - Fabio Luporini , David A. Ham , Paul H. J. Kelly 2016

We present an algorithm for the optimization of a class of finite element integration loop nests. This algorithm, which exploits fundamental mathematical properties of finite element operators, is proven to achieve a locally optimal operation count. In specified circumstances the optimum achieved is global. Extensive numerical experiments demonstrate significant performance improvements over the state of the art in finite element code generation in almost all cases. This validates the effectiveness of the algorithm presented here, and illustrates its limitations.

البرمجيات الرياضية

Intel Cilk Plus for Complex Parallel Algorithms: Enormous Fast Fourier Transform (EFFT) Library

209 - Ryo Asai , Andrey Vladimirov 2014

In this paper we demonstrate the methodology for parallelizing the computation of large one-dimensional discrete fast Fourier transforms (DFFTs) on multi-core Intel Xeon processors. DFFTs based on the recursive Cooley-Tukey method have to control cac he utilization, memory bandwidth and vector hardware usage, and at the same time scale across multiple threads or compute nodes. Our method builds on single-threaded Intel Math Kernel Library (MKL) implementation of DFFT, and uses the Intel Cilk Plus framework for thread parallelism. We demonstrate the ability of Intel Cilk Plus to handle parallel recursion with nested loop-centric parallelism without tuning the code to the number of cores or cache metrics. The result of our work is a library called EFFT that performs 1D DFTs of size 2^N for N>=21 faster than the corresponding Intel MKL parallel DFT implementation by up to 1.5x, and faster than FFTW by up to 2.5x. The code of EFFT is available for free download under the GPLv3 license. This work provides a new efficient DFFT implementation, and at the same time demonstrates an educational example of how computer science problems with complex parallel patterns can be optimized for high performance using the Intel Cilk Plus framework.

البرمجيات الرياضية النظم الموزعة والتوازية والحوسبة العنقودية بنى وهياكل البيانات والخوارزميات

COFFEE: an Optimizing Compiler for Finite Element Local Assembly

187 - Fabio Luporini , Ana Lucia Varbanescu , Florian Rathgeber 2014

The numerical solution of partial differential equations using the finite element method is one of the key applications of high performance computing. Local assembly is its characteristic operation. This entails the execution of a problem-specific ke rnel to numerically evaluate an integral for each element in the discretized problem domain. Since the domain size can be huge, executing efficient kernels is fundamental. Their op- timization is, however, a challenging issue. Even though affine loop nests are generally present, the short trip counts and the complexity of mathematical expressions make it hard to determine a single or unique sequence of successful transformations. Therefore, we present the design and systematic evaluation of COF- FEE, a domain-specific compiler for local assembly kernels. COFFEE manipulates abstract syntax trees generated from a high-level domain-specific language for PDEs by introducing domain-aware composable optimizations aimed at improving instruction-level parallelism, especially SIMD vectorization, and register locality. It then generates C code including vector intrinsics. Experiments using a range of finite-element forms of increasing complexity show that significant performance improvement is achieved.

البرمجيات الرياضية الهندسة الحاسوبية، المالية،العلوم الأداء

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الوادي الدولية الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

A tutorial-driven introduction to the parallel finite element library FEMPAR v1.0.0

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً