ترغب بنشر مسار تعليمي؟ اضغط هنا

COFFEE: an Optimizing Compiler for Finite Element Local Assembly

138   0   0.0 ( 0 )
 نشر من قبل Fabio Luporini
 تاريخ النشر 2014
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

The numerical solution of partial differential equations using the finite element method is one of the key applications of high performance computing. Local assembly is its characteristic operation. This entails the execution of a problem-specific kernel to numerically evaluate an integral for each element in the discretized problem domain. Since the domain size can be huge, executing efficient kernels is fundamental. Their op- timization is, however, a challenging issue. Even though affine loop nests are generally present, the short trip counts and the complexity of mathematical expressions make it hard to determine a single or unique sequence of successful transformations. Therefore, we present the design and systematic evaluation of COF- FEE, a domain-specific compiler for local assembly kernels. COFFEE manipulates abstract syntax trees generated from a high-level domain-specific language for PDEs by introducing domain-aware composable optimizations aimed at improving instruction-level parallelism, especially SIMD vectorization, and register locality. It then generates C code including vector intrinsics. Experiments using a range of finite-element forms of increasing complexity show that significant performance improvement is achieved.

قيم البحث

اقرأ أيضاً

We present an algorithm for the optimization of a class of finite element integration loop nests. This algorithm, which exploits fundamental mathematical properties of finite element operators, is proven to achieve a locally optimal operation count. In specified circumstances the optimum achieved is global. Extensive numerical experiments demonstrate significant performance improvements over the state of the art in finite element code generation in almost all cases. This validates the effectiveness of the algorithm presented here, and illustrates its limitations.
In this paper, we propose a local-global multiscale method for highly heterogeneous stochastic groundwater flow problems under the framework of reduced basis method and the generalized multiscale finite element method (GMsFEM). Due to incomplete char acterization of the medium properties of the groundwater flow problems, random variables are used to parameterize the uncertainty. As a result, solving the problem repeatedly is required to obtain statistical quantities. Besides, the medium properties are usually highly heterogeneous, which will result in a large linear system that needs to be solved. Therefore, it is intrinsically inevitable to seek a computational-efficient model reduction method to overcome the difficulty. We will explore the combination of the reduced basis method and the GMsFEM. In particular, we will use residual-driven basis functions, which are key ingredients in GMsFEM. This local-global multiscale method is more efficient than applying the GMsFEM or reduced basis method individually. We first construct parameter-independent multiscale basis functions that include both local and global information of the permeability fields, and then use these basis functions to construct several global snapshots and global basis functions for fast online computation with different parameter inputs. We provide rigorous analysis of the proposed method and extensive numerical examples to demonstrate the accuracy and efficiency of the local-global multiscale method.
Creating scalable, high performance PDE-based simulations requires a suitable combination of discretizations, differential operators, preconditioners and solvers. The required combination changes with the application and with the available hardware, yet software development time is a severely limited resource for most scientists and engineers. Here we demonstrate that generating simulation code from a high-level Python interface provides an effective mechanism for creating high performance simulations from very few lines of user code. We demonstrate that moving from one supercomputer to another can require significant algorithmic changes to achieve scalable performance, but that the code generation approach enables these algorithmic changes to be achieved with minimal development effort.
We present a novel, quadrature-based finite element integration method for low-order elements on GPUs, using a pattern we call textit{thread transposition} to avoid reductions while vectorizing aggressively. On the NVIDIA GTX580, which has a nominal single precision peak flop rate of 1.5 TF/s and a memory bandwidth of 192 GB/s, we achieve close to 300 GF/s for element integration on first-order discretization of the Laplacian operator with variable coefficients in two dimensions, and over 400 GF/s in three dimensions. From our performance model we find that this corresponds to 90% of our measured achievable bandwidth peak of 310 GF/s. Further experimental results also match the predicted performance when used with double precision (120 GF/s in two dimensions, 150 GF/s in three dimensions). Results obtained for the linear elasticity equations (220 GF/s and 70 GF/s in two dimensions, 180 GF/s and 60 GF/s in three dimensions) also demonstrate the applicability of our method to vector-valued partial differential equations.
Quilc is an open-source, optimizing compiler for gate-based quantum programs written in Quil or QASM, two popular quantum programming languages. The compiler was designed with attention toward NISQ-era quantum computers, specifically recognizing that each quantum gate has a non-negligible and often irrecoverable cost toward a programs successful execution. Quilcs primary goal is to make authoring quantum software a simpler exercise by making architectural details less burdensome to the author. Using Quilc allows one to write programs faster while usually not compromising---and indeed sometimes improving---their execution fidelity on a given hardware architecture. In this paper, we describe many of the principles behind Quilcs design, and demonstrate the compiler with various examples.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا