Automatic Generation of Efficient Linear Algebra Programs

323 0 0.0 ( 0 )

Download Cite

Added by Henrik Barthels M.Sc.

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Henrik Barthels - Christos Psarras - Paolo Bientinesi

Mathematical Software

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The level of abstraction at which application experts reason about linear algebra computations and the level of abstraction used by developers of high-performance numerical linear algebra libraries do not match. The former is conveniently captured by high-level languages and libraries such as Matlab and Eigen, while the latter expresses the kernels included in the BLAS and LAPACK libraries. Unfortunately, the translation from a high-level computation to an efficient sequence of kernels is a task, far from trivial, that requires extensive knowledge of both linear algebra and high-performance computing. Internally, almost all high-level languages and libraries use efficient kernels; however, the translation algorithms are too simplistic and thus lead to a suboptimal use of said kernels, with significant performance losses. In order to both achieve the productivity that comes with high-level languages, and make use of the efficiency of low level kernels, we are developing Linnea, a code generator for linear algebra problems. As input, Linnea takes a high-level description of a linear algebra problem and produces as output an efficient sequence of calls to high-performance kernels. In 25 application problems, the code generated by Linnea always outperforms Matlab, Julia, Eigen and Armadillo, with speedups up to and exceeding 10x.

rate research

Program Generation for Linear Algebra Using Multiple Layers of DSLs

62 - Daniele G. Spampinato 2019

Numerical software in computational science and engineering often relies on highly-optimized building blocks from libraries such as BLAS and LAPACK, and while such libraries provide portable performance for a wide range of computing architectures, they still present limitations in terms of flexibility. We advocate a domain-specific program generator capable of producing library routines tailored to the specific needs of the application in terms of sizes, interface, and target architecture.

Mathematical Software

lbmpy: Automatic code generation for efficient parallel lattice Boltzmann methods

70 - Martin Bauer , Harald Kostler , Ulrich Rude 2020

Lattice Boltzmann methods are a popular mesoscopic alternative to macroscopic computational fluid dynamics solvers. Many variants have been developed that vary in complexity, accuracy, and computational cost. Extensions are available to simulate multi-phase, multi-component, turbulent, or non-Newtonian flows. In this work we present lbmpy, a code generation package that supports a wide variety of different methods and provides a generic development environment for new schemes as well. A high-level domain-specific language allows the user to formulate, extend and test various lattice Boltzmann schemes. The method specification is represented in a symbolic intermediate representation. Transformations that operate on this intermediate representation optimize and parallelize the method, yielding highly efficient lattice Boltzmann compute kernels not only for single- and two-relaxation-time schemes but also for multi-relaxation-time, cumulant, and entropically stabilized methods. An integration into the HPC framework waLBerla makes massively parallel, distributed simulations possible, which is demonstrated through scaling experiments on the SuperMUC-NG supercomputing system

Mathematical Software Computational Engineering Distributed Parallel and Cluster Computing

A Review of automatic differentiation and its efficient implementation

420 - Charles C. Margossian 2018

Derivatives play a critical role in computational statistics, examples being Bayesian inference using Hamiltonian Monte Carlo sampling and the training of neural networks. Automatic differentiation is a powerful tool to automate the calculation of derivatives and is preferable to more traditional methods, especially when differentiating complex algorithms and mathematical functions. The implementation of automatic differentiation however requires some care to insure efficiency. Modern differentiation packages deploy a broad range of computational techniques to improve applicability, run time, and memory management. Among these techniques are operation overloading, region based memory, and expression templates. There also exist several mathematical techniques which can yield high performance gains when applied to complex algorithms. For example, semi-analytical derivatives can reduce by orders of magnitude the runtime required to numerically solve and differentiate an algebraic equation. Open problems include the extension of current packages to provide more specialized routines, and efficient methods to perform higher-order differentiation.

Mathematical Software Computation

Automatic Generation of Interpolants for Lattice Samplings: Part II -- Implementation and Code Generation

122 - Joshua Horacsek , Usman Alim 2021

In the prequel to this paper, we presented a systematic framework for processing spline spaces. In this paper, we take the results of that framework and provide a code generation pipeline that automatically generates efficient implementations of spline spaces. We decompose the final algorithm from Part I and translate the resulting components into LLVM-IR (a low level language that can be compiled to various targets/architectures). Our design provides a handful of parameters for a practitioner to tune - this is one of the avenues that provides us with the flexibility to target many different computational architectures and tune performance on those architectures. We also provide an evaluation of the effect of the different parameters on performance.

Mathematical Software

BLASFEO: basic linear algebra subroutines for embedded optimization

118 - Gianluca Frison , Dimitris Kouzoupis , Tommaso Sartor 2017

BLASFEO is a dense linear algebra library providing high-performance implementations of BLAS- and LAPACK-like routines for use in embedded optimization. A key difference with respect to existing high-performance implementations of BLAS is that the computational performance is optimized for small to medium scale matrices, i.e., for sizes up to a few hundred. BLASFEO comes with three different implementations: a high-performance implementation aiming at providing the highest performance for matrices fitting in cache, a reference implementation providing portability and embeddability and optimized for very small matrices, and a wrapper to standard BLAS and LAPACK providing high-performance on large matrices. The three implementations of BLASFEO together provide high-performance dense linear algebra routines for matrices ranging from very small to large. Compared to both open-source and proprietary highly-tuned BLAS libraries, for matrices of size up to about one hundred the high-performance implementation of BLASFEO is about 20-30% faster than the corresponding level 3 BLAS routines and 2-3 times faster than the corresponding LAPACK routines.

Mathematical Software