ﻻ يوجد ملخص باللغة العربية
BLASFEO is a dense linear algebra library providing high-performance implementations of BLAS- and LAPACK-like routines for use in embedded optimization and other applications targeting relatively small matrices. BLASFEO defines an API which uses a packed matrix format as its native format. This format is analogous to the internal memory buffers of optimized BLAS, but it is exposed to the user and it removes the packing cost from the routine call. For matrices fitting in cache, BLASFEO outperforms optimized BLAS implementations, both open-source and proprietary. This paper investigates the addition of a standard BLAS API to the BLASFEO framework, and proposes an implementation switching between two or more algorithms optimized for different matrix sizes. Thanks to the modular assembly framework in BLASFEO, tailored linear algebra kernels with mixed column- and panel-major arguments are easily developed. This BLAS API has lower performance than the BLASFEO API, but it nonetheless outperforms optimized BLAS and especially LAPACK libraries for matrices fitting in cache. Therefore, it can boost a wide range of applications, where standard BLAS and LAPACK libraries are employed and the matrix size is moderate. In particular, this paper investigates the benefits in scientific programming languages such as Octave, SciPy and Julia.
BLASFEO is a dense linear algebra library providing high-performance implementations of BLAS- and LAPACK-like routines for use in embedded optimization. A key difference with respect to existing high-performance implementations of BLAS is that the co
Mathematical operators whose transformation rules constitute the building blocks of a multi-linear algebra are widely used in physics and engineering applications where they are very often represented as tensors. In the last century, thanks to the ad
Matrix-matrix multiplication is a fundamental operation of great importance to scientific computing and, increasingly, machine learning. It is a simple enough concept to be introduced in a typical high school algebra course yet in practice important
The numerical solution of partial differential equations using the finite element method is one of the key applications of high performance computing. Local assembly is its characteristic operation. This entails the execution of a problem-specific ke
Updating a linear least squares solution can be critical for near real-time signalprocessing applications. The Greville algorithm proposes a simple formula for updating the pseudoinverse of a matrix A $in$ R nxm with rank r. In this paper, we explici