No Arabic abstract
In this work, we present the M4RIE library which implements efficient algorithms for linear algebra with dense matrices over GF(2^e) for 2 <= 2 <= 10. As the name of the library indicates, it makes heavy use of the M4RI library both directly (i.e., by calling it) and indirectly (i.e., by using its concepts). We provide an open-source GPLv2+ C library for efficient linear algebra over GF(2^e) for e small. In this library we implemented an idea due to Bradshaw and Boothby which reduces matrix multiplication over GF(p^k) to a series of matrix multiplications over GF(p). Furthermore, we propose a caching technique - Newton-John tables - to avoid finite field multiplications which is inspired by Kronrods method (M4RM) for matrix multiplication over GF(2). Using these two techniques we provide asymptotically fast triangular solving with matrices (TRSM) and PLE-based Gaussian elimination. As a result, we are able to significantly improve upon the state of the art in dense linear algebra over GF(2^e) with 2 <= e <= 10.
In this paper, a class of permutation trinomials of Niho type over finite fields with even characteristic is further investigated. New permutation trinomials from Niho exponents are obtained from linear fractional polynomials over finite fields, and it is shown that the presented results are the generalizations of some earlier works.
BLASFEO is a dense linear algebra library providing high-performance implementations of BLAS- and LAPACK-like routines for use in embedded optimization. A key difference with respect to existing high-performance implementations of BLAS is that the computational performance is optimized for small to medium scale matrices, i.e., for sizes up to a few hundred. BLASFEO comes with three different implementations: a high-performance implementation aiming at providing the highest performance for matrices fitting in cache, a reference implementation providing portability and embeddability and optimized for very small matrices, and a wrapper to standard BLAS and LAPACK providing high-performance on large matrices. The three implementations of BLASFEO together provide high-performance dense linear algebra routines for matrices ranging from very small to large. Compared to both open-source and proprietary highly-tuned BLAS libraries, for matrices of size up to about one hundred the high-performance implementation of BLASFEO is about 20-30% faster than the corresponding level 3 BLAS routines and 2-3 times faster than the corresponding LAPACK routines.
In this paper, we investigate the power functions $F(x)=x^d$ over the finite field $mathbb{F}_{2^{4n}}$, where $n$ is a positive integer and $d=2^{3n}+2^{2n}+2^{n}-1$. It is proved that $F(x)=x^d$ is APcN at certain $c$s in $mathbb{F}_{2^{4n}}$, and it is the second class of APcN power functions over finite fields of even characteristic. Further, the $c$-differential spectrum of these power functions is also determined.
Numerical software in computational science and engineering often relies on highly-optimized building blocks from libraries such as BLAS and LAPACK, and while such libraries provide portable performance for a wide range of computing architectures, they still present limitations in terms of flexibility. We advocate a domain-specific program generator capable of producing library routines tailored to the specific needs of the application in terms of sizes, interface, and target architecture.
The level of abstraction at which application experts reason about linear algebra computations and the level of abstraction used by developers of high-performance numerical linear algebra libraries do not match. The former is conveniently captured by high-level languages and libraries such as Matlab and Eigen, while the latter expresses the kernels included in the BLAS and LAPACK libraries. Unfortunately, the translation from a high-level computation to an efficient sequence of kernels is a task, far from trivial, that requires extensive knowledge of both linear algebra and high-performance computing. Internally, almost all high-level languages and libraries use efficient kernels; however, the translation algorithms are too simplistic and thus lead to a suboptimal use of said kernels, with significant performance losses. In order to both achieve the productivity that comes with high-level languages, and make use of the efficiency of low level kernels, we are developing Linnea, a code generator for linear algebra problems. As input, Linnea takes a high-level description of a linear algebra problem and produces as output an efficient sequence of calls to high-performance kernels. In 25 application problems, the code generated by Linnea always outperforms Matlab, Julia, Eigen and Armadillo, with speedups up to and exceeding 10x.