No Arabic abstract
We present SLinGen, a program generation system for linear algebra. The input to SLinGen is an application expressed mathematically in a linear-algebra-inspired language (LA) that we define. LA provides basic scalar/vector/matrix additions/multiplications and higher level operations including linear systems solvers, Cholesky and LU factorizations. The output of SLinGen is performance-optimized single-source C code, optionally vectorized with intrinsics. The target of SLinGen are small-scale computations on fixed-size operands, for which a straightforward implementation using optimized libraries (e.g., BLAS or LAPACK) is known to yield suboptimal performance (besides increasing code size and introducing dependencies), but which are crucial in control, signal processing, computer vision, and other domains. Internally, SLinGen uses synthesis and DSL-based techniques to optimize at a high level of abstraction. We benchmark our program generator on three prototypical applications: the Kalman filter, Gaussian process regression, and an L1-analysis convex solver, as well as basic routines including Cholesky factorization and solvers for the continuous-time Lyapunov and Sylvester equations. The results show significant speed-ups compared to straightforward C with Intel icc and clang with a polyhedral optimizer, as well as library-based and template-based implementations.
Numerical software in computational science and engineering often relies on highly-optimized building blocks from libraries such as BLAS and LAPACK, and while such libraries provide portable performance for a wide range of computing architectures, they still present limitations in terms of flexibility. We advocate a domain-specific program generator capable of producing library routines tailored to the specific needs of the application in terms of sizes, interface, and target architecture.
In previous work carried out in the setting of program algebra, including work in the area of instruction sequence size complexity, we chose instruction sets for Boolean registers that contain only instructions of a few of the possible kinds. In the current paper, we study instruction sequence size bounded functional completeness of all possible instruction sets for Boolean registers. We expect that the results of this study will turn out to be useful to adequately assess results of work that is concerned with lower bounds of instruction sequence size complexity.
A parameterized algebraic theory of instruction sequences, objects that represent the behaviours produced by instruction sequences under execution, and objects that represent the behaviours exhibited by the components of the execution environment of instruction sequences is the basis of a line of research in which issues relating to a wide variety of subjects from computer science have been rigorously investigated thinking in terms of instruction sequences. In various papers that belong to this line of research, use is made of an instantiation of this theory in which the basic instructions are instructions to read out and alter the content of Boolean registers and the components of the execution environment are Boolean registers. In this paper, we give a simplified presentation of the most general such instantiated theory.