We present SLinGen, a program generation system for linear algebra. The input to SLinGen is an application expressed mathematically in a linear-algebra-inspired language (LA) that we define. LA provides basic scalar/vector/matrix additions and multiplications as well as higher-level operations including linear system solvers and Cholesky and LU factorizations. The output of SLinGen is performance-optimized single-source C code, optionally vectorized with intrinsics. SLinGen targets small-scale computations on fixed-size operands, for which a straightforward implementation using optimized libraries (e.g., BLAS or LAPACK) is known to yield suboptimal performance (besides increasing code size and introducing dependencies), but which are crucial in control, signal processing, computer vision, and other domains. Internally, SLinGen uses synthesis and DSL-based techniques to optimize at a high level of abstraction. We benchmark our program generator on three prototypical applications: the Kalman filter, Gaussian process regression, and an L1-analysis convex solver, as well as basic routines including Cholesky factorization and solvers for the continuous-time Lyapunov and Sylvester equations. The results show significant speed-ups compared to straightforward C compiled with Intel icc and with clang using a polyhedral optimizer, as well as to library- and template-based implementations.
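For illustration, the kind of fixed-size kernel SLinGen targets might look as follows in plain C (a minimal sketch, not SLinGen's LA input syntax or its generated code; the function name and the size N = 4 are assumptions). With the operand size fixed at compile time, the whole factorization can be unrolled and vectorized, whereas calling LAPACK's dpotrf for such a small operand incurs the call overhead, code-size, and dependency costs mentioned above.

#include <math.h>

#define N 4  /* operand size fixed at compile time */

/* In-place lower-triangular Cholesky factorization A = L * L^T of a small
   SPD matrix; only the lower triangle of A is read and overwritten with L. */
static void chol_fixed(double A[N][N])
{
    for (int j = 0; j < N; ++j) {
        for (int k = 0; k < j; ++k)
            A[j][j] -= A[j][k] * A[j][k];
        A[j][j] = sqrt(A[j][j]);
        for (int i = j + 1; i < N; ++i) {
            for (int k = 0; k < j; ++k)
                A[i][j] -= A[i][k] * A[j][k];
            A[i][j] /= A[j][j];
        }
    }
}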
In previous work carried out in the setting of program algebra, including work in the area of instruction sequence size complexity, we chose instruction sets for Boolean registers that contain instructions of only a few of the possible kinds. In the current paper, we study instruction-sequence-size-bounded functional completeness for all possible instruction sets for Boolean registers. We expect the results of this study to be useful for adequately assessing the results of work concerned with lower bounds on instruction sequence size complexity.
A parameterized algebraic theory of instruction sequences, of objects that represent the behaviours produced by instruction sequences under execution, and of objects that represent the behaviours exhibited by the components of the execution environment of instruction sequences is the basis of a line of research in which issues relating to a wide variety of subjects from computer science have been rigorously investigated in terms of instruction sequences. Various papers in this line of research use an instantiation of this theory in which the basic instructions read and alter the contents of Boolean registers and the components of the execution environment are Boolean registers. In this paper, we give a simplified presentation of the most general such instantiated theory.
Tensor kernels in machine learning (ML) often correspond to pure mathematical expressions, making term rewriting an attractive strategy for optimization and mapping to specialized hardware accelerators. However, existing ML intermediate representations (IRs) tend to either be pure but high-level, making low-level rewrites to hardware targets inexpressible, or low-level but impure, hampering the use of term rewriting altogether. This paper introduces Glenside, a pure IR whose core abstraction -- the access pattern -- enables low-level, layout-aware, hardware-centric program rewrites. We demonstrate how term rewriting in Glenside can be used to map program fragments to hardware accelerator invocations and to automatically discover classic data layout transformations like im2col. Glenside establishes a new foundation for exploring further term rewriting techniques in optimizing low-level tensor programs.
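As a point of reference for the im2col transformation mentioned above, the classic lowering can be sketched in C as follows (an illustrative sketch only, not Glenside's access-pattern notation or its rewrite rules; the function name and the stride-1, no-padding assumptions are ours). It copies every KxK window of a CxHxW input into one column of a (C*K*K) x (OH*OW) matrix, so the convolution becomes a single matrix multiplication with an OC x (C*K*K) weight matrix.

/* im2col for a stride-1, unpadded 2D convolution:
   in  : C x H x W input, row-major
   col : (C*K*K) x (OH*OW) output matrix, row-major */
static void im2col(const float *in, int C, int H, int W, int K, float *col)
{
    int OH = H - K + 1, OW = W - K + 1;
    for (int c = 0; c < C; ++c)
        for (int ky = 0; ky < K; ++ky)
            for (int kx = 0; kx < K; ++kx) {
                int row = (c * K + ky) * K + kx;   /* row index in col matrix */
                for (int oy = 0; oy < OH; ++oy)
                    for (int ox = 0; ox < OW; ++ox)
                        col[row * (OH * OW) + oy * OW + ox] =
                            in[(c * H + oy + ky) * W + (ox + kx)];
            }
}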