Copyrights to these papers may be held by the publishers. The download files are preprints. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Daniele G. Spampinato (PhD. thesis, Computer Science, ETH Zurich, Switzerland, 2017)

**A Linear Algebra Compiler for Small Problem Sizes**

Published paper (link to publisher)

Bibtex

Many applications in media processing, control, graphics, and other domains require efficient small-scale linear algebra computations. However, most existing high performance libraries for linear algebra, such as ATLAS or Intel MKL are more geared towards large-scale problems (matrix sizes in the hundreds and larger), specific interfaces (e.g., BLAS and LAPACK), and cannot specialize to problem instances. In other domains, program generators have proven effective to automatically produce specialized code for important building blocks. An example is the Spiral generator for linear transforms and its extension to support other functionalities, such as matrix-matrix multiplication and Viterbi decoding. Learning from both libraries and generators, in this dissertation we aim for the automatic generation of fast code for small scale, linear algebra computations. We introduce a program synthesis framework for small-scale, linear algebra computations of fixed size. These include computations on matrices, vectors, and scalars using basic operators as well as higher-level computations such as the Cholesky decomposition, solvers for the continuous-time Lyapunov and Sylvester equations, and the inversion of triangular matrices. The input to our framework is a linear algebra program composed of several fixed size basic and higher-level computations; the output is a corresponding C function optionally including intrinsics to efficiently use SIMD vector extensions. In our framework, support for for small-scale, basic linear algebra computations is provided by the LGen compiler. LGen generates code using two levels of mathematical domain-specific languages (DSLs). The DSLs are used to perform tiling, loop fusion, and vectorization at a high level of abstraction, before the final code is generated. In particular, LGen’s vectorization approach can easily extend to different vector ISAs. In addition, search is used to select among alternative generated implementations. LGen also supports computations whose matrices have structure, such as symmetric or triangular, that reduces the cost of the computation. For example, dense linear systems of equations are solved by first reducing to triangular form and problems in optimization may yield matrices with different kinds of structures. The BLAS interface provides a small set of structured matrix computations, chosen to serve a certain set of higher-level functions supported by LAPACK. However, if a given computation contains a structure that is not supported by standard interfaces then its computation using a more general version of it (e.g., multiplying two general matrices instead of two structured ones) would lose the benefits of the structure. We address this problem by combining the LGen compiler with techniques from polyhedral compilation to mathematiiii cally capture matrix structures. In this dissertation, we consider triangular and symmetric matrices and discuss the approach extensibility to a much larger set including blocked structures. Finally, support for higher-level linear algebra computations is included building on LGen and the FLAME-based algorithm synthesis tool Cl1ck. When a higher-level computation is encountered in an input linear algebra program, a set of basic computations is synthetized starting from one of its algorithmic definitions. If desired, the framework’s autotuner explores alternative algorithmic variants to select the one that yields the best performance. We evaluate the performance of the code generated with our framework for several linear algebra computations, including library compliant and noncompliant basic and higher-level computations and the computation of a Kalman filter. Experimental results show that our framework produces code that performs in many cases considerably better than well-established, commercial and non-commercial libraries, prior software generators, and compilers.

SIMD vectorization, Linear algebra, Optimizing, Program generation, Structured matrices, Polyhedral Model