Franz Franchetti and Markus Püschel (Proc. International Conference on Compiler Construction (CC), Lecture Notes in Computer Science, Springer, Vol. 4959, pp. 116-131, 2008)
Generating SIMD Vectorized Permutations
Preprint (244 KB)
Published paper (link to publisher)
Bibtex

This paper introduces a method to generate efficient vectorized implementations of small stride permutations using only vector load and vector shuffle instructions. These permutations are crucial for high-performance numerical kernels including the fast Fourier transform. Our generator takes as input only the specification of the target platform's SIMD vector ISA and the desired permutation. The basic idea underlying our generator is to model vector instructions as matrices and sequences of vector instructions as matrix formulas using the Kronecker product formalism. We design a rewriting system and a search mechanism that applies matrix identities to generate those matrix formulas that have vector structure and minimize a cost measure that we define. The formula is then translated into the actual vector program for the specified permutation. For three important classes of permutations, we show that our method yields a solution with the minimal number of vector shuffles. Inserting into a fast Fourier transform yields a significant speedup.

Keywords:
Learn the current Spiral system, SPL compiler: Translating math into code, SIMD vectorization, SPIRAL program generation system for transforms, Discrete/fast Fourier transform