Franz Franchetti and Markus Püschel (Proc. IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2003)
Short Vector Code Generation for the Discrete Fourier Transform
Preprint (169 KB)

In this paper we use a mathematical approach to generate high performance short vector code for the discrete Fourier transform. We represent the well-known Cooley-Tukey FFT recursion in a mathematical notation and formally derive a ``short vector variant''. Using this recursion we generate for one DFT size a large number of different algorithms, represented as formulas, and translate them into short vector code. Finally, we present a vector code specific dynamic programming search that finds in the formula space a good match to the given architecture. We implemented this approach as part of the SPIRAL library generator. On Pentium III and 4, we automatically generate SSE and SSE2 vector code that compares favorably with the hand-tuned Intel vendor MKL.

SIMD vectorization, Discrete/fast Fourier transform