Copyrights to these papers may be held by the publishers. The download files are preprints. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Franz Franchetti and Markus Püschel (Proc. IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2003)
Short Vector Code Generation for the Discrete Fourier Transform
Preprint (169 KB)
In this paper we use a mathematical approach to generate high performance short vector code for the discrete Fourier transform. We represent the well-known Cooley-Tukey FFT recursion in a mathematical notation and formally derive a ``short vector variant''. Using this recursion we generate for one DFT size a large number of different algorithms, represented as formulas, and translate them into short vector code. Finally, we present a vector code specific dynamic programming search that finds in the formula space a good match to the given architecture. We implemented this approach as part of the SPIRAL library generator. On Pentium III and 4, we automatically generate SSE and SSE2 vector code that compares favorably with the hand-tuned Intel vendor MKL.Keywords: SIMD vectorization, Discrete/fast Fourier transform