Franz Franchetti and Markus Püschel (Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 2, pp. II-17-II-20, 2007)
SIMD Vectorization of Non-Two-Power Sized FFTs
Preprint (197 KB)
Published paper (link to publisher)
Bibtex

SIMD (single instruction multiple data) vector instructions, such as Intel’s SSE family, are available on most architectures, but are difficult to exploit for speed-up. In many cases, such as the fast Fourier transform (FFT), signal processing algorithms have to undergo major transformations to map efficiently. Using the Kronecker product formalism, we rigorously derive a novel variant of the general-radix Cooley-Tukey FFT that is structured to map efficiently for any vector length nu and radix. Then, we include the new FFT into the program generator Spiral to generate actual C implementations. Benchmarks on Intel’s SSE show that the new algorithms perform better on practically all sizes than the best available libraries Intel’s MKL and FFTW.

Keywords:
SIMD vectorization, Discrete/fast Fourier transform

More information:

Online program generator for transforms