Franz Franchetti and Markus PŁschel (Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 2, pp. II-17-II-20, 2007)
SIMD Vectorization of Non-Two-Power Sized FFTs
Preprint (197 KB)
Published paper (link to publisher)

SIMD (single instruction multiple data) vector instructions, such as Intelís SSE family, are available on most architectures, but are difficult to exploit for speed-up. In many cases, such as the fast Fourier transform (FFT), signal processing algorithms have to undergo major transformations to map efficiently. Using the Kronecker product formalism, we rigorously derive a novel variant of the general-radix Cooley-Tukey FFT that is structured to map efficiently for any vector length nu and radix. Then, we include the new FFT into the program generator Spiral to generate actual C implementations. Benchmarks on Intelís SSE show that the new algorithms perform better on practically all sizes than the best available libraries Intelís MKL and FFTW.

SIMD vectorization, Discrete/fast Fourier transform

More information:

Online program generator for transforms