Copyrights to these papers may be held by the publishers. The download files are preprints. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Franz Franchetti and Markus Püschel (Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 2, pp. II-17-II-20, 2007)
SIMD Vectorization of Non-Two-Power Sized FFTs
Preprint (197 KB)
Published paper (link to publisher)
Bibtex
SIMD (single instruction multiple data) vector instructions, such as Intel’s SSE family, are available on most architectures, but are difficult to exploit for speed-up. In many cases, such as the fast Fourier transform (FFT), signal processing algorithms have to undergo major transformations to map efficiently. Using the Kronecker product formalism, we rigorously derive a novel variant of the general-radix Cooley-Tukey FFT that is structured to map efficiently for any vector length nu and radix. Then, we include the new FFT into the program generator Spiral to generate actual C implementations. Benchmarks on Intel’s SSE show that the new algorithms perform better on practically all sizes than the best available libraries Intel’s MKL and FFTW.
Keywords: SIMD vectorization, Discrete/fast Fourier transformMore information: