Franz Franchetti, Yevgen Voronenko and Markus Püschel (Proc. Programming Languages Design and Implementation (PLDI), pp. 315-326 , 2005)
Formal Loop Merging for Signal Transforms
Preprint (304 KB)
Published paper (link to publisher)
Bibtex

A critical optimization in the domain of linear signal transforms, such as the discrete Fourier transform (DFT), is loop merging, which increases data locality and reuse and thus performance. In particular, this includes the conversion of shuffle operations into array reindexings. To date, loop merging is well understood only for the DFT, and only for Cooley-Tukey FFT based algorithms, which excludes DFT sizes divisible by large primes. In this paper, we present a formal loop merging framework for general signal transforms and its implementation within the SPIRAL code generator. The framework consists of Sigma-SPL, a mathematical language to express loops and index mappings; a rewriting system to merge loops in Sigma-SPL; and a compiler that translates Sigma-SPL into code. We apply the framework to DFT sizes that cannot be handled using only the Cooley-Tukey FFT and compare our method to FFTW 3.0.1 and the vendor library Intel MKL 7.2.1. Compared to FFTW our generated code is a factor of 2-4 faster under equal implementation conditions (same algorithms, same unrolling threshold). For some sizes we show a speed-up of a factor of 9 using Bluestein's algorithm. Further, we give a detailed comparison against the Intel vendor library MKL; our generated code is between 2 times faster and 4.5 times slower.

Keywords:
Learn the current Spiral system, SPL compiler: Translating math into code, Discrete/fast Fourier transform

More information:

Online program generator for transforms