Franz Franchetti, Yevgen Voronenko and Markus Püschel (Proc. Supercomputing (SC), 2006)
FFT Program Generation for Shared Memory: SMP and Multicore
Preprint (161 KB)
Published paper (link to publisher)

The chip maker's response to the approaching end of CPU frequency scaling are multicore systems, which offer the same programming paradigm as traditional shared memory platforms but have different performance characteristics. This situation considerably increases the burden on library developers and strengthens the case for automatic performance tuning frameworks like Spiral, a program generator and optimizer for linear transforms such as the discrete Fourier transform (DFT). We present a shared memory extension of Spiral. The extension within Spiral consists of a rewriting system that manipulates the structure of transform algorithms to achieve load balancing and avoids false sharing, and of a backend to generate multithreaded code. Application to the DFT produces a novel class of algorithms suitable for multicore systems as validated by experimental results: we demonstrate a parallelization speed-up already for sizes that fit into L1 cache and compare favorably to other DFT libraries across all small and midsize DFTs and considered platforms.

Learn the current Spiral system, Multithreading, SPIRAL program generation system for transforms, Discrete/fast Fourier transform

More information:

Online program generator for transforms