Srinivas Chellappa, Franz Franchetti and Markus Püschel (Proc. International Conference on Supercomputing (ICS), pp. 26-35, 2009)
Computer Generation of Fast Fourier Transforms for the Cell Broadband Engine
Preprint (238 KB)
Published paper (link to publisher)

The Cell BE is a multicore processor with eight vector accelerators (called SPEs) that implement explicit cache management through direct memory access engines. While the Cell has an impressive floating point peak performance, programming and optimizing for it is difficult as it requires explicit memory management, multithreading, streaming, and vectorization. We address this problem for the discrete Fourier transform (DFT) by extending Spiral, a program generation system, to automatically generate highly optimized implementations for the Cell. The extensions include multi-SPE parallelization and explicit memory streaming, both performed at a high abstraction level using rewriting systems operating on Spiral's internal domain-specific language. Further, we support latency and throughput optimizations, single and double precision, and different data formats. The performance of Spiral's computer generated code is comparable with and sometimes better than existing DFT implementations, where available.

Multithreading, SPIRAL program generation system for transforms, Cell BE Processor, Discrete/fast Fourier transform