Copyrights to these papers may be held by the publishers. The download files are preprints. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Franz Franchetti, Yevgen Voronenko and G. Almasi (Proc. High Performance Computing for Computational Science (VECPAR), 2012)
Automatic Generation of the HPC Challenges Global FFT Benchmark for BlueGene/P
We present the automatic synthesis of the HPC Challenge’s Global FFT, a large 1D FFT distributed across an entire supercomputer system. We extend the Spiral system to synthesize specialized single-node FFT libraries that combine a data layout transformation with the on-node FFT computation. This improves network performance by enabling the use of all-to-all collectives. We ran our optimized Global FFT benchmark on up to 128k cores (32 racks) of ANL’s BlueGene/P “Intrepid” and achieved 6.4 Tflop/s, outperforming ANL’s 2008 HPC Challenge Class I Global FFT run (5 Tflop/s). Our code was part of IBM’s winning 2010 HPC Challenge Class II submission. Further, we show first single-thread results on BlueGene/Q.
Keywords: Fast Fourier Transform, BlueGene, Automatic generation
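A large 1D FFT spread across many nodes is usually built on the textbook Cooley-Tukey "four-step" factorization N = n1*n2. It runs small local DFTs, applies twiddle factors, regroups (transposes) the data, and then runs a second set of small DFTs. The following is a minimal single-process sketch of that split in Python. It is an assumption-labeled illustration only, not Spiral's generated code or the paper's exact data layout transformation. The names `dft` and `four_step_fft` are hypothetical. The transpose in step 3 is where a distributed implementation would perform an all-to-all exchange between nodes.

```python
import cmath

def dft(x):
    """Naive O(n^2) DFT using the forward convention exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def four_step_fft(x, n1, n2):
    """Length-(n1*n2) DFT of x via the Cooley-Tukey four-step split.

    Input index  n = n2*i1 + i2  (0 <= i1 < n1, 0 <= i2 < n2)
    Output index k = k1 + n1*k2  (0 <= k1 < n1, 0 <= k2 < n2)
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Step 1: n2 independent length-n1 DFTs on the strided subsequences x[i2::n2].
    cols = [dft(x[i2::n2]) for i2 in range(n2)]            # cols[i2][k1]
    # Step 2: multiply by twiddle factors w_N^(i2*k1).
    for i2 in range(n2):
        for k1 in range(n1):
            cols[i2][k1] *= cmath.exp(-2j * cmath.pi * i2 * k1 / n)
    # Step 3: transpose; on a distributed machine this regrouping is the all-to-all.
    rows = [[cols[i2][k1] for i2 in range(n2)] for k1 in range(n1)]  # rows[k1][i2]
    # Step 4: n1 independent length-n2 DFTs.
    rows = [dft(r) for r in rows]                            # rows[k1][k2]
    # Scatter results to natural order k = k1 + n1*k2.
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out

if __name__ == "__main__":
    x = [complex(i % 5, (3 * i) % 7) for i in range(12)]
    y = four_step_fft(x, 3, 4)
    ref = dft(x)
    print(max(abs(a - b) for a, b in zip(y, ref)))  # small rounding error only
```

Comparing against the naive `dft` of the full input confirms the index mapping. In a real distributed code, each node would own a slab of the data, the local DFTs would come from an optimized library, and step 3 would be realized with a collective such as MPI_Alltoall.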