Copyrights to these papers may be held by the publishers. The download files are preprints. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Franz Franchetti, Stefan Kral, Juergen Lorenz, Markus Püschel, Christoph W. Ueberhuber and Peter Wurzinger (Proc. High Performance Computing for Computational Science (VECPAR), Lecture Notes in Computer Science, Springer, Vol. 3402, pp. 23-36, 2004)
Automatically Tuned FFTs for BlueGene/L's Double FPU
Preprint (219 KB)
IBM's upcoming 360 Tflop/s supercomputer BlueGene/L, featuring 65,536 processors, is expected to lead the Top 500 list when it is installed in 2005. This paper presents one of the first numerical codes actually run on a small prototype of this machine. Formal vectorization techniques, the Vienna MAP vectorizer (both developed for generic short vector SIMD extensions), and the automatic performance tuning approach provided by Spiral are combined to generate automatically optimized FFT codes for the BlueGene/L machine, targeting its two-way short vector SIMD "double" floating-point unit. The resulting FFT codes are 40% faster than the best scalar Spiral-generated code and 5 times faster than the mixed-radix FFT implementation provided by the GNU Scientific Library (GSL).
Keywords: Discrete/fast Fourier transform