Daisuke Takahashi and Franz Franchetti (Proc. International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia), pp. 114-122, 2020)
FFTE on SVE: SPIRAL-Generated Kernels
Published paper (link to publisher)

In this paper we propose an implementation of the fast Fourier transform (FFT) targeting the ARM Scalable Vector Extension (SVE). We performed automatic vectorization via a compiler and an explicit vectorization through code generation by SPIRAL for FFT kernels, and compared the performance. We show that the explicit vectorization of SPIRAL generated code improves performance significantly. Performance results of FFTs on RIKEN's Fugaku processor simulator are reported. With the ARM compiler SPIRAL-generated FFT kernels written in SVE intrinsic are up to 3.16 times faster than FFT kernels of FFTE written in Fortran and up to 5.62 times faster than SPIRAL-generated FFT kernels written in C.