Copyrights to these papers may be held by the publishers. The download files are preprints. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Multi-dimensional discrete Fourier transforms (DFT) are typically decomposed into multiple 1D transforms. Hence, parallel implementations of any multi-dimensional DFT focus on parallelizing within or across the 1D DFT. Existing DFT packages exploit the inherent parallelism across the 1D DFTs and offer rigid frameworks, that cannot be extended to incorporate both forms of parallelism and various data layouts to enable some of the parallelism. However, in the era of exascale, where systems have thousand of nodes and intricate network topologies, flexibility and parallel efficiency are key aspects all multi-dimensional DFT frameworks need to have in order to map and scale the computation appropriately. In this work, we present a flexible framework, built on the Redistribution Operations and Tensor Expressions (ROTE) framework, that facilitates the development of a family of parallel multi-dimensional DFT algorithms by 1) unifying the two parallelization schemes within a single framework, 2) exploiting the two different parallelization schemes to different degrees and 3) using different data layouts to distribute the data across the compute nodes. We demonstrate the need of a versatile framework and thus a need for a family of parallel multi-dimensional DFT algorithms on the K-Computer, where we show almost linear strong scaling results for problem sizes of 1024^3 on 32k compute nodes.Keywords: Discrete/fast Fourier transform, Multidimensional DFT, Parallel processing