Pinit Kumhom, Jeremy Johnson and Prawat Nagvajara (Proc. IEEE ASIC/SOC Conference, IEEE, pp. 182-186, 2000)
Design, optimization, and implementation of a universal FFT processor

There exist fast Fourier transform (FFT) algorithms, called dimensionless FFTs, that work independently of dimension. These algorithms can be configured to compute DFTs of different dimensions simply by relabeling the input data and changing the values of the twiddle factors in the butterfly operations. This observation allows us to design an FFT processor that, with minor reconfiguration, can compute one-, two-, and three-dimensional DFTs. In this paper we design a family of FFT processors, parameterized by the number of points, the dimension, the number of processors, and the internal dataflow, and show how to map different dimensionless FFTs onto this hardware design. Different dimensionless FFTs have different dataflows and consequently different performance characteristics. Using a performance model, we search for the optimal algorithm for the family of processors considered. The resulting algorithm and corresponding hardware design were implemented on an FPGA.
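The central observation of the abstract, that one butterfly network can serve several dimensions once the twiddle factors and data labels are changed, can be illustrated in software. The following is a minimal sketch, not the paper's hardware design or its specific algorithms: it assumes a radix-2 decimation-in-frequency dataflow, power-of-two sizes, and row-major data layout, and the function names (`shared_butterfly_stages`, `fft_via_shared_butterflies`) are hypothetical. For N total points the butterfly distances are always N/2, N/4, ..., 1 regardless of `dims`; only the twiddle exponents and the final per-dimension bit-reversal relabeling depend on the dimension.

```python
import math
import numpy as np


def shared_butterfly_stages(x, dims):
    """Run log2(N) radix-2 DIF butterfly stages on a flat array of N points.

    The pairing of elements in each stage is the same for every choice of
    dims with the same product; only the twiddle factors depend on dims.
    The result is left in per-dimension bit-reversed order.
    """
    x = np.asarray(x, dtype=complex).ravel().copy()
    n_total = x.size
    assert n_total == math.prod(dims), "dims must multiply to len(x)"
    assert all(nd & (nd - 1) == 0 for nd in dims), "sizes must be powers of two"

    p = np.arange(n_total)            # flat (row-major) index of each point
    stride = n_total
    for nd in dims:
        stride //= nd                 # flat-index stride of this dimension
        coord = (p // stride) % nd    # coordinate of each point along it
        half = nd // 2
        while half >= 1:
            # Upper butterfly inputs; partners sit half*stride further on.
            upper = p[(coord % (2 * half)) < half]
            lower = upper + half * stride
            # Twiddle depends only on the coordinate in the current dimension.
            w = np.exp(-2j * np.pi * (coord[upper] % half) / (2 * half))
            a, b = x[upper], x[lower]
            x[upper] = a + b
            x[lower] = (a - b) * w
            half //= 2
    return x


def bit_reverse(k, bits):
    """Reverse the lowest `bits` bits of the integer k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def fft_via_shared_butterflies(x, dims):
    """Multidimensional DFT of x, viewed as an array of shape dims."""
    y = shared_butterfly_stages(x, dims).reshape(dims)
    # Relabel the outputs: undo the bit reversal separately in each dimension.
    for axis, nd in enumerate(dims):
        bits = nd.bit_length() - 1
        order = [bit_reverse(k, bits) for k in range(nd)]
        y = np.take(y, order, axis=axis)
    return y
```

Under these assumptions, calling `fft_via_shared_butterflies(x, (16,))`, `fft_via_shared_butterflies(x, (4, 4))`, and `fft_via_shared_butterflies(x, (2, 2, 4))` on the same 16 samples runs the same four butterfly stages each time and should agree with NumPy's `np.fft.fft`, `np.fft.fft2`, and `np.fft.fftn` on the correspondingly reshaped input.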

Search/Learning for optimization, Dimensionless FFT, IP cores for FPGA/ASIC, Multidimensional DFT