Basilio B. Fraguela, Yevgen Voronenko and Markus Püschel (Proc. Parallel Architectures and Compilation Techniques (PACT), pp. 271-280, 2009)
Automatic Tuning of Discrete Fourier Transforms Driven by Analytical Modeling
Preprint (2.7 MB)

Analytical models have been used to estimate optimal values for parameters such as tile sizes in the context of loop nests. However, important algorithms such as fast Fourier transforms (FFTs) present a far more complex search space consisting of many thousands of different implementations with very different complex access patterns and nesting and code structures. As a results, some of the best available FFT implementations use heuristic search based on runtime measurements. In this paper we present the first analytical model that can successfully replace the measurement in this search on modern platforms. The model includes many details of the platform's memory system including the TLBs, and, for the first time, physically addressed caches and hardware prefetching. The effect, as we show, is a dramatically reduced search time to find the best FFT without significant loss in performance. Even though our model is adapted to the FFT in this paper, its underlying structure should be applicable for a much larger set of code structures and hence is a candidate for iterative compilation.

Discrete/fast Fourier transform, Performance model