Yevgen Voronenko and Markus Püschel (Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 5, pp. V-101-V-104, 2004)
Automatic Generation of Implementations for DSP Transforms on Fused Multiply-Add Architectures
Preprint (95 KB)
Published paper (link to publisher)

Many modern computer architectures feature fused multiply-add (FMA) instructions, which offer potentially faster performance for numerical applications. For DSP transforms, compilers can only generate FMA code to a very limited extent because optimal use of FMAs requires modifying the chosen algorithm. In this paper we present a framework for automatically generating FMA code for every linear DSP transform, which we implemented as an extension to the SPIRAL code generation system. We show that for many transforms and transform sizes, our generated FMA code matches the best-known hand-derived FMA algorithms in terms of arithmetic cost. Further, we present actual runtime results that show the speed-up obtained by using FMA instructions.

SPL compiler: Translating math into code

More information:

Online program generator for transforms