Copyrights to these papers may be held by the publishers. The download files are preprints. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Yevgen Voronenko and Markus Püschel (Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 5, pp. V-101-V-104, 2004)
Automatic Generation of Implementations for DSP Transforms on Fused Multiply-Add Architectures
Preprint (95 KB)
Published paper (link to publisher)
Bibtex
Many modern computer architectures feature fused multiply-add (FMA) instructions, which offer potentially faster performance for numerical applications. For DSP transforms, compilers can only generate FMA code to a very limited extent because optimal use of FMAs requires modifying the chosen algorithm. In this paper we present a framework for automatically generating FMA code for every linear DSP transform, which we implemented as an extension to the SPIRAL code generation system. We show that for many transforms and transform sizes, our generated FMA code matches the best-known hand-derived FMA algorithms in terms of arithmetic cost. Further, we present actual runtime results that show the speed-up obtained by using FMA instructions.
Keywords: SPL compiler: Translating math into code
More information: