Thom Popovici, Franz Franchetti and Tze-Meng Low (Proc. High Performance Extreme Computing (HPEC), IEEE, pp. 1-7, 2017)
Mixed Data Layout Kernels for Vectorized Complex Arithmetic
Preprint (414 KB)
Published paper (link to publisher)

Implementing complex arithmetic routines with Single Instruction Multiple Data (SIMD) instructions requires instructions that are usually not needed in their real arithmetic counterparts. These instructions, such as shuffles and addsub, are often bottlenecks for complex arithmetic kernels, because modern architectures can usually perform more real arithmetic operations than they can execute the instructions needed for complex arithmetic. In this work, we focus on using a variety of data layouts (mixed format) for storing complex numbers at different stages of the computation so as to limit the use of these instructions. Using complex matrix multiplication and Fast Fourier Transforms (FFTs) as our examples, we demonstrate that performance improvements of up to 2x can be attained by using the mixed format within the computational routines. We also describe how existing algorithms can be easily modified to implement the mixed format complex layout.
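To make the layout idea concrete, here is a minimal scalar sketch in C. It is not the authors' kernel code. It assumes that the two layouts involved are the usual interleaved layout (re, im, re, im, ...) and a split layout (separate real and imaginary arrays); all function names and the scratch-buffer convention are hypothetical. With interleaved data, a vectorizing compiler typically needs shuffle or addsub style instructions to pair real and imaginary parts, whereas the split loop uses only element-wise real multiplies, adds and subtracts.

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved layout: z[2*i] is the real part, z[2*i+1] the imaginary part. */
void cmul_interleaved(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        double ar = a[2 * i], ai = a[2 * i + 1];
        double br = b[2 * i], bi = b[2 * i + 1];
        c[2 * i]     = ar * br - ai * bi;
        c[2 * i + 1] = ar * bi + ai * br;
    }
}

/* Split layout: real and imaginary parts live in separate arrays, so each
 * statement is a plain element-wise real operation. */
void cmul_split(const double *ar, const double *ai,
                const double *br, const double *bi,
                double *cr, double *ci, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        cr[i] = ar[i] * br[i] - ai[i] * bi[i];
        ci[i] = ar[i] * bi[i] + ai[i] * br[i];
    }
}

/* Layout conversions, used only at the boundaries of the computation. */
void interleaved_to_split(const double *z, double *re, double *im, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        re[i] = z[2 * i];
        im[i] = z[2 * i + 1];
    }
}

void split_to_interleaved(const double *re, const double *im, double *z, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        z[2 * i]     = re[i];
        z[2 * i + 1] = im[i];
    }
}

/* Hypothetical "mixed" driver: interleaved at the interface, split inside.
 * scratch must hold at least 6*n doubles (an assumption of this sketch). */
void cmul_mixed(const double *a, const double *b, double *c,
                double *scratch, size_t n)
{
    assert(scratch != NULL);
    double *ar = scratch,         *ai = scratch + n;
    double *br = scratch + 2 * n, *bi = scratch + 3 * n;
    double *cr = scratch + 4 * n, *ci = scratch + 5 * n;

    interleaved_to_split(a, ar, ai, n);
    interleaved_to_split(b, br, bi, n);
    cmul_split(ar, ai, br, bi, cr, ci, n);
    split_to_interleaved(cr, ci, c, n);
}
```

For a single element-wise product the conversion cost would outweigh any gain; the sketch only shows where the layout change sits. The benefit described in the abstract would come from kernels such as matrix multiplication or FFT stages, where many real operations are performed between conversions.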

Numerical kernels we consider