Tom Henretty, Kevin Stock, Louis-NoŽl Pouchet, Franz Franchetti, J. Ramanujam and P. Sadayappan (Proc. International Conference on Compiler Construction (CC), 2011)
Data Layout Transformation for Stencil Computations on Short SIMD Architectures
Preprint (430 KB)
Published paper (link to publisher)

Stencil computations are at the core of applications in many domains such as computational electromagnetics, image processing, and partial differential equation solvers used in a variety of scientific and engineering applications. Short-vector SIMD instruction sets such as SSE and VMX provide a promising and widely available avenue for enhancing performance on modern processors. However a fundamental memory stream alignment issue limits achieved performance with stencil computations on modern short SIMD architectures. In this paper, we propose a novel data layout transformation that avoids the stream alignment conflict, along with a static analysis technique for determining where this transformation is applicable. Significant performance increases are demonstrated for a variety of stencil codes on several modern processors with SIMD capabilities.

SIMD vectorization, Architecture, Stencil computations