Tom Henretty, Richard Veras, Franz Franchetti, Louis-Noël Pouchet, J. Ramanujam and P. Sadayappan (Proc. ACM International Conference on Supercomputing, pp. 13-24, 2013)
A Stencil Compiler for Short-Vector SIMD Architectures

Stencil computations are an integral component of applications in a number of scientific computing domains. Short-vector SIMD instruction sets are ubiquitous on modern processors and can be used to significantly increase the performance of stencil computations. Traditional approaches to optimizing stencils on these platforms have focused on either short-vector SIMD or data locality optimizations. In this paper, we propose a domain-specific language and compiler for stencil computations that allows specification of stencils in a concise manner and automates both locality and short-vector SIMD optimizations, along with effective utilization of multi-core parallelism. Loop transformations to enhance data locality and enable load-balanced parallelism are combined with a data layout transformation to effectively increase the performance of stencil computations. Performance increases are demonstrated for a number of stencils on several modern SIMD architectures.
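
To make the terms in the abstract concrete, the sketch below shows a 3-point averaging stencil and one possible data layout transformation that removes unaligned neighbor accesses for a vector length `vl`. It is an illustration only, not the paper's DSL or compiler output. The abstract does not say which layout transformation is used, so the transposed layout here (often called a dimension-lifted transpose in the literature) is an assumption, and every function name is hypothetical.

```python
def jacobi_1d(a):
    """One sweep of a 3-point averaging stencil; the two end cells are copied."""
    b = list(a)
    for i in range(1, len(a) - 1):
        b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return b


def lift_transpose(a, vl):
    """View a as a vl x rows matrix and transpose it (assumed layout, see lead-in).

    Afterwards, slot r*vl + lane holds original element lane*rows + r, so the
    stencil neighbors of a whole "vector" (row r) sit in rows r-1 and r+1.
    """
    n = len(a)
    assert n % vl == 0, "sketch assumes the length is a multiple of vl"
    rows = n // vl
    return [a[lane * rows + r] for r in range(rows) for lane in range(vl)]


def unlift(t, vl):
    """Inverse of lift_transpose."""
    rows = len(t) // vl
    return [t[r * vl + lane] for lane in range(vl) for r in range(rows)]


def jacobi_1d_lifted(a, vl):
    """Same sweep as jacobi_1d, computed on the transposed layout."""
    n = len(a)
    rows = n // vl
    t = lift_transpose(a, vl)
    out = list(t)

    # Interior rows: lane k of row r only needs lane k of rows r-1 and r+1.
    # On real hardware each inner loop would be one aligned vector operation.
    for r in range(1, rows - 1):
        for lane in range(vl):
            out[r * vl + lane] = (
                t[(r - 1) * vl + lane] + t[r * vl + lane] + t[(r + 1) * vl + lane]
            ) / 3.0

    def at(i):
        # Read original element i from the transposed layout.
        return t[(i % rows) * vl + (i // rows)]

    # First and last rows: neighbors wrap into an adjacent lane, so this
    # sketch falls back to scalar code and skips the two global end cells.
    for r in (0, rows - 1):
        for lane in range(vl):
            i = lane * rows + r
            if 0 < i < n - 1:
                out[r * vl + lane] = (at(i - 1) + at(i) + at(i + 1)) / 3.0
    return out
```

For example, `unlift(jacobi_1d_lifted(a, 4), 4)` should match `jacobi_1d(a)` for any list whose length is a multiple of 4. The sketch covers only the layout idea; the locality-enhancing loop transformations and multi-core parallelism mentioned in the abstract are not shown.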

SIMD vectorization, Short vector, Compiler