Daniel McFarlin, Franz Franchetti, José M. F. Moura and Markus Püschel (Proc. SPIE Conference on Defense, Security, and Sensing, Proceedings of SPIE, Vol. 7337, pp. 733708, 2009)
High Performance Synthetic Aperture Radar Image Formation On Commodity Architectures
Published paper (link to publisher)

Synthetic Aperture Radar (SAR) image processing platforms are increasingly confronted with vast datasets and hard real-time deadlines. Platform developers are also finding themselves constrained by battlefield exigencies, time-to-market pressures and rapidly varying requirements. In response, developers are coupling high performance, general-purpose Commercial-Off-The-Shelf (COTS) architectures with software implementations of SAR algorithms. While this approach provides great flexibility, achieving the requisite performance on COTS architectures such as IBM's Cell BE and Intel's Core2 Quad is a potentially time-consuming and error-prone process. This is principally due to the highly parallel nature of modern COTS architectures. Developers must now grapple with architectures that exhibit parallelism at nearly every level of granularity; extracting high performance requires a complex interweaving of multiple forms of parallelism (e.g., SIMD vector extensions, multicore). In this paper, we first present an overview of SPIRAL, a program generator that reduces the burden of developing high performance code by automatically exploring many combinations of parallelism. We then extend SPIRAL's domain-specific language to expose high-level SAR constructs that a developer can combine into a parameterized scenario. We subsequently employ these extensions to generate code for the computationally dominant image formation component of Polar Format SAR processing. By fully leveraging automatic program generation, SPIRAL produces code for Intel's Core2 Quad that surpasses competing hand-tuned implementations on the Cell Blade, an architecture with twice as many cores and three times the memory bandwidth. We show an average performance of 34 Gigaflops/sec for 16-Megapixel and 100-Megapixel SAR images with runtimes of 0.6 and 4.45 seconds respectively.

Multithreading, SIMD vectorization, Numerical kernels we consider, Beyond transforms, Parallel processing