Jeremy Johnson, Tim Chagnon, Petya Vachranukunkiet, Prawat Nagvajara and Chika Nwankpa (Proc. International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA), 2008)
Sparse LU Decomposition using FPGA
Preprint (539 KB)
Published paper (link to publisher)
Bibtex

This paper reports on an FPGA implementation of sparse LU decomposition. The resulting special purpose hardware is geared towards power system problems - load flow computation - which are typically solved iteratively using Newton Raphson. The key step in this process, which takes approximately 85% of the computation time, is the solution of sparse linear systems arising from the Jacobian matrices that occur in each iteration of Newton Raphson. Current state-of-the-art software packages, such as UMFPACK and SuperLU, running on general purpose processors perform suboptimally on these problems due to poor utilization of the floating point hardware (typically 1 to 4% efficiency). Our LU hardware, using a special purpose data path and cache, designed to keep the floating point hardware busy, achieves an efficiency of 60% and higher. This improved efficiency provides an order of magnitude speedup when compared to a software solution using UMFPACK running on general purpose processors.

Keywords:
IP cores for FPGA/ASIC, Beyond transforms, Linear algebra