F. Sadi, Lawrence Pileggi and Franz Franchetti (High Performance Extreme Computing Conference (HPEC), 2016)
3D DRAM Based Application Specific Hardware Accelerator for SpMV
Preprint (709 KB)
Published paper (link to publisher)

For numerous scientific applications Sparse Matrix-Vector multiplication (SpMV) is one of the most important kernels. Unfortunately, due to its very low ratio of computation to memory access SpMV is inherently a memory bound problem. On the other hand, the main memory bandwidth of commercial off-the-shelf (COTS) architectures is insufficient for available computation resources on these platforms, well known as the memory wall problem. As a result, COTS architectures are unsuitable for SpMV. Furthermore, SpMV requires random access into a memory space which is far too big for cache. Hence, it becomes difficult to utilize the main memory bandwidth which is already scarce. We propose an algorithm for large SpMV problems which is specially optimized to fully exploit the underlying micro-architecture and overall system capabilities. This algorithm is implemented in two steps. The key feature of the first step is that it converts all the main memory random access into streaming access. This reduces the overall data transfer volume significantly and ensures full utilization of the memory bandwidth. On top of that, we propose a meta-data compression technique, namely Variable Length Delta Index (VLDI), to decrease the data transfer volume even further. VLDI is particularly effective for sparse matrices where meta-data to payload ratio is high, e.g. sparse bit matrices.

Acceleration, 3D-stacked