Qi Guo, N. Alachiotis, Berkin Akin, F. Sadi, G. Xu, Tze-Meng Low, Lawrence Pileggi, James C. Hoe and Franz Franchetti (Proc. Workshop on Near Data Processing (WONDP), 2014)
3D-Stacked Memory-Side Acceleration: Accelerator and System Design
Comment: in conjunction with MICRO-47
Preprint (910 KB)
Published paper (link to publisher)

Specialized hardware acceleration is an effective technique to mitigate the dark silicon problems. A challenge in designing on-chip hardware accelerators for data-intensive applications is how to efficiently transfer data between the memory hierarchy and the accelerators. Although the Processingin-Memory (PIM) technique has the potential to reduce the overhead of data transfers, it is limited by the traditional process technology. Recent process technology advancements such as 3Ddie stacking enable efficient PIM architectures by integrating accelerators to the logic layer of 3D DRAM, thus leading to the concept of the 3D-stacked Memory-Side Accelerator (MSA). In this paper, we initially present the overall architecture of the 3D-stacked MSA, which relies on a configurable array of domain-specific accelerators. Thereafter, we describe a full-system prototype that is built upon a novel software stack and a hybrid evaluation methodology. Experimental results demonstrate that the 3D-stacked MSA achieves up to 179x and 96x better energy efficiency than the Intel Haswell processor for the FFT and matrix transposition algorithms, respectively.

Memory, Acceleration