Mark Blanco, Tze-Meng Low and Kyungjoo Kim (Proc. High Performance Extreme Computing (HPEC), 2019)
Exploration of Fine-Grained Parallelism for Load Balancing Eager K-truss on GPU and CPU
Preprint

In this work we present a performance exploration of Eager K-truss, a linear-algebraic formulation of the K-truss graph algorithm. We address performance issues related to load imbalance of parallel tasks in symmetric, triangular graphs by presenting a fine-grained parallel approach to executing the support computation. This approach also increases the available parallelism, making it amenable to GPU execution. We demonstrate our fine-grained parallel approach using implementations in Kokkos and evaluate them on an Intel Skylake CPU and an Nvidia Tesla V100 GPU. Overall, we observe a 1.26x to 1.48x improvement on the CPU and a 9.97x to 16.92x improvement on the GPU due to our fine-grained parallel formulation.
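To make the granularity trade-off concrete, here is a minimal, hypothetical sketch in plain sequential Python. It is not the authors' Kokkos implementation. It assumes each undirected edge is stored once in the upper triangle (u < v), so row lengths shrink toward the last vertex. It contrasts coarse-grained tasks (one per row, unequal sizes) with fine-grained tasks (one per stored edge). Each triangle u < v < w is found once from its edge (u, v), and the supports of all three of its edges are incremented immediately. The function names `upper_adjacency`, `row_tasks`, `edge_tasks`, `edge_support` and `k_truss` are illustrative. In a real parallel version, the three increments would need atomic updates.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def upper_adjacency(edges: List[Edge]) -> Dict[int, List[int]]:
    """Store each undirected edge once as u -> v with u < v (upper triangle)."""
    adj: Dict[int, Set[int]] = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        u, v = min(a, b), max(a, b)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())
    return {u: sorted(vs) for u, vs in adj.items()}


def row_tasks(adj: Dict[int, List[int]]) -> List[List[Edge]]:
    # Coarse-grained: one task per non-empty row; task sizes are uneven
    # because upper-triangular rows get shorter for later vertices.
    return [[(u, v) for v in adj[u]] for u in adj if adj[u]]


def edge_tasks(adj: Dict[int, List[int]]) -> List[Edge]:
    # Fine-grained: one task per stored nonzero (edge).
    return [(u, v) for u in adj for v in adj[u]]


def edge_support(adj: Dict[int, List[int]]) -> Dict[Edge, int]:
    """Support of an edge = number of triangles that contain it."""
    support = {e: 0 for e in edge_tasks(adj)}
    for u, v in edge_tasks(adj):  # each task could run in parallel
        # Common upper neighbours w > v close a triangle u < v < w,
        # so each triangle is discovered exactly once, from edge (u, v).
        for w in set(adj[u]) & set(adj[v]):
            support[(u, v)] += 1  # would need atomics in parallel
            support[(u, w)] += 1
            support[(v, w)] += 1
    return support


def k_truss(edges: List[Edge], k: int) -> List[Edge]:
    """Repeatedly drop edges with support < k - 2 until none remain to drop."""
    current = sorted({(min(a, b), max(a, b)) for a, b in edges if a != b})
    while True:
        support = edge_support(upper_adjacency(current))
        keep = [e for e in current if support[e] >= k - 2]
        if len(keep) == len(current):
            return keep
        current = keep
```

For a 4-clique on vertices 0 to 3 plus a pendant edge (3, 4), the row tasks have sizes 3, 2, 1 and 1, while the seven edge tasks are uniform units of scheduling. Every clique edge has support 2, so the 4-truss keeps the six clique edges and drops the pendant edge.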

Linear algebra, Parallel processing, CPUs, High performance, GPUs, Performance portable, Algorithm, K-truss, Graph algorithms, Kokkos, Eager K-truss