Copyrights to these papers may be held by the publishers. The download files are preprints. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Victoria Caparrós Cabezas and Markus Püschel (Proc. IEEE International Symposium on Workload Characterization (IISWC), pp. 222-231, 2014)
Extending the Roofline Model: Bottleneck Analysis with Microarchitectural Constraints
Preprint (2.1 MB)
Published paper (link to publisher)
Software, even if carefully optimized, rarely reaches the peak performance of a processor. Understanding which hardware resource is the bottleneck is difficult but important as it can help with both further optimizing the code or deciding which hardware component to upgrade for higher performance. If the bottleneck is the memory bandwidth, the roofline model provides a simple but instructive analysis and visualization. In this paper, we take the roofline analysis further by including additional performance-relevant hardware features including latency, throughput, and capacity information for a multilevel cache hierarchy and out-of-order execution buffers. Two key ideas underlie our analysis. First, we estimate performance based on a scheduling of the computation DAG on a high-level model of the microarchitecture and extract data including utilization of resources and overlaps from a cycle-by-cycle analysis of the schedule. Second, we show how to use this data to create only one plot with multiple rooflines that visualize bottlenecks. We validate our model against performance data obtained from a real system, and then apply our bottleneck analysis to a number of prototypical floating point kernels to identify and interpret bottlenecks.Keywords: Performance model, Roofline model, Bottleneck analysis