A tool for bottleneck analysis and performance prediction for GPU-accelerated applications

Abstract

High-level tools for analyzing and predicting the performance of GPU-accelerated applications are scarce, at best. Although performance modeling approaches for GPUs exist, their complexity makes them virtually impossible to use to quickly analyze the performance of real-life applications and obtain easy-to-use, readable feedback. This is why, although GPUs are significant performance boosters in many HPC domains, performance prediction is still based on extensive benchmarking, and performance bottleneck analysis remains a nonsystematic, experience-driven process. In this context, we propose a tool for bottleneck analysis and performance prediction for GPU-accelerated applications. Based on random forest modeling, and using hardware performance counter data, our method can be used to quickly and accurately evaluate application performance on GPU-based systems for different problem characteristics and different hardware generations. We illustrate the benefits of our approach with three detailed use cases: a simple step-by-step example on a parallel reduction kernel, and two classical benchmarks (matrix multiplication and sequence alignment). Our results so far indicate that our statistical modeling is a quick, easy-to-use method to grasp the performance characteristics of applications running on GPUs. Our current work focuses on tackling some of its applicability limitations (more applications, more platforms) and improving its usability (full automation from input to user feedback).
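
The sketch below illustrates the general idea of the approach described in the abstract: training a random forest on hardware performance counter data to predict application runtime. It is a minimal, hypothetical example and not the authors' implementation; the counter features, synthetic data, and use of scikit-learn's RandomForestRegressor are assumptions made for illustration only.

```python
# Minimal sketch (not the paper's tool): predicting GPU kernel execution time
# from hardware performance counter data with a random forest.
# The counter features and the synthetic data below are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical feature matrix: one row per profiled kernel run, one column per
# hardware counter (e.g. DRAM transactions, achieved occupancy, issued instructions).
counters = rng.uniform(size=(200, 4))

# Hypothetical target: measured kernel execution time in milliseconds,
# composed of a memory-bound term, a compute-bound term, and noise.
runtime_ms = (
    5.0 * counters[:, 0]
    + 2.0 * counters[:, 1] ** 2
    + rng.normal(scale=0.1, size=200)
)

X_train, X_test, y_train, y_test = train_test_split(
    counters, runtime_ms, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print("R^2 on held-out runs:", model.score(X_test, y_test))
# Feature importances hint at which counters dominate the predicted runtime,
# which is the kind of signal a bottleneck analysis could build on.
print("Counter importances:", model.feature_importances_)
```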

Publication
2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2016)