SAC on a Niagara T3-4 server: lessons and experiences

Abstract

The SPARC T3-4 server provides up to 512 concurrent hardware threads, a degree of concurrency that is unprecedented in a single server system. This paper reports on how the automatically parallelising compiler of the data-parallel functional array language SAC copes with up to 512 execution units. We investigate three numerical kernels that are representative of a wide range of applications: matrix multiplication, convolution and 3-dimensional FFT. We show both the high-level declarative coding style of SAC and the performance achieved on the T3-4 server. Last but not least, we draw conclusions for improving our compiler technology in the future.

Publication
Applications, tools and techniques on the road to exascale computing