Overlapping communications in gyrokinetic codes on accelerator‐based platforms

Abstract

Communication and computation overlapping techniques have been introduced in the five‐dimensional gyrokinetic codes GYSELA and GKV. In order to anticipate some of the exa‐scale requirements, these codes were ported to modern accelerators, Xeon Phi KNL and Tesla P100 GPU. On accelerators, a serial version of GYSELA on KNL and of GKV on GPU are respectively 1.3× and 7.4× faster than those on a single Skylake processor (a single socket). For scalability, we measured GYSELA performance from 16 to 512 KNLs (1024 to 32k cores) and GKV performance from 32 to 256 GPUs. In their parallel versions, the transpose communication in the semi‐Lagrangian solver of GYSELA or in the convolution kernel of GKV turned out to be the main bottleneck. This indicates that at the exa‐scale, network constraints would be critical. To mitigate the communication costs, pipeline and task‐based overlapping techniques have been implemented in these codes. The GYSELA 2D advection solver has achieved a 33% to 92% speed up, and the GKV 2D convolution kernel has achieved a factor of 2 speed up with pipelining. The task‐based approach gives an 11% to 82% performance gain in the derivative computation of the electrostatic potential in GYSELA. We have shown that the pipeline‐based approach is applicable in the presence of symmetry, while the task‐based approach can be applied to more general situations.
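The pipelining idea named in the abstract can be illustrated with a toy sketch. This is not the GYSELA or GKV implementation: the function names (`fake_transpose`, `compute`, `run_pipelined`), the chunk size, and the use of a Python thread in place of non-blocking communication are all assumptions made for illustration. The sketch splits the data into chunks and starts the transfer of the next chunk before computing on the one that has just arrived, so that communication and computation overlap.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fake_transpose(chunk, delay=0.01):
    """Hypothetical stand-in for a transpose communication.

    The sleep mimics network latency; a real code would exchange data
    between processes instead.
    """
    time.sleep(delay)
    return list(chunk)


def compute(chunk):
    """Hypothetical stand-in for the local kernel applied to one chunk."""
    return sum(x * x for x in chunk)


def run_blocking(chunks):
    """Reference version: communicate, then compute, one chunk at a time."""
    return [compute(fake_transpose(c)) for c in chunks]


def run_pipelined(chunks):
    """Overlap the transfer of chunk i+1 with the computation on chunk i."""
    if not chunks:
        return []
    results = []
    # A single worker thread plays the role of the communication channel.
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = comm.submit(fake_transpose, chunks[0])
        for nxt in chunks[1:]:
            received = pending.result()                 # wait for chunk i
            pending = comm.submit(fake_transpose, nxt)  # start chunk i+1
            results.append(compute(received))           # compute on chunk i meanwhile
        results.append(compute(pending.result()))       # drain the last chunk
    return results


data = list(range(64))
chunks = [data[i:i + 16] for i in range(0, len(data), 16)]
assert run_pipelined(chunks) == run_blocking(chunks)
```

Both versions return the same values; only the ordering of work differs. In this toy model the benefit of pipelining is bounded by how much of the transfer time can be hidden behind computation, so the largest gain is expected when the two costs are comparable.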

Publication
Concurrency and Computation: Practice and Experience