Summary Communication and computation overlapping techniques have been introduced in the five-dimensional gyrokinetic codes GYSELA GKV. In order to anticipate some of exa-scale requirements, these were ported modern accelerators, Xeon Phi KNL Tesla P 100 GPU. On a serial version on GKV GPU are respectively 1.3× 7.4× faster than those single Skylake processor (a socket). For scalability, we measured performance from 16 512 KNLs (1024 32k cores) 32 256 GPUs. their parallel versions, transpose communication semi-Lagrangian solver or Convolution kernel turned out be main bottleneck. This indicates that exa-scale, network constraints would critical. mitigate costs, pipeline task-based implemented codes. The 2D advection has achieved 33% 92% speed up, convolution factor 2 up with pipelining. approach gives 11% 82% gain derivative electrostatic potential GYSELA. We shown pipeline-based is applicable presence symmetry, while can more general situations.