Measuring Performance in Gyselalib++

Gyselalib++ is built on DDC, which is in turn built on Kokkos. Kokkos provides integrated tooling for performance measurement, which makes it simple to profile applications built with Gyselalib++.

General Profiling with Kokkos Tools

Kokkos Tools must be compiled before they can be used. If you have a system installation or an instance of Kokkos Tools elsewhere on your system, then this can be used. Otherwise, the kokkos-tools repository is available as a submodule of Gyselalib++ in vendor/kokkos-tools/.

To compile all Kokkos Tools libraries simply run:

cd $GYSELALIB_ROOT/vendor/kokkos-tools
cmake -B build .
cmake --build build

Kokkos Tools provide hooks to measure and annotate performance data. You can enable a tool by setting the KOKKOS_TOOLS_LIBS environment variable to point to the shared library implementing the tool.

For example:

export KOKKOS_TOOLS_LIBS=$GYSELALIB_ROOT/vendor/kokkos-tools/build/profiling/simple-kernel-timer/libkp_kernel_timer.so

Gyselalib++ will then automatically invoke the tool for all Kokkos kernels.

Several tools are available; they are documented by Kokkos Tools.

Details of how to use these tools can be found in the Kokkos Tools documentation. Here we describe how to use SimpleKernelTimer, which can be used on any system.

Using SimpleKernelTimer

SimpleKernelTimer records execution time for each Kokkos kernel launch, providing a quick overview of where runtime is spent. The results are saved to a .dat file. The name of this file is printed to the terminal.

The .dat file cannot be read as is, but Kokkos Tools provides executables to convert it. The possible executables are:

  • kp_reader to output the data in a human-readable format.
  • kp_json_writer to output the data in JSON format.

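Example usage, where $GYSELALIB_BUILD is the Gyselalib++ build directory, $KOKKOS_TOOLS_BUILD_FOLDER is the Kokkos Tools build directory (e.g. $GYSELALIB_ROOT/vendor/kokkos-tools/build) and my_data_file.dat stands for the file name printed to the terminal by the run: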

mkdir wk
cd wk
$GYSELALIB_BUILD/simulations/geometryXVx/landau/landau_fft --dump-config config_file.yml
$GYSELALIB_BUILD/simulations/geometryXVx/landau/landau_fft config_file.yml
$KOKKOS_TOOLS_BUILD_FOLDER/profiling/simple-kernel-timer/kp_json_writer my_data_file.dat > landau_fft.json
$KOKKOS_TOOLS_BUILD_FOLDER/profiling/simple-kernel-timer/kp_reader my_data_file.dat > landau_fft.txt

The timings in the output come from two sources:

  1. Regions: added manually.
  2. Kernels: generated by calls to Kokkos parallel methods.

Kokkos Profiling Regions

The code already contains some Kokkos regions for timing; more can be added manually to investigate performance more precisely.

To start a region, use the following line:

Kokkos::Profiling::pushRegion("MyProfilingRegion");

where "MyProfilingRegion" is a string identifying the region. To exit the region, use the following line:

Kokkos::Profiling::popRegion();