Measuring Performance in Gyselalib++
Gyselalib++ is built on DDC, which is in turn built on Kokkos. Kokkos provides integrated tooling for performance measurement, which makes it simple to profile applications built with Gyselalib++.
General Profiling with Kokkos Tools
Kokkos Tools must be compiled before they can be used. If Kokkos Tools is already installed elsewhere on your system, that installation can be used. Otherwise, the Kokkos Tools repository is available as a submodule of Gyselalib++ in vendor/kokkos-tools/.
To compile all Kokkos Tools libraries simply run:
cd $GYSELALIB_ROOT/vendor/kokkos-tools
cmake -B build .
cmake --build build
Kokkos Tools provide hooks to measure and annotate performance data. You can enable a tool by setting the KOKKOS_TOOLS_LIBS environment variable to point to the shared library implementing the tool.
For example:
export KOKKOS_TOOLS_LIBS=$GYSELALIB_ROOT/vendor/kokkos-tools/build/profiling/simple-kernel-timer/libkp_kernel_timer.so
Gyselalib++ will then automatically invoke the tool for all Kokkos kernels.
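A mistyped path in KOKKOS_TOOLS_LIBS is easy to overlook, so it can help to check that the library file exists before exporting the variable. The following is a minimal sketch, assuming the build layout shown above. The function name enable_kokkos_tool is hypothetical and is not part of Gyselalib++ or Kokkos Tools.

```shell
# Hypothetical helper (not provided by Gyselalib++ or Kokkos Tools):
# export KOKKOS_TOOLS_LIBS only if the given tool library exists.
enable_kokkos_tool() {
    tool_lib="$1"
    if [ -f "$tool_lib" ]; then
        export KOKKOS_TOOLS_LIBS="$tool_lib"
        echo "Kokkos tool enabled: $tool_lib"
    else
        echo "Kokkos tool library not found: $tool_lib" >&2
        return 1
    fi
}

# Example call, assuming GYSELALIB_ROOT is set and the tools were built as above:
# enable_kokkos_tool "$GYSELALIB_ROOT/vendor/kokkos-tools/build/profiling/simple-kernel-timer/libkp_kernel_timer.so"
```

If the library is missing, the function prints a message to standard error and leaves KOKKOS_TOOLS_LIBS unchanged.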
There are several tools available which are documented by Kokkos Tools. The tools that are particularly useful are:
- SimpleKernelTimer for lightweight kernel-level timing.
- NVTXConnector for profiling with NVIDIA Nsight Systems.
- VTuneConnector for profiling with Intel VTune.
Details of how to use these tools can be found in the Kokkos Tools documentation. Here we describe how to use SimpleKernelTimer, which can be used on any system.
Using SimpleKernelTimer
SimpleKernelTimer records execution time for each Kokkos kernel launch, providing a quick overview of where runtime is spent.
The results are saved to a .dat file, whose name is printed to the terminal. The .dat file is not human-readable, but Kokkos Tools provides executables to convert it:
- kp_reader to output the data in human-readable format.
- kp_json_writer to output the data in JSON format.
Example usage:
mkdir wk
cd wk
$GYSELALIB_BUILD/simulations/geometryXVx/landau/landau_fft --dump-config config_file.yml
$GYSELALIB_BUILD/simulations/geometryXVx/landau/landau_fft config_file.yml
$KOKKOS_TOOLS_BUILD_FOLDER/profiling/simple-kernel-timer/kp_json_writer my_data_file.dat > landau_fft.json
$KOKKOS_TOOLS_BUILD_FOLDER/profiling/simple-kernel-timer/kp_reader my_data_file.dat > landau_fft.txt
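In the example above, my_data_file.dat stands for the file name printed to the terminal. If that name has scrolled out of view, one option is to pick the most recently modified .dat file in the working directory. The sketch below assumes the file was written to the directory you pass in and that no newer .dat file exists there. The function name latest_dat_file is hypothetical.

```shell
# Hypothetical helper: print the most recently modified .dat file in a directory.
# Defaults to the current directory when no argument is given.
latest_dat_file() {
    dir="${1:-.}"
    # ls -t sorts by modification time, newest first; keep only the first entry.
    newest=$(ls -t "$dir"/*.dat 2>/dev/null | head -n 1) || true
    if [ -z "$newest" ]; then
        echo "No .dat file found in $dir" >&2
        return 1
    fi
    echo "$newest"
}

# Example call, assuming KOKKOS_TOOLS_BUILD_FOLDER is set as in the commands above:
# $KOKKOS_TOOLS_BUILD_FOLDER/profiling/simple-kernel-timer/kp_reader "$(latest_dat_file .)" > landau_fft.txt
```

Running each profiled simulation in its own fresh directory, as the example does with wk, keeps this selection unambiguous.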
The timings in the output come from two sources:
- Regions : Added manually.
- Kernels : Generated by calls to Kokkos parallel methods.
Kokkos Profiling Regions
The code already contains some Kokkos regions for timing. More can be added manually to investigate performance in finer detail.
To start a region, use the following line:
Kokkos::Profiling::pushRegion("MyProfilingRegion");
where MyProfilingRegion is a string identifying the region. To exit the region, use the following line:
Kokkos::Profiling::popRegion();