
Jean Zay: Good practices for code profiling
The purpose of this page is to list good practices during code profiling with the goal of performance analysis. Some advice is general and will apply no matter which machine is used. Other advice is specific to the Jean Zay machine.
Choice of a test case
The bottlenecks of a code can vary widely depending on the configuration and size of the test case and on the number of MPI processes used for the execution.
- Choose a test case which is representative of the jobs you execute (or plan to execute) on Jean Zay as part of your project.
- If your configurations vary greatly, it can be worthwhile to profile several of them to get a better idea of how the test case affects performance.
Compilation
Most code profiling tools require debugging symbols to be present in the executable being analyzed. In certain cases, profiling is possible without debugging symbols, but the results are difficult to interpret (no function names, no line numbers, etc.).
- Use the same compilers and compilation options as your production jobs.
- Enable debugging mode at compile time (generally with the -g option), but be sure to state the desired optimization level explicitly (for example, -g -O3), because many compilers disable optimizations by default when debugging mode is enabled. NVIDIA compilers support the -gopt option, which enables debugging mode without changing the optimization level.
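The flag combination described above can be sketched as follows. This is only an illustration: the production flags, the compiler wrapper mpicc, and the file names my_code.c and my_code are hypothetical placeholders, and the commands are printed rather than executed so that the sketch does not depend on any particular compiler being installed.

```shell
#!/bin/bash
# Assumption: these stand in for the optimization flags of your production build.
PROD_FLAGS="-O3"

# Profiling build: keep the production flags and add debugging symbols.
# The optimization level is stated explicitly because -g alone may disable it.
PROFILE_FLAGS="-g ${PROD_FLAGS}"

# NVIDIA compilers: -gopt adds debugging symbols without changing optimization.
NV_PROFILE_FLAGS="-gopt ${PROD_FLAGS}"

# Print the commands instead of running them (placeholder compiler and file names).
echo "mpicc ${PROFILE_FLAGS} -o my_code my_code.c"
echo "nvc ${NV_PROFILE_FLAGS} -o my_code my_code.c"
```

Deriving the profiling flags from the production flags keeps the two builds in step, so the profile reflects the code you actually run.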
Execution
- If you use less than a whole node, it can be advantageous to run your job in exclusive mode (Slurm option --exclusive) to rule out effects caused by node sharing.
- For security reasons, access to certain hardware performance counters is restricted by default. When your job is exclusive, you can obtain more complete code profiles by specifying the Slurm constraint prof (Slurm option -C prof), which gives full access to the performance counters (kernel parameter perf_event_paranoid=-1).
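A minimal sketch of a job script combining the two options above is shown here. The job name, task count, time limit, and executable name are hypothetical placeholders to adapt to your project. The snippet only writes the script to a file; nothing is submitted.

```shell
#!/bin/bash
# Write a hypothetical job script to profile_job.slurm (placeholder file name).
cat > profile_job.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=profile_run   # placeholder job name
#SBATCH --ntasks=4               # placeholder: fewer tasks than a whole node
#SBATCH --time=00:30:00          # placeholder time limit
#SBATCH --exclusive              # do not share the node with other jobs
#SBATCH -C prof                  # request full access to performance counters

# Optional check: should print -1 when counter access is unrestricted.
cat /proc/sys/kernel/perf_event_paranoid

# Placeholder executable; put your profiler command in front of it if needed.
srun ./my_code
EOF

echo "Job script written; submit it with: sbatch profile_job.slurm"
```

Printing perf_event_paranoid at the start of the job is a cheap way to confirm that the prof constraint was taken into account before reading the profile.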