
Jean Zay: Usage of CUDA MPS
Introduction
The Multi-Process Service (MPS) is an alternative, binary-compatible implementation of the CUDA programming interface. The MPS execution architecture is designed to let co-operative multi-process CUDA applications, typically MPI jobs, use the Hyper-Q functionality of recent NVIDIA GPUs. Hyper-Q allows CUDA kernels from different processes to be processed concurrently on the same GPU; this can improve performance when the GPU compute capacity is underused by a single application process.
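As an optional illustration, the sketch below shows one way to verify, from inside a running job, that the MPS daemons have indeed been started on the compute node. It assumes MPS was activated as described in the Usage section below; the process names nvidia-cuda-mps-control and nvidia-cuda-mps-server are those used by NVIDIA's MPS daemons, and this check is only a sanity test, not part of the required procedure.

# Optional sanity check, to be run on a compute node inside a job
# submitted with the MPS option described below: list the MPS daemons.
# (nvidia-cuda-mps-control is the control daemon; nvidia-cuda-mps-server
# instances are created on demand for the client processes.)
ps -e -o pid,comm | grep nvidia-cuda-mps \
    && echo "MPS is active on $(hostname)" \
    || echo "No MPS daemon found on $(hostname)"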
Usage
CUDA MPS is included by default in the different CUDA modules available to the users.
For a multi-GPU MPI batch job, CUDA MPS can be activated with the -C mps option. However, the node must be reserved exclusively via the --exclusive option.
- For an execution via the default gpu partition (nodes with 40 physical cores and 4 GPUs) using only one node:
- mps_multi_gpu_mpi.slurm
#!/bin/bash
#SBATCH --job-name=gpu_cuda_mps_multi_mpi     # name of job
#SBATCH --ntasks=40                           # total number of MPI tasks
#SBATCH --ntasks-per-node=40                  # number of MPI tasks per node (all physical cores)
#SBATCH --gres=gpu:4                          # number of GPUs per node (all GPUs)
#SBATCH --cpus-per-task=1                     # number of cores per task
# /!\ Caution: In Slurm vocabulary, "multithread" refers to hyperthreading.
#SBATCH --hint=nomultithread                  # hyperthreading deactivated
#SBATCH --time=00:10:00                       # maximum execution time requested (HH:MM:SS)
#SBATCH --output=gpu_cuda_mps_multi_mpi%j.out # name of output file
#SBATCH --error=gpu_cuda_mps_multi_mpi%j.out  # name of error file (here, common with the output)
#SBATCH --exclusive                           # exclusively reserves the node
#SBATCH -C mps                                # MPS is activated

# cleans out modules loaded in interactive mode and inherited by default
module purge

# loads modules
module load ...

# echo of launched commands
set -x

# execution of the code: 4 GPUs for 40 MPI tasks
srun ./executable_multi_gpu_mpi
Submit the script via the sbatch command:
$ sbatch mps_multi_gpu_mpi.slurm
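Once submitted, the job can be monitored with standard Slurm commands until the output file appears. The sketch below is only an example: the job ID 123456 is a placeholder to be replaced by the ID printed by sbatch.

# Monitor the job (123456 is a placeholder job ID)
squeue -u $USER                                          # list your pending/running jobs
scontrol show job 123456 | grep -E 'JobState|Features'   # confirm the "mps" feature was requested
tail -f gpu_cuda_mps_multi_mpi123456.out                 # follow the output once the job is running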
Comments:
- Similarly, you can execute your job on an entire node of the gpu_p2 partition (nodes with 24 physical cores and 8 GPUs) by specifying:

#SBATCH --partition=gpu_p2     # GPU partition requested
#SBATCH --ntasks=24            # total number of MPI tasks
#SBATCH --ntasks-per-node=24   # number of MPI tasks per node (all physical cores)
#SBATCH --gres=gpu:8           # number of GPUs per node (all GPUs)
#SBATCH --cpus-per-task=1      # number of cores per task
- Be careful: even if you use only part of the node, it must be reserved in exclusive mode. In particular, this means that the entire node is billed.
- We recommend that you compile and execute your codes in the same environment by loading the same modules.
- In this example, we assume that the executable_multi_gpu_mpi executable file is found in the submission directory, i.e. the directory in which the sbatch command is entered.
- The computation output file, gpu_cuda_mps_multi_mpi<job_number>.out, is also found in the submission directory. It is created at the start of the job execution: editing or modifying it while the job is running can disrupt the execution.
- The module purge is made necessary by the default Slurm behaviour: any modules loaded in your environment at the moment you launch sbatch are passed to the submitted job, making the execution of your job dependent on what you have done previously.
- To avoid errors in the automatic task distribution, we recommend using srun to execute your code instead of mpirun. This guarantees a distribution that conforms to the resource specifications in the submission file.
- Jobs have resources defined in Slurm by default, per partition and per QoS (Quality of Service). You can modify the limits, or specify another partition and/or QoS, as shown in our documentation detailing the partitions and QoS.
- For multi-project users and those having both CPU and GPU hours, it is necessary to specify the project accounting (hours allocation of the project) to which the computing hours of the job should be charged, as indicated in our documentation detailing the computing hours accounting (see the sketch after this list).
- We strongly recommend that you consult our documentation detailing the computing hours accounting to ensure that the hours consumed by your jobs are deducted from the correct accounting.
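As a sketch only, the directives below show where a partition, a QoS and a project accounting would be specified in the submission script header. The values qos_gpu-dev and my_project@v100 are placeholders, not prescriptions: the QoS names and accounting codes valid for your project are given in the documentation pages referenced above.

# Hypothetical excerpt of a submission script header; replace the placeholder
# values according to the partitions/QoS and hours accounting documentation.
#SBATCH --partition=gpu_p2         # request a specific GPU partition (optional)
#SBATCH --qos=qos_gpu-dev          # placeholder QoS name (e.g. for short test jobs)
#SBATCH --account=my_project@v100  # placeholder accounting: project code + hours type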
Documentation
Official documentation from NVIDIA