Jean Zay: Execution of a hybrid MPI/OpenMP job in batch
Batch jobs on all the nodes are managed by the Slurm software.
To submit a hybrid MPI + OpenMP batch job on Jean Zay, it is necessary to:
- Create a submission script. The following example is saved in the file intel_mpi_omp.slurm:
    #!/bin/bash
    #SBATCH --job-name=Hybrid          # name of job
    #SBATCH --ntasks=8                 # number of MPI processes
    #SBATCH --cpus-per-task=10         # number of OpenMP threads per MPI process
    # /!\ Caution, "multithread" in Slurm vocabulary refers to hyperthreading.
    #SBATCH --hint=nomultithread       # 1 thread per physical core (no hyperthreading)
    #SBATCH --time=00:10:00            # maximum execution time requested (HH:MM:SS)
    #SBATCH --output=Hybride%j.out     # name of output file
    #SBATCH --error=Hybride%j.out      # name of error file (here, common with the output file)

    # go into the submission directory
    cd ${SLURM_SUBMIT_DIR}

    # clean out the modules loaded in interactive and inherited by default
    module purge

    # loading modules
    module load intel-all/19.0.4

    # echo of launched commands
    set -x

    # number of OpenMP threads
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

    # OpenMP binding
    export OMP_PLACES=cores

    # code execution
    srun ./exec_mpi_omp
- Submit this script via the sbatch command:
    $ sbatch intel_mpi_omp.slurm
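The resources requested by the example script can be checked with a little arithmetic before submitting. The following is a minimal sketch, not part of the official procedure: the variable names are illustrative, and the value of 40 physical cores per node is an assumption that should be replaced by the figure for the partition you actually use.

```shell
#!/bin/bash
# Values copied from the #SBATCH directives of the example script.
ntasks=8            # --ntasks        (number of MPI processes)
cpus_per_task=10    # --cpus-per-task (OpenMP threads per MPI process)

# Assumption: 40 physical cores per compute node; adjust to your partition.
cores_per_node=40

# With --hint=nomultithread, each OpenMP thread occupies one physical core.
total_cores=$(( ntasks * cpus_per_task ))

# Ceiling division: minimum number of nodes able to hold all the cores.
nodes_needed=$(( (total_cores + cores_per_node - 1) / cores_per_node ))

echo "total_cores=${total_cores} nodes_needed=${nodes_needed}"
```

With the values above, the job reserves 80 physical cores, which fit on 2 nodes under the stated 40-core assumption.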
Comments:
- We recommend that you compile and execute your codes under the same Intel environment: use exactly the same module load intel… command at compilation and at execution.
- The module purge command is made necessary by the default Slurm behaviour: any modules loaded in your environment at the moment you launch sbatch are passed to the submitted job.
- In this example, we assume that the exec_mpi_omp executable file is located in the submission directory, which is the directory in which the sbatch command is entered. The SLURM_SUBMIT_DIR variable is set automatically by Slurm.
- The computation output file Hybride<job_number>.out is also found in the submission directory. It is created at the start of the job execution: editing or modifying it while the job is running can disrupt the execution.
- To avoid errors from the automatic task distribution, we recommend that you use srun to execute your code instead of mpirun. This guarantees a distribution which conforms to the resources requested in your submission file.
- All jobs have resources defined in Slurm per partition and per QoS (Quality of Service) by default. You can modify the limits by specifying another partition and/or QoS, as shown in our documentation detailing the partitions and QoS.
- For multi-project users and those having both CPU and GPU hours, it is necessary to specify the project accounting (hours allocation of the project) on which to count the computing hours of the job as indicated in our documentation detailing the project hours management.
- We strongly recommend that you consult our documentation detailing the project hours management to ensure that the hours consumed by your jobs are deducted from the correct accounting.
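The assumption that the executable sits in the submission directory can be turned into a defensive check inside the job script. The following is a minimal sketch and not part of the official example: the function name check_executable is hypothetical, and the fallback to the current directory when SLURM_SUBMIT_DIR is unset is an assumption made so that the snippet can also be tried outside a Slurm job.

```shell
#!/bin/bash
# Hypothetical helper: verify that the executable is present (and executable)
# in the submission directory before calling srun.
check_executable() {
    local exe="$1"
    # Assumption: outside of a Slurm job, fall back to the current directory.
    local dir="${SLURM_SUBMIT_DIR:-$PWD}"
    if [ -x "${dir}/${exe}" ]; then
        echo "found ${exe}"
    else
        echo "missing ${exe}" >&2
        return 1
    fi
}

# Possible use in the job script, just before the srun line
# (left commented out so that the sketch has no side effects):
# check_executable exec_mpi_omp || exit 1
# srun ./exec_mpi_omp
```

Failing early in this way stops the job before srun is launched when the executable is missing, and the "missing" message then appears in the job's error file.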