
Jean Zay: Execution of an MPI parallel code in batch
Jobs are managed on all of the nodes by the Slurm software.
To submit an MPI job in batch on Jean Zay, you must:
- Create a submission script. Here is an example saved in the intel_mpi.slurm file:
#!/bin/bash
#SBATCH --job-name=TravailMPI       # name of job
#SBATCH --ntasks=80                 # total number of MPI processes
#SBATCH --ntasks-per-node=40        # number of MPI processes per node
# /!\ Caution, "multithread" in Slurm vocabulary refers to hyperthreading.
#SBATCH --hint=nomultithread        # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time=00:10:00             # maximum execution time requested (HH:MM:SS)
#SBATCH --output=TravailMPI%j.out   # name of output file
#SBATCH --error=TravailMPI%j.out    # name of error file (here, in common with the output file)

# go into the submission directory
cd ${SLURM_SUBMIT_DIR}

# clean out the modules loaded in interactive and inherited by default
module purge

# loading modules
module load intel-all/19.0.4

# echo of launched commands
set -x

# code execution
srun ./exec_mpi
- Submit this script via the sbatch command:
$ sbatch intel_mpi.slurm
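When the script is accepted, sbatch replies with the identifier assigned to the job; this is the number substituted for %j in the output file name defined in the script. A minimal illustration, in which the job number is purely hypothetical:
$ sbatch intel_mpi.slurm
Submitted batch job 123456
# the output file of this job would therefore be named TravailMPI123456.out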
Caution: The current configuration of the machine does not allow using hyperthreading (execution of 80 MPI processes on the 40 physical cores of a compute node) with a purely MPI code.
Comments:
- We recommend that you compile and execute your code in the same Intel environment: use exactly the same module load intel/… command at execution as at compilation (a minimal compilation sketch is given after this list).
- The module purge is made necessary by the Slurm default behaviour: any modules which are loaded in your environment at the moment you launch sbatch are passed on to the submitted job.
- In this example, we assume that the exec_mpi executable file is found in the submission directory, that is, the directory in which the sbatch command is entered: the SLURM_SUBMIT_DIR variable is automatically set by Slurm.
- The computation output file TravailMPI<job number>.out (where <job number> is the value of %j) is also found in the submission directory. It is created at the start of the job execution: editing or modifying it while the job is running can disrupt the execution.
- To avoid errors from the automatic task distribution, we recommend that you use srun to execute your code instead of mpirun: this guarantees a distribution which conforms to the resources requested in your submission file (a quick check of the distribution is sketched after this list).
- All jobs have resources defined in Slurm per partition and per QoS (Quality of Service) by default. You can modify the limits by specifying another partition and/or QoS as shown in our documentation detailing the partitions and QoS (illustrative directives are sketched after this list).
- For users with multiple projects, or with both CPU and GPU hours, it is necessary to specify the project accounting (hours allocation of the project) to which the computing hours of the job should be charged, as indicated in our documentation detailing the computing hours accounting (see the illustrative directives after this list).
- We strongly recommend that you consult our documentation detailing the computing hours accounting to ensure that the hours consumed by your jobs are deducted from the correct accounting.
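As a complement to the first comment above, here is a minimal sketch of a compilation carried out under the same Intel environment as the submission script. The source file name my_code.f90 and the choice of the mpiifort wrapper are illustrative assumptions; adapt them to your own code and programming language:
$ module purge
$ module load intel-all/19.0.4       # same module as in intel_mpi.slurm
$ mpiifort -o exec_mpi my_code.f90   # hypothetical Fortran source compiled into exec_mpi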
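As noted in the comment on srun above, the launcher inherits the resource specification of the job (total number of tasks, tasks per node, no hyperthreading). A quick, illustrative way to check the resulting distribution is to launch a trivial command with srun, for example by adding a line to the submission script before the real execution:
# illustrative check: count how many MPI tasks are placed on each node
srun hostname | sort | uniq -c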
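Finally, changing the partition, the QoS or the project accounting mentioned in the last comments is done through additional Slurm directives in the submission script. The values below are placeholders, not valid names; the accepted partition, QoS and accounting names are those given in the documentation referenced above:
#SBATCH --partition=<partition_name>     # hypothetical partition name
#SBATCH --qos=<qos_name>                 # hypothetical QoS name
#SBATCH --account=<project_accounting>   # hypothetical accounting (e.g. the CPU allocation of your project)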