Jean Zay: Collection of similar jobs (Job Array)
Introduction
The SLURM Job Array mechanism allows the user to submit a collection of similar jobs at one time.
Important: At IDRIS, job arrays must be submitted via the sbatch command as they can generate a very large number of jobs.
Job arrays are distinguished by the use of the SLURM directive #SBATCH --array, which specifies the index values of the jobs in the array, either as a range or as an explicit list, as indicated below:
- For a job array whose index values go successively from 0 to NB_JOBS (i.e. 0, 1, 2, …, NB_JOBS):
#SBATCH --array=0-NB_JOBS
- For a job array whose index values vary from 0 to NB_JOBS (maximum possible value) with a step of STEP_SIZE (i.e. 0, STEP_SIZE, 2*STEP_SIZE, …):
#SBATCH --array=0-NB_JOBS:STEP_SIZE
- For an array of N jobs having the predefined index values J_1, J_2, …, J_N:
#SBATCH --array=J_1,J_2,...,J_N
Comments:
- The maximum number of jobs allowed in the Slurm queue is 10,000 per user.
- The maximum number of jobs of an array that can run simultaneously can be specified with the % separator. For example, the following directive executes all the jobs of an array in series of NB_MAX_RUNNING_JOBS:
#SBATCH --array=0-NB_JOBS%NB_MAX_RUNNING_JOBS
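For illustration, here are the same directives with concrete (arbitrary) values:
#SBATCH --array=0-3        # 4 jobs with indices 0, 1, 2, 3
#SBATCH --array=0-11:3     # 4 jobs with indices 0, 3, 6, 9
#SBATCH --array=1,3,8      # 3 jobs with indices 1, 3, 8
#SBATCH --array=0-3%2      # 4 jobs with at most 2 running simultaneously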
Variables specific to Job Arrays
When using the Job Array mechanism, certain SLURM environment variables can be used in the submission shell script in order to personalize the different jobs of an array (for example, so that each job of an array uses different input and/or output directories). The following environment variables are automatically set by SLURM:
- SLURM_JOB_ID: the job identifier
- SLURM_ARRAY_JOB_ID: the job identifier of the first job of the array
- SLURM_ARRAY_TASK_ID: the index value belonging to each job of the array (can be seen as a job counter)
- SLURM_ARRAY_TASK_COUNT: the total number of jobs in the array which will be executed
- SLURM_ARRAY_TASK_MIN: the lowest index value of all the jobs in the array
- SLURM_ARRAY_TASK_MAX: the highest index value of all the jobs in the array
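As a minimal illustration in Bash (the parameter values are hypothetical), SLURM_ARRAY_TASK_ID can be used to assign different data to each job of an array submitted with, for example, #SBATCH --array=0-3:
# hypothetical list of parameters, one entry per job of the array
PARAMS=(0.01 0.1 1.0 10.0)
# each job selects the entry corresponding to its own index
PARAM=${PARAMS[${SLURM_ARRAY_TASK_ID}]}
echo "Job ${SLURM_ARRAY_TASK_ID} of ${SLURM_ARRAY_TASK_COUNT} uses parameter ${PARAM}"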
Moreover, with job arrays, two additional replacement patterns are available in the #SBATCH --output=... and #SBATCH --error=... directives to specify the names of the output and error files of each job:
- %A will be replaced by the value of SLURM_ARRAY_JOB_ID
- %a will be replaced by the value of SLURM_ARRAY_TASK_ID.
Comments:
- By default, the format of the output file name for a job array is slurm-%A_%a.out.
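For example, with the directives below, a job array submitted under the identifier 305813 (an illustrative value) would write one file per job:
#SBATCH --job-name=job-array
#SBATCH --output=%x_%A_%a.out    # gives job-array_305813_0.out, job-array_305813_1.out, etc.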
- In Bash, job array variables can be obtained as follows:
echo ${SLURM_JOB_ID}
echo ${SLURM_ARRAY_JOB_ID}
echo ${SLURM_ARRAY_TASK_ID}
echo ${SLURM_ARRAY_TASK_COUNT}
echo ${SLURM_ARRAY_TASK_MIN}
echo ${SLURM_ARRAY_TASK_MAX}
- For Python scripts, job array variables can be obtained as follows:
import os

slurm_job_id = int(os.environ["SLURM_JOB_ID"])
slurm_array_job_id = int(os.environ["SLURM_ARRAY_JOB_ID"])
slurm_array_task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
slurm_array_task_count = int(os.environ["SLURM_ARRAY_TASK_COUNT"])
slurm_array_task_min = int(os.environ["SLURM_ARRAY_TASK_MIN"])
slurm_array_task_max = int(os.environ["SLURM_ARRAY_TASK_MAX"])
Examples of usage
- Preliminary remark: the examples below concern executions on the CPU partition. The principle remains the same for executions on the GPU partitions.
- Example of a submission script for 20 identical jobs with a maximum of 5 jobs running simultaneously (execution in series of 5 jobs):
- job_array_20.slurm
#!/bin/bash
#SBATCH --job-name=job-array     # name of job
#SBATCH --ntasks=1               # total number of MPI processes
#SBATCH --ntasks-per-node=1      # number of MPI processes per node
# In Slurm vocabulary, "multithread" refers to hyperthreading.
#SBATCH --hint=nomultithread     # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time=00:01:00          # maximum execution time requested (HH:MM:SS)
#SBATCH --output=%x_%A_%a.out    # name of output file containing the ID and the index value
#SBATCH --error=%x_%A_%a.out     # name of error file (here common with the output)
#SBATCH --array=0-19%5           # total of 20 jobs with a maximum of 5 running simultaneously

# go into the submission directory
cd ${SLURM_SUBMIT_DIR}

# clean out modules loaded in interactive and inherited by default
module purge

# load modules
module load ...

# echo of launched commands
set -x

# Execution of the "mon_exe" binary with different data for each job.
# The value of ${SLURM_ARRAY_TASK_ID} is different for each job.
srun ./mon_exe < file${SLURM_ARRAY_TASK_ID}.in > file${SLURM_ARRAY_TASK_ID}.out
- Example of a submission script for 3 identical jobs, having the index values 1, 3 and 8 respectively:
- job_array_3.slurm
#!/bin/bash
#SBATCH --job-name=job-array     # name of job
#SBATCH --ntasks=1               # total number of MPI processes
#SBATCH --ntasks-per-node=1      # number of MPI processes per node
# In Slurm vocabulary, "multithread" refers to hyperthreading.
#SBATCH --hint=nomultithread     # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time=00:01:00          # maximum execution time requested (HH:MM:SS)
#SBATCH --output=%x_%A_%a.out    # name of output file containing the ID and the index value
#SBATCH --error=%x_%A_%a.out     # name of error file (here common with the output)
#SBATCH --array=1,3,8            # total of 3 jobs having the index values 1, 3 and 8

# go into the submission directory
cd ${SLURM_SUBMIT_DIR}

# clean out modules loaded in interactive and inherited by default
module purge

# load modules
module load ...

# echo of launched commands
set -x

# Execution of the "mon_exe" binary with different data for each job.
# The value of ${SLURM_ARRAY_TASK_ID} is different for each job.
srun ./mon_exe < file${SLURM_ARRAY_TASK_ID}.in > file${SLURM_ARRAY_TASK_ID}.out
- Example of a submission script for 6 identical jobs, with index values from 0 to 11 in steps of 2 (i.e. 0, 2, 4, 6, 8 and 10):
- job_array_0-11.slurm
#!/bin/bash
#SBATCH --job-name=job-array     # name of job
#SBATCH --ntasks=1               # total number of MPI processes
#SBATCH --ntasks-per-node=1      # number of MPI processes per node
# In Slurm vocabulary, "multithread" refers to hyperthreading.
#SBATCH --hint=nomultithread     # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time=00:01:00          # maximum execution time requested (HH:MM:SS)
#SBATCH --output=%x_%A_%a.out    # name of output file containing the ID and the index value
#SBATCH --error=%x_%A_%a.out     # name of error file (here common with the output)
#SBATCH --array=0-11:2           # 6 jobs having the index values 0, 2, 4, 6, 8 and 10

# go into the submission directory
cd ${SLURM_SUBMIT_DIR}

# clean out modules loaded in interactive and inherited by default
module purge

# load modules
module load ...

# echo of launched commands
set -x

# Execution of the "mon_exe" binary with different data for each job.
# The value of ${SLURM_ARRAY_TASK_ID} is different for each job.
srun ./mon_exe < file${SLURM_ARRAY_TASK_ID}.in > file${SLURM_ARRAY_TASK_ID}.out
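In line with the preliminary remark, the same mechanism applies on the GPU partitions. Here is a minimal sketch, assuming a single-GPU job per array index (the resource directives shown are an assumption and must be adapted to the targeted partition):
#!/bin/bash
#SBATCH --job-name=job-array-gpu # name of job
#SBATCH --ntasks=1               # total number of MPI processes
#SBATCH --gres=gpu:1             # 1 GPU per job of the array (assumption for this sketch)
#SBATCH --hint=nomultithread     # no hyperthreading
#SBATCH --time=00:01:00          # maximum execution time requested (HH:MM:SS)
#SBATCH --output=%x_%A_%a.out    # name of output file containing the ID and the index value
#SBATCH --error=%x_%A_%a.out     # name of error file (here common with the output)
#SBATCH --array=0-9              # total of 10 jobs

cd ${SLURM_SUBMIT_DIR}
module purge
module load ...
set -x
srun ./mon_exe < file${SLURM_ARRAY_TASK_ID}.in > file${SLURM_ARRAY_TASK_ID}.out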
Job Array commands
A job array must be executed via the sbatch command because of the large number of jobs it can generate:
$ sbatch job_array.slurm
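On submission, sbatch prints the identifier assigned to the array; this is the identifier reused in the monitoring examples below:
Submitted batch job 305813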
Monitoring these jobs is done with the squeue command, which returns the useful information. For example, for a job array of 7 jobs executed in series of 2 jobs on the cpu_p1 partition:
- The first call to squeue returns:
$ squeue -j 305813
JOBID           PARTITION  NAME       USER     ST  TIME  NODES  NODELIST(REASON)
305813_[2-6%2]  cpu_p1     job-array  mylogin  PD  0:00  1      (JobArrayTaskLimit)
305813_0        cpu_p1     job-array  mylogin  R   0:00  1      r7i1n0
305813_1        cpu_p1     job-array  mylogin  R   0:00  1      r8i6n3
Here, we see that the first 2 jobs are running (state R) and the other 5 are pending (state PD).
- When the first 2 jobs are finished, a second call to squeue returns:
$ squeue -j 305813
JOBID           PARTITION  NAME       USER     ST  TIME  NODES  NODELIST(REASON)
305813_[4-6%2]  cpu_p1     job-array  mylogin  PD  0:00  1      (JobArrayTaskLimit)
305813_2        cpu_p1     job-array  mylogin  R   0:05  1      r7i1n0
305813_3        cpu_p1     job-array  mylogin  R   0:05  1      r8i6n3
Now we can see that the next 2 jobs are running and only 3 are still pending. Note that the first 2 jobs, which have finished, no longer appear in the output.
To delete a job array, use the scancel command. There are, however, several ways to proceed depending on what you want to cancel:
- To cancel the entire array, indicate its identifier ${SLURM_JOB_ID}. With the above example, this is:
$ scancel 305813
- To cancel the execution of a particular job, indicate the array identifier ${SLURM_ARRAY_JOB_ID} and the index value of the job ${SLURM_ARRAY_TASK_ID}. With the above example, this is:
$ scancel 305813_2
- To cancel the execution of a series of jobs, indicate the array identifier ${SLURM_ARRAY_JOB_ID} and a range of index values (here, 4 to 6). With the above example, this is:
$ scancel 305813_[4-6]