Jean Zay: CPU Slurm partitions
The partitions available
The following Slurm partitions defined on Jean Zay are available to all DARI or Dynamic Access projects with CPU hours:

- The cpu_p1 partition is used automatically by all jobs requiring CPU hours if no partition is specified (see the example script after this list). The default execution time is 10 minutes and it cannot exceed 100 hours (`--time=HH:MM:SS` ≤ 100:00:00; see below).
- The prepost partition allows launching a job on one of the Jean Zay pre-/post-processing nodes, `jean-zay-pp`: these computing hours are not deducted from your allocation. The default execution time is 2 hours and it cannot exceed 20 hours (`--time=HH:MM:SS` ≤ 20:00:00; see below).
- The visu partition allows launching a job on one of the Jean Zay visualization nodes, `jean-zay-visu`: these computing hours are not deducted from your allocation. The default execution time is 10 minutes and it cannot exceed 4 hours (`--time=HH:MM:SS` ≤ 4:00:00; see below).
- The archive partition is dedicated to data management (copying or moving files, creating archive files): these computing hours are not deducted from your allocation. The default execution time is 2 hours and it cannot exceed 20 hours (`--time=HH:MM:SS` ≤ 20:00:00; see below).
- The compil partition is dedicated to library and binary compilations which cannot be done on the front end because they require too much CPU time: these computing hours are not deducted from your allocation. The default execution time is 2 hours and it cannot exceed 20 hours (`--time=HH:MM:SS` ≤ 20:00:00; see below).
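For illustration, here is a minimal sketch of a job script for the default cpu_p1 partition. The job name, output file and executable `my_code` are hypothetical placeholders, and the task count is only an example.

```bash
#!/bin/bash
#SBATCH --job-name=cpu_example    # placeholder job name
#SBATCH --ntasks=40               # example: 40 MPI tasks, i.e. one full CPU node
#SBATCH --output=cpu_example.out  # placeholder file for standard output

# No --partition directive: the job is routed to cpu_p1 by default,
# with the default time limit of 10 minutes (see the note below).
srun ./my_code                    # hypothetical executable
```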
Important: Be careful about the default time limits of the partitions, which are intentionally low. For a long execution, you should specify a time limit, which must remain below the maximum time authorised for the partition and the Quality of Service (QoS) used. To specify the time limit, you must use either:

- The Slurm directive `#SBATCH --time=HH:MM:SS` in your job, or
- The option `--time=HH:MM:SS` of the `sbatch`, `salloc` or `srun` commands (see the sketch below).
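As a sketch of these two methods, assuming a hypothetical script `my_job.slurm` and executable `my_code`, with an arbitrary 15-hour limit that stays below the 20h maximum of the default QoS (see the QoS section below):

```bash
# Method 1: Slurm directive placed in the job script (my_job.slurm)
#SBATCH --time=15:00:00              # 15h of elapsed time

# Method 2: option passed on the command line (it overrides the directive in the script)
sbatch --time=15:00:00 my_job.slurm
salloc --time=15:00:00
srun   --time=15:00:00 ./my_code
```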
As cpu_p1 is the default partition, you do not need to request it. The other partitions, however, must be explicitly specified to be used. For example, to specify the prepost partition, you can use either:

- The Slurm directive `#SBATCH --partition=prepost` in your job, or
- The option `--partition=prepost` of the `sbatch`, `salloc` or `srun` commands (an example follows).
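For example, a sketch of a post-processing job on the prepost partition; the job name, executable and 5-hour limit are assumptions, chosen to stay below the 20h maximum of this partition.

```bash
#!/bin/bash
#SBATCH --partition=prepost       # run on a pre-/post-processing node (jean-zay-pp)
#SBATCH --time=05:00:00           # example: 5h, below the 20h maximum of prepost
#SBATCH --job-name=postproc       # placeholder job name

srun ./my_postprocessing_code     # hypothetical executable
```

The command-line form `sbatch --partition=prepost my_job.slurm` is equivalent.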
Warning: Since October 11, 2019, any job requiring more than one node runs in exclusive mode: the nodes are not shared. Consequently, using part of a node results in the entire node being counted. For example, reserving 41 cores (1 node + 1 core) results in 80 cores (2 nodes) being invoiced. On the other hand, the total memory of the reserved nodes is available (on the order of 160 usable GB per node).
The QoS available
For each job submitted in a partition other than archive, compil, prepost and visu, you may specify a Quality of Service (QoS). The QoS determines the time/node limits and the priority of your job.
- The default QoS for all CPU jobs: qos_cpu-t3
  - Maximum duration: 20h00 of elapsed time
  - 10240 physical cores (256 nodes) maximum per job
  - 20480 physical cores (512 nodes) maximum per user (all projects combined)
  - 20480 physical cores (512 nodes) maximum per project (all users combined)
- A QoS for longer executions, which must be specified to be used (see below): qos_cpu-t4
  - Maximum duration: 100h00 of elapsed time
  - 160 physical cores (4 nodes) maximum per job
  - 640 physical cores (16 nodes) maximum per user (all projects combined)
  - 640 physical cores (16 nodes) maximum per project (all users combined)
  - 5120 physical cores (128 nodes) maximum for all jobs requesting this QoS
- A QoS reserved for short executions carried out within the framework of code development or execution tests, which must be specified to be used (see below): qos_cpu-dev
  - A maximum of 10 jobs (running or pending) simultaneously per user
  - Maximum duration: 2h00 of elapsed time
  - 5120 physical cores (128 nodes) maximum per job
  - 5120 physical cores (128 nodes) maximum per user (all projects combined)
  - 5120 physical cores (128 nodes) maximum per project (all users combined)
  - 10240 physical cores (256 nodes) maximum for all jobs requesting this QoS
To specify a QoS which is different from the default one, you can either:

- Use the Slurm directive `#SBATCH --qos=qos_cpu-dev` (for example) in your job, or
- Specify the option `--qos=qos_cpu-dev` of the `sbatch`, `salloc` or `srun` commands (see the sketch below).
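As an illustration, a short development run using qos_cpu-dev; the names and values are placeholders chosen to respect the 2-hour limit of this QoS.

```bash
#!/bin/bash
#SBATCH --qos=qos_cpu-dev         # QoS reserved for short development or test runs (2h maximum)
#SBATCH --time=00:30:00           # example: 30 minutes of elapsed time
#SBATCH --ntasks=40               # example: one full CPU node

srun ./my_code                    # hypothetical executable
```

Equivalently, you can pass `--qos=qos_cpu-dev` on the command line, e.g. `sbatch --qos=qos_cpu-dev my_job.slurm`.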
Summary table of CPU QoS limits:

| QoS | Elapsed time limit | Resource limit per job | Resource limit per user (all projects combined) | Resource limit per project (all users combined) | Resource limit per QoS |
|---|---|---|---|---|---|
| qos_cpu-t3 (default) | 20h | 10240 physical cores (256 nodes) | 20480 physical cores (512 nodes) | 20480 physical cores (512 nodes) | |
| qos_cpu-t4 | 100h | 160 physical cores (4 nodes) | 640 physical cores (16 nodes) | 640 physical cores (16 nodes) | 5120 physical cores (128 nodes) |
| qos_cpu-dev | 2h | 5120 physical cores (128 nodes) | 5120 physical cores (128 nodes), 10 jobs maximum (running or pending) simultaneously | 5120 physical cores (128 nodes) | 10240 physical cores (256 nodes) |