Jean Zay: CPU Slurm partitions

The partitions available

All DARI or Dynamic Access projects with CPU hours have the following Slurm partitions defined on Jean Zay available to them:

  • The cpu_p1 partition is automatically used by all jobs requiring CPU hours if no partition is specified. The default execution time is 10 minutes and it cannot exceed 100 hours (--time=HH:MM:SS ≤ 100:00:00; see below).
  • The prepost partition allows launching a job on one of the Jean Zay pre-/post-processing nodes, jean-zay-pp: The computing hours are not deducted from your allocation. The default execution time is 2 hours and it cannot exceed 20 hours (--time=HH:MM:SS ≤ 20:00:00; see below).
  • The visu partition allows launching a job on one of the Jean Zay visualization nodes, jean-zay-visu: The computing hours are not deducted from your allocation. The default execution time is 10 minutes and it cannot exceed 4 hours (--time=HH:MM:SS ≤ 4:00:00; see below).
  • The archive partition is dedicated to data management (copying or moving files, creating archive files): The computing hours are not deducted from your allocation. The default execution time is 2 hours and it cannot exceed 20 hours (--time=HH:MM:SS ≤ 20:00:00; see below).
  • The compil partition is dedicated to library and binary compilations which cannot be done on the front end because they require too much CPU time: The computing hours are not deducted from your allocation. The default execution time is 2 hours and it cannot exceed 20 hours (--time=HH:MM:SS ≤ 20:00:00; see below).

Important: The default time limits of the partitions are intentionally low. For a longer execution, you must specify a time limit, which must remain below the maximum time authorised for the partition and for the Quality of Service (QoS) used (an example is given after this list). To specify the time limit, use either:

  • The Slurm directive #SBATCH --time=HH:MM:SS in your job, or
  • The option --time=HH:MM:SS of the sbatch, salloc or srun commands.
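For example, here is a minimal sketch of a job requesting 5 hours of elapsed time on the default cpu_p1 partition; the job name, the number of tasks and the executable are placeholders to adapt to your own case:

  #!/bin/bash
  #SBATCH --job-name=cpu_job     # hypothetical job name
  #SBATCH --ntasks=40            # example: 40 tasks (adapt to your job)
  #SBATCH --time=05:00:00        # 5h requested, below the 20h limit of the default QoS
  srun ./my_executable           # hypothetical executable

The same limit can also be given directly on the command line, for example: sbatch --time=05:00:00 job.sh.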

As cpu_p1 is the default partition, you do not need to request it. The other partitions, however, must be explicitly specified in order to be used. For example, to specify the prepost partition (a sketch is shown after this list), you can use either:

  • The Slurm directive #SBATCH --partition=prepost in your job, or
  • The option --partition=prepost of the sbatch, salloc or srun commands.
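For example, here is a minimal sketch of a job submitted to the prepost partition; the job name and the post-processing script are placeholders:

  #!/bin/bash
  #SBATCH --job-name=post_proc     # hypothetical job name
  #SBATCH --partition=prepost      # run on a pre-/post-processing node
  #SBATCH --ntasks=1               # example: a single task
  #SBATCH --time=10:00:00          # 10h requested, below the 20h maximum of prepost
  srun ./my_postprocessing.sh      # hypothetical script

For an interactive session, the same option can be passed on the command line, for example: salloc --partition=prepost --time=02:00:00.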

Warning: Since October 11, 2019, any job requiring more than one node runs in exclusive mode: The nodes are not shared. Consequently, using only part of a node results in the entire node being counted. For example, reserving 41 cores (i.e. 1 node + 1 core) results in 80 cores (i.e. 2 nodes) being invoiced. In return, the total memory of the reserved nodes is available (on the order of 160 GB usable per node).
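As a sketch of this rounding, assuming 40 physical cores per node as implied by the example above, and a hypothetical executable:

  #!/bin/bash
  #SBATCH --ntasks=41        # 41 cores requested = 1 node + 1 core
  #SBATCH --time=01:00:00    # example time limit of 1 hour
  srun ./my_executable       # hypothetical executable

Here 2 complete nodes are allocated exclusively, so 2 x 40 = 80 cores are counted, but the full memory of both nodes (about 2 x 160 GB) is usable by the job.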

The QoS available

For each job submitted in a partition other than the archive, compil, prepost and visu partitions, you may specify a Quality of Service (QoS). The QoS determines the time/node limits and the priority of your job.

  • The default QoS for all the CPU jobs: qos_cpu-t3
    • Maximum duration: 20h00 of elapsed time
    • 10240 physical cores (256 nodes) maximum per job
    • 20480 physical cores (512 nodes) maximum per user (all projects combined)
    • 20480 physical cores (512 nodes) maximum per project (all users combined)
  • A QoS for longer executions, which must be specified to be used (see below): qos_cpu-t4
    • Maximum duration: 100h00 of elapsed time
    • 160 physical cores (4 nodes) maximum per job
    • 640 physical cores (16 nodes) maximum per user (all projects combined)
    • 640 physical cores (16 nodes) maximum per project (all users combined)
    • 5120 physical cores (128 nodes) maximum for the totality of jobs requesting this QoS
  • A QoS reserved only for short executions carried out in the framework of code development or execution tests, and which must be specified to be used (see below): qos_cpu-dev
    • A maximum of 10 jobs (running or pending) simultaneously per user
    • Maximum duration: 2h00 of elapsed time
    • 5120 physical cores (128 nodes) maximum per job
    • 5120 physical cores (128 nodes) maximum per user (all projects combined)
    • 5120 physical cores (128 nodes) maximum per project (all users combined)
    • 10240 physical cores (256 nodes) maximum for the totality of jobs requesting this QoS

To specify a QoS which is different from the default one, you can either:

  • Use the Slurm directive #SBATCH --qos=qos_cpu-dev (for example) in your job, or
  • Specify the --qos=qos_cpu-dev option of the sbatch, salloc or srun commands.
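For example, here is a sketch of a long job using qos_cpu-t4; the job name, node count and executable are placeholders chosen to stay within the qos_cpu-t4 limits:

  #!/bin/bash
  #SBATCH --job-name=long_run     # hypothetical job name
  #SBATCH --qos=qos_cpu-t4        # allows up to 100h of elapsed time
  #SBATCH --nodes=2               # example: 2 nodes, within the 4-node limit per job
  #SBATCH --time=72:00:00         # 72h requested, below the 100h limit of qos_cpu-t4
  srun ./my_executable            # hypothetical executable

The QoS can equally be given on the command line, for example: sbatch --qos=qos_cpu-t4 --time=72:00:00 job.sh.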

Summary table of CPU QoS limits

QoS                  | Elapsed time limit | Resource limit per job           | per user (all projects combined)  | per project (all users combined) | per QoS
---------------------+--------------------+----------------------------------+-----------------------------------+----------------------------------+---------------------------------
qos_cpu-t3 (default) | 20h                | 10240 physical cores (256 nodes) | 20480 physical cores (512 nodes)  | 20480 physical cores (512 nodes) |
qos_cpu-t4           | 100h               | 160 physical cores (4 nodes)     | 640 physical cores (16 nodes)     | 640 physical cores (16 nodes)    | 5120 physical cores (128 nodes)
qos_cpu-dev          | 2h                 | 5120 physical cores (128 nodes)  | 5120 physical cores (128 nodes) * | 5120 physical cores (128 nodes)  | 10240 physical cores (256 nodes)

* plus a limit of 10 jobs maximum (running or pending) simultaneously per user