Jean Zay: Interactive execution of a CPU code
On Jean Zay, interactive resources can be accessed in two different ways.
Connection to the front end
Access to the front end is obtained via an ssh connection:
$ ssh login@jean-zay.idris.fr
The front-end resources are shared among all connected users. As a result, interactive use of the front end is reserved exclusively for compilation and script development.
Note: On the front end, the RAM is limited to 5 GB shared among all processes, and the CPU time is limited to 30 minutes (1800 s) per process in order to ensure better resource sharing.
All interactive executions of your codes must be carried out on the CPU compute nodes by using one of the following two commands:
- The srun command:
  - to obtain a terminal on a CPU compute node within which you can execute your code,
  - or to directly execute your code on the CPU partition.
- Or the salloc command, to reserve CPU resources which would allow you to do more than one execution.
However, if the computations require a large amount of CPU resources (in number of cores, memory, or elapsed time), it is necessary to submit a batch job.
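For reference, a batch submission could look roughly like the following; this is only a minimal sketch with illustrative values (script name, resources, time limit), and the exact directives to use are given in the documentation on batch submission scripts:
$ cat my_cpu_job.slurm
#!/bin/bash
#SBATCH --job-name=my_cpu_job       # illustrative job name
#SBATCH --ntasks=4                  # number of tasks (example value)
#SBATCH --cpus-per-task=1           # cores per task (example value)
#SBATCH --hint=nomultithread        # reserve physical cores (no hyperthreading)
#SBATCH --time=01:00:00             # maximum elapsed time (example value)
srun ./my_executable_file
$ sbatch my_cpu_job.slurm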
Obtaining a terminal on a CPU compute node
It is possible to open a terminal directly on a compute node on which the resources have been reserved for you (here 4 cores) by using the following command:
$ srun --pty --ntasks=1 --cpus-per-task=4 --hint=nomultithread [--other-options] bash
Comments:
- An interactive terminal is obtained with the --pty option.
- The reservation of physical cores is assured with the --hint=nomultithread option (no hyperthreading).
- By default, the allocated CPU memory is proportional to the number of reserved cores. For example, if you request 1/4 of the cores of a node, you will have access to 1/4 of its memory. You can consult our documentation on this subject: Memory allocation on CPU partitions.
- --other-options contains the usual Slurm options for job configuration (--time=, etc.); see the documentation on batch submission scripts in the index section Execution/Commands of a CPU code, and the example after this list.
- The reservations have all the resources defined in Slurm by default, per partition and per QoS (Quality of Service). You can modify these limits by specifying another partition and/or QoS, as detailed in our documentation about the partitions and QoS.
- For multi-project users and those having both CPU and GPU hours, it is necessary to specify the project account (project hours allocation) to which the computing hours of the job should be charged, as explained in our documentation about computing hours management.
- We strongly recommend that you consult our documentation detailing computing hours management on Jean Zay to ensure that the hours consumed by your jobs are deducted from the correct allocation.
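As an illustration, the following command requests the same interactive terminal with a one-hour time limit and charges the hours to a given project allocation; the time limit and the account name (my_project@cpu) are example values to adapt to your own project:
$ srun --pty --ntasks=1 --cpus-per-task=4 --hint=nomultithread --time=01:00:00 --account=my_project@cpu bash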
The terminal is operational after the resources have been granted:
$ srun --pty --ntasks=1 --cpus-per-task=4 --hint=nomultithread bash
srun: job 1365358 queued and waiting for resources
srun: job 1365358 has been allocated resources
bash-4.2$ hostname
r4i3n7
You can verify that your interactive job has started by using the squeue command. Complete information about the status of the job can be obtained with the scontrol show job <job identifier> command.
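For example (the -u option restricts the output to your own jobs; the job identifier is the one returned above):
$ squeue -u $USER
$ scontrol show job 1365358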
After the terminal is operational, you can launch your executable files in the usual way: ./your_executable_file. For an MPI execution, you should again use srun: srun ./your_mpi_executable_file.
Important: Hyperthreading is not usable via MPI in this configuration.
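For instance, within the terminal obtained above (using the placeholder executable names from this section):
bash-4.2$ ./your_executable_file
bash-4.2$ srun ./your_mpi_executable_file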
To leave the interactive mode:
bash-4.2$ exit
Caution: If you do not leave the interactive mode yourself, the maximum allocation duration (by default or as specified with the --time option) is applied and the corresponding hours are then charged to the project you have specified.
Interactive execution on the CPU partition
If you don't need to open a terminal on a compute node, it is also possible to start the interactive execution of a code on the compute nodes directly from the front end by using the following command (here with 4 tasks):
$ srun --ntasks=4 --hint=nomultithread [--other-options] ./my_executable_file
Comments:
- The --hint=nomultithread option reserves physical cores (no hyperthreading).
- By default, the allocated CPU memory is proportional to the number of reserved cores. For example, if you request 1/4 of the cores of a node, you will have access to 1/4 of its memory. You may consult our documentation on this subject: Memory allocation on CPU partitions.
- --other-options contains the usual Slurm options for configuring jobs (--output=, --time=, etc.); see the documentation on batch job submission scripts in the Jean Zay index section Execution/Commands of a CPU code, and the example after this list.
- Reservations have all the resources defined in Slurm by default, per partition and per QoS (Quality of Service). You can modify these limits by specifying another partition and/or QoS, as detailed in our documentation about the partitions and QoS.
- For multi-project users and those having both CPU and GPU hours, it is necessary to specify the project account (project hours allocation) to which the computing hours of the job should be charged, as explained in our documentation about computing hours management.
- We strongly recommend that you consult our documentation detailing computing hours management on Jean Zay to ensure that the hours consumed by your jobs are deducted from the correct allocation.
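For example, the following command runs the executable with 4 tasks, a 30-minute time limit and the output redirected to a file; the option values and file names are illustrative:
$ srun --ntasks=4 --hint=nomultithread --time=00:30:00 --output=my_run.out ./my_executable_file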
Reserving reusable resources for more than one interactive execution
Each interactive execution started as described in the preceding section constitutes a separate job. As with all jobs, they may be placed in a wait queue for some time if the computing resources are not available.
If you wish to do more than one interactive execution in a row, it may be pertinent to reserve all the resources in advance so that they can be reused for the consecutive executions. In this way, you wait only once, at the moment of the reservation, for all the resources to become available, rather than waiting before each separate execution.
Reserving resources (here for 4 tasks) is done via the following command:
$ salloc --ntasks=4 --hint=nomultithread [--other-options]
Comments:
- The --hint=nomultithread option reserves physical cores (no hyperthreading).
- By default, the allocated CPU memory is proportional to the number of reserved cores. For example, if you request 1/4 of the cores of a node, you will have access to 1/4 of its memory. You may consult our documentation on this subject: Memory allocation on CPU partitions.
- --other-options contains the usual Slurm options for configuring jobs (--output=, --time=, etc.); see the documentation on batch job submission scripts in the Jean Zay index section Execution/Commands of a CPU code, and the example after this list.
- Reservations have all the resources defined in Slurm by default, per partition and per QoS (Quality of Service). You can modify these limits by specifying another partition and/or QoS, as detailed in our documentation about the partitions and QoS.
- For multi-project users and those having both CPU and GPU hours, it is necessary to specify the project account (project hours allocation) to which the computing hours of the job should be charged, as explained in our documentation about computing hours management.
- We strongly recommend that you consult our documentation detailing computing hours management on Jean Zay to ensure that the hours consumed by your jobs are deducted from the correct allocation.
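For instance, to reserve the same resources for a maximum of two hours (the duration is an example value):
$ salloc --ntasks=4 --hint=nomultithread --time=02:00:00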
The reservation becomes usable after the resources have been granted:
salloc: Pending job allocation 1367065
salloc: job 1367065 queued and waiting for resources
salloc: job 1367065 has been allocated resources
salloc: Granted job allocation 1367065
You can verify that your reservation is active by using the squeue command. Complete information about the status of the job can be obtained with the scontrol show job <job identifier> command.
You can then start the interactive executions by using the srun command:
$ srun [--other-options] ./code
Comment: If you do not specify any option for the srun command, the options used for salloc (for example, the number of tasks) will be used by default.
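Putting it together, a session which reuses a single reservation for several consecutive executions could look as follows; the job identifier, option values and executable names are illustrative:
$ salloc --ntasks=4 --hint=nomultithread --time=01:00:00
salloc: Granted job allocation 1367065
$ srun ./my_first_executable
$ srun --ntasks=2 ./my_second_executable
$ exit
exit
salloc: Relinquishing job allocation 1367065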
Important:
- After reserving resources with salloc, you are still connected on the front end (you can verify this with the hostname command). It is imperative to use the srun command for your executions to use the reserved resources.
- If you forget to cancel the reservation, the maximum allocation duration (by default or as specified with the --time option) is applied and the corresponding hours are then charged to the project you have specified. Therefore, to cancel the reservation, you must manually enter:
$ exit
exit
salloc: Relinquishing job allocation 1367065