Jean Zay: The disk spaces
For each project, four distinct disk spaces are accessible: the HOME, the WORK, the SCRATCH/JOBSCRATCH and the STORE.
Each space has specific characteristics adapted to its usage, which are described below. The paths to these spaces are stored in five shell environment variables: $HOME, $WORK, $SCRATCH, $JOBSCRATCH and $STORE.
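For example, you can display the paths behind these variables from an interactive session (note that $JOBSCRATCH is only defined inside a batch job):
$ echo $HOME $WORK $SCRATCH $STORE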
You can check the occupation of the different disk spaces by using the IDRIS idr_quota_user and idr_quota_project commands, or with the Unix du (disk usage) command. The idr_quota_user and idr_quota_project commands return immediately but do not give real-time information (the data are updated once a day). The du command returns real-time information but its execution can take a long time, depending on the size of the directory concerned.
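For example, from a login node (the directory name in the last command is illustrative):
$ idr_quota_user       # per-user occupation, updated once a day
$ idr_quota_project    # per-project occupation, updated once a day
$ du -sh $WORK/my_dir  # real-time size of one directory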
For database management specific to Jean Zay, a dedicated page was written as a complement to this one: Database management.
The HOME
$HOME: This is the home directory during an interactive connection. This space is intended for small-sized files which are frequently used, such as the shell environment files, tools, and potentially the sources and libraries if they have a reasonable size. The size of this space is limited (in space and in number of files).
The HOME characteristics are:
- A permanent space.
- Intended to receive small-sized files.
- For a multi-project login, the HOME is unique.
- Subject to per-user quotas which are intentionally rather low (3 GiB by default).
- Accessible in interactive mode or in a batch job via the $HOME variable:
$ cd $HOME
- It is the home directory during an interactive connection.
Note: The HOME space is also referenced via the CCFRHOME environment variable to respect a common nomenclature with the other national computing centers (CINES, TGCC).
$ cd $CCFRHOME
The WORK
$WORK: This is a permanent work and storage space which is usable in batch. In this space, you generally store large-sized files for use during batch executions: very large source files, libraries, data files, executable files, result files and submission scripts.
The characteristics of WORK are:
- A permanent space.
- Intended to receive large-sized files: maximum size is 10 TiB per file.
- In the case of a multi-project login, a WORK is created for each project.
- Subject to per-project quotas.
- Accessible in interactive mode or in a batch job.
- It is composed of 2 sections:
- A section in which each user has their own part, accessed by the command:
$ cd $WORK
- A section common to the project to which the user belongs and into which files can be placed to be shared, accessed by the command:
$ cd $ALL_CCFRWORK
- The WORK is a disk space with a bandwidth of about 100 GB/s in read and in write. This bandwidth can be temporarily saturated if there is exceptionally intensive usage.
Note: The WORK space is also referenced via the CCFRWORK environment variable in order to respect a common nomenclature with other national computing centers (CINES, TGCC):
$ cd $CCFRWORK
Usage recommendations
- Batch jobs can run in the WORK. However, because several of your jobs can run at the same time, you must ensure that your execution directories or file names are unique (see the sketch after these recommendations).
- In addition, this disk space is subject to per-project quotas which can stop your execution abruptly if they are reached. You must therefore be aware not only of your own activity in the WORK but also of that of your project colleagues. For these reasons, you may prefer using the SCRATCH or the JOBSCRATCH for the execution of batch jobs.
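As an illustration of the uniqueness recommendation, a submission script can derive its execution directory from the job ID that Slurm exposes in $SLURM_JOB_ID; this is a minimal sketch in which the job name, directory layout and executable path are illustrative:
#!/bin/bash
#SBATCH --job-name=work_run    # illustrative job name
#SBATCH --output=%x_%j.out     # one output file per job (%j is the job ID)
#SBATCH --ntasks=1             # illustrative resources
# Create a per-job execution directory in the WORK so that
# concurrent jobs never write into the same place
RUNDIR=$WORK/run_$SLURM_JOB_ID
mkdir -p $RUNDIR
cd $RUNDIR
srun $WORK/bin/my_exe          # hypothetical executable path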
The SCRATCH/JOBSCRATCH
$SCRATCH: This is a semi-permanent work and storage space which is usable in batch; the lifespan of the files is limited to 30 days. The large-sized files used during batch executions are generally stored here: the data files, result files or the computation restarts. Once the post-processing has been done to reduce the data volume, you must remember to copy the significant files into the WORK so that they are not lost after 30 days, or into the STORE for long-term archiving.
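For example (the directory name is illustrative), reduced post-processing results can be saved from a login session with:
$ cp -r $SCRATCH/reduced_results $WORK/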
The characteristics of the SCRATCH are:
- The SCRATCH is a semi-permanent space with a 30-day file lifespan.
- Not backed up.
- Intended to receive large-sized files: maximum size is 10 TiB per file.
- Subject to very large security quotas:
- Disk quotas per project of about 1/10th of the total disk space for each group.
- Inode quotas per project on the order of 150 million files and directories.
- Accessible in interactive mode or in a batch job.
- Composed of 2 sections:
- A section in which each user has their own part, accessed by the command:
$ cd $SCRATCH
- A section common to the project to which the user belongs into which files can be placed to be shared. It is accessed by the command:
$ cd $ALL_CCFRSCRATCH
- In the case of a multi-project login, a SCRATCH is created for each project.
- The SCRATCH is a disk space with a bandwidth of about 500 GB/s in write and in read.
Note: The SCRATCH space is also referenced via the CCFRSCRATCH environment variable in order to respect a common nomenclature with other national computing centers (CINES, TGCC):
$ cd $CCFRSCRATCH
$JOBSCRATCH: This is the temporary execution directory specific to batch jobs.
Its characteristics are:
- A temporary directory with file lifespan equivalent to the batch job lifespan.
- Not backed up.
- Intended to receive large-sized files: maximum size is 10 TiB per file.
- Subject to very large security quotas:
- Disk quotas per project of about 1/10th of the total disk space for each group.
- Inode quotas per project on the order of 150 million files and directories.
- Created automatically when a batch job starts and, therefore, is unique to each job.
- Destroyed automatically at the end of the job: it is therefore necessary to copy the important files onto another disk space (the WORK or the SCRATCH) before the end of the job, as shown in the sketch after the usage recommendations below.
- The JOBSCRATCH is a disk space with a bandwidth of about 500 GB/s in write and in read.
- During the execution of a batch job, the corresponding JOBSCRATCH is accessible from the Jean Zay front end via its JOBID job number (see the output of the squeue command) and your login name (environment variable LOGNAME), using the following command (replace JOBID with the actual job number):
$ cd /lustre/fsn1/jobscratch_hpe/${LOGNAME}_JOBID
Usage recommendations:
- The JOBSCRATCH can be seen as the former TMPDIR.
- The SCRATCH can be seen as a semi-permanent WORK which offers the maximum input/output performance available at IDRIS but limited by a 30-day lifespan for files.
- The semi-permanent nature of the SCRATCH allows storing large volumes of data there between two or more jobs which run successively within a limited period of a few weeks: this disk space is not purged after each job.
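Putting these recommendations together, here is a minimal sketch of a batch job using the JOBSCRATCH (file and executable names are illustrative):
#!/bin/bash
#SBATCH --job-name=scratch_run   # illustrative job name
#SBATCH --ntasks=1               # illustrative resources
# Stage the input data into the temporary, high-bandwidth JOBSCRATCH
cp $WORK/input.dat $JOBSCRATCH/
cd $JOBSCRATCH
srun $WORK/bin/my_exe            # hypothetical executable path
# The JOBSCRATCH is destroyed at the end of the job:
# copy the results to a permanent space before the job finishes
cp results.dat $WORK/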
The STORE
$STORE: This is the IDRIS archiving space for long-term storage. Very large files are generally stored here after post-processing, by regrouping the calculation result files in a tar file. This space is not intended to be accessed or modified on a daily basis, but to preserve very large volumes of data over time with only occasional consultation.
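For example, result files can be regrouped into a single tar archive and then moved to the STORE from a login session (file and directory names are illustrative):
$ tar cf results_run42.tar -C $WORK results_run42
$ mv results_run42.tar $STORE/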
Important change: Since 22 July 2024, the STORE is only accessible from the login nodes and the prepost, archive, compil and visu partitions. Jobs running on the compute nodes cannot access this space directly, but you may use multi-step jobs to automate the data transfers to/from the STORE (see our examples of multi-step jobs using the STORE).
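As a hedged sketch of such a multi-step approach (the script names are hypothetical, and selecting the archive partition with Slurm's --partition option is an assumption; the linked IDRIS examples are authoritative), a transfer job can be chained with a compute job using a Slurm dependency:
$ sbatch --partition=archive get_data.slurm          # hypothetical script copying from the STORE
$ sbatch --dependency=afterok:JOBID compute.slurm    # replace JOBID with the first job's ID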
Its characteristics are:
- The STORE is a permanent space.
- Secured by a double copy of files that have not been modified for a few days.
- The STORE is not accessible from the compute nodes but only from the login nodes and the prepost, archive, compil and visu partitions (you may use multi-step jobs to automate the data transfers to/from the STORE; see our examples of multi-step jobs using the STORE).
- Intended to receive very large-sized files: the maximum size is 10 TiB per file and the minimum recommended size is 250 MiB (ratio of disk size to number of inodes).
- In the case of a multi-project login, a STORE is created per project.
- Subject to per-project quotas with a small number of inodes but a very large space.
- Composed of 2 sections:
- A section in which each user has their own part, accessed by the command:
$ cd $STORE
- A section common to the project to which the user belongs and into which files can be placed to be shared. It is accessed by the command:
$ cd $ALL_CCFRSTORE
Note: The STORE space is also referenced via the CCFRSTORE environment variable in order to respect a common nomenclature with other national computing centers (CINES, TGCC):
$ cd $CCFRSTORE
Usage recommendations:
- As this is an archive space, it is not intended for frequent access.
- On the other hand, there is no longer any limitation on file lifespan.
The DSDIR
$DSDIR: This storage space is dedicated to voluminous public databases (in size or number of files) which are needed for using AI tools. These databases are visible to all Jean Zay users.
If you use large public databases which are not found in the $DSDIR space, IDRIS will download and install them in this disk space at your request.
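For example, you can list the shared space from a login session to see what is already installed (the dataset name below is hypothetical):
$ ls $DSDIR
$ ls $DSDIR/MyDataset    # hypothetical dataset name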
The list of currently available databases is found on this page: Jean Zay: Datasets and models available in the $DSDIR storage space.
If your database is personal or under a license which is too restrictive, you must take charge of its management yourself in the disk spaces of your project, as described on the Database Management page.
Summary table of the main disk spaces
Space | Default capacity | Features | Usage
---|---|---|---
$HOME | 3 GB and 150k inodes per user | - Home directory at connection | - Storage of configuration files and small files
$WORK | 5 TB (*) and 500k inodes per project | - Storage on rotating disks (350 GB/s in read and 300 GB/s in write) | - Storage of sources and input/output data - Execution in batch or interactive mode
$SCRATCH | Very large security quotas, 4.6 PB shared by all users | - SSD storage (1.5 TB/s in read and 1.1 TB/s in write) - Lifespan of unused files: 30 days (unused = not read or modified) - Space not backed up | - Storage of voluminous input/output data - Execution in batch or interactive mode - Optimal performance for read/write operations
$STORE | 50 TB (*) and 100k inodes (*) per project | - Disk cache and magnetic tapes - Long access times if the file is only on tape - Secured by a double copy on magnetic tapes of files that have not been modified for a few days | - Long-term archive storage (for the lifespan of the project) - No access from compute nodes
The snapshots
WARNING: Due to the migration to new Lustre parallel file systems, the WORK space is no longer backed up. We recommend that you keep a backup of your important files as archive files stored in your STORE disk space.