Storage

| Path | Performance | Usage | Quota (Space/Files) | Backup | Auto-cleanup |
|---|---|---|---|---|---|
| /network/datasets/ | High | Curated datasets | | | |
| /network/weights/ | High | Curated model weights | | | |
| $HOME | Low | Code, libraries, small important results | 100GiB/1000K | Daily | no |
| $SCRATCH | High | Processed datasets, work-in-progress datasets, temporary job results | 5TiB/no | no | 90 days |
| $SLURM_TMPDIR | Highest | Local node disk for job data and intermediate checkpoints | no/no | no | at job end |
| projects | Fair | Shared data for long-term collaborative projects | 1TiB/1000K | Daily | no |
| $ARCHIVE | Low | Long-term storage of experiment results | 5TB | no | no |
Note
The $HOME file system is backed up once a day. To have a file restored, file a
request with Mila's IT support, giving the path of the file or directory to
restore and the date of the version you need.
$HOME
$HOME is appropriate for code and libraries that are small and read once,
as well as for experimental results that will be needed at a later time (e.g.
the weights of a network referenced in a paper).
Quotas are enabled on $HOME for both disk capacity (blocks) and number of
files (inodes). The limits for blocks and inodes are respectively 100GiB and 1
million per user. The command to check the quota usage from a login node is:
disk-quota
$SCRATCH
$SCRATCH can be used to store processed datasets, work-in-progress datasets
or temporary job results. Its block size is optimized for small files, which
minimizes the performance hit of working on extracted datasets.
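For example, a dataset archive could be extracted into $SCRATCH before a job
processes it (the archive path and directory name below are placeholders):
mkdir -p $SCRATCH/my_dataset
tar xf path/to/my_dataset.tar -C $SCRATCH/my_dataset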
Note
Auto-cleanup: this file system is cleaned on a daily basis; files not used for more than 90 days will be deleted. This period can be shortened when file system usage is above 90%.
Quotas are enabled on $SCRATCH for disk capacity (blocks). The limit is
5TiB. There is no limit on the number of files (inodes). The command to check
the quota usage from a login node is:
disk-quota
$SLURM_TMPDIR
$SLURM_TMPDIR points to the local disk of the node on which a job is
running. It should be used to copy data onto the node at the beginning of the
job and to write intermediate checkpoints. This folder is cleared after each job.
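As a minimal sketch (the dataset archive and training script names are
hypothetical), a job script could stage data onto the local disk, write
checkpoints there, and copy the results back before the job ends:
#!/bin/bash
#SBATCH --time=2:00:00
# Copy the input data to the local disk of the node (hypothetical archive name).
cp $SCRATCH/my_dataset.tar $SLURM_TMPDIR/
tar xf $SLURM_TMPDIR/my_dataset.tar -C $SLURM_TMPDIR
# Run training, writing intermediate checkpoints to the fast local disk (hypothetical script).
python train.py --data $SLURM_TMPDIR/my_dataset --checkpoint-dir $SLURM_TMPDIR/checkpoints
# Copy the results back to $SCRATCH, since $SLURM_TMPDIR is cleared at the end of the job.
cp -r $SLURM_TMPDIR/checkpoints $SCRATCH/my_experiment_results/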
projects
projects can be used for collaborative projects. It aims to ease the
sharing of data between users working on a long-term project.
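As a sketch of one way to share data within a group (the group and directory
names below are placeholders), files under a project directory can be made
group-accessible:
chgrp -R my_project_group path/to/my_project
chmod -R g+rwX path/to/my_project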
Quotas are enabled on projects for both disk capacity (blocks) and number
of files (inodes). The limits for blocks and inodes are respectively 1TiB and
1 million per group.
Note
It is possible to request higher quota limits if the project requires it. File a request to Mila’s IT support.
$ARCHIVE
The purpose of $ARCHIVE is to store data other than datasets that must be kept
long-term (e.g. generated samples, logs, data relevant to a paper submission).
$ARCHIVE is only available on the login nodes and CPU-only nodes.
Because this file system is tuned for large files, it is recommended to archive
your directories. For example, to archive the results of an experiment in
$SCRATCH/my_experiment_results/, run the commands below from a login node:
cd $SCRATCH
tar cJf $ARCHIVE/my_experiment_results.tar.xz --xattrs my_experiment_results
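To retrieve these results later, the archive can be extracted back into
$SCRATCH, for example:
cd $SCRATCH
tar xJf $ARCHIVE/my_experiment_results.tar.xz --xattrs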
Disk capacity quotas are enabled on $ARCHIVE. The soft limit per user is
5TB and the hard limit is 5.1TB. The grace time is 7 days. This means that one
can use more than 5TB for up to 7 days before the file system enforces the quota. However, it
is not possible to use more than 5.1TB.
The command to check the quota usage from a login node is df:
df -h $ARCHIVE
Note
There is NO backup of this file system.
datasets
datasets contains curated datasets for the benefit of the Mila community. To
request the addition of a dataset or a preprocessed dataset you think could
benefit the research of others, you can fill in the datasets form. Datasets can also be browsed
on the web: Mila Datasets
Datasets in datasets/restricted are restricted and require an explicit
request to gain access. Please submit a support ticket mentioning the
dataset's access group (e.g. scannet_users), your username on the cluster and the
approval of the group owner. You can find the dataset's access group by
listing the content of /network/datasets/restricted with the ls command.
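For example:
ls /network/datasets/restricted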
Those datasets are mirrored to the Alliance clusters in
~/projects/rrg-bengioy-ad/data/curated/ if they follow the Digital Research
Alliance of Canada’s good practices on data.
To list the local datasets on an Alliance cluster, you can execute the following
command:
ssh [CLUSTER_LOGIN] -C "projects/rrg-bengioy-ad/data/curated/list_datasets_cc.sh"
weights
weights contains curated model weights for the benefit of the Mila
community. To request the addition of model weights you think could benefit the
research of others, you can fill in the weights form.
Weights in weights/restricted are restricted and require an explicit request
to gain access. Please submit a support ticket mentioning the
weights' access group (e.g. NAME_OF_A_RESTRICTED_MODEL_WEIGHTS_users), your
username on the cluster and the approval of the group owner. You can find the
weights' access group by listing the content of /network/weights/restricted
with the ls command.
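For example:
ls /network/weights/restricted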