Storage

| Path | Performance | Usage | Quota (Space/Files) | Backup | Auto-cleanup |
|---|---|---|---|---|---|
| /network/datasets/ | High | Curated raw datasets (read only) | | | |
| /network/weights/ | High | Curated model weights (read only) | | | |
| $HOME or /home/mila/<u>/<username>/ | Low | Personal user space; specific libraries, code, binaries | 100GB/1000K | Daily | no |
| $SCRATCH or /network/scratch/<u>/<username>/ | High | Temporary job results; processed datasets; optimized for small files | 5TB/no | no | 90 days |
| $SLURM_TMPDIR | Highest | High-speed disk for temporary job results | no/no | no | at job end |
| /network/projects/<groupname>/ | Fair | Shared space to facilitate collaboration between researchers; long-term project storage | 1TB/1000K | Daily | no |
| $ARCHIVE or /network/archive/<u>/<username>/ | Low | Long-term personal storage | 5TB | no | no |

Note

The $HOME file system is backed up once a day. To restore a file, send a request to Mila’s IT support with the path to the file or directory to restore and the date of the version required.

$HOME

$HOME is appropriate for code and libraries, which are small and read once, as well as for experimental results that will be needed at a later time (e.g. the weights of a network referenced in a paper).

Quotas are enabled on $HOME for both disk capacity (blocks) and number of files (inodes). The limits for blocks and inodes are respectively 100GiB and 1 million per user. The command to check the quota usage from a login node is:

disk-quota
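If you run into the inode limit, a rough way to see how many entries you have is to count them directly (an unofficial sketch, not a cluster tool; it can be slow on large trees):

```shell
# Count entries (files, directories, symlinks) under $HOME;
# each one consumes an inode and counts against the 1 million limit.
find "$HOME" | wc -l
```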

$SCRATCH

$SCRATCH can be used to store processed datasets, work-in-progress datasets or temporary job results. Its block size is optimized for small files, which minimizes the performance hit of working on extracted datasets.

Note

Auto-cleanup: this file system is cleaned on a daily basis; files not used for more than 90 days will be deleted. This period can be shortened when the file system usage is above 90%.
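To see which of your files are at risk before the cleanup runs, you can list files by last access time (a sketch assuming GNU find; the 60-day threshold is illustrative and can be adjusted):

```shell
# List regular files under $SCRATCH whose last access time
# is more than 60 days old, i.e. candidates for the 90-day cleanup.
find "$SCRATCH" -type f -atime +60
```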

Quotas are enabled on $SCRATCH for disk capacity (blocks). The limit is 5TiB. There is no limit in the number of files (inodes). The command to check the quota usage from a login node is:

disk-quota

$SLURM_TMPDIR

$SLURM_TMPDIR points to the local disk of the node on which a job is running. It should be used to copy data to the node at the beginning of the job and to write intermediate checkpoints. This folder is cleared after each job.
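This pattern can be sketched as a hypothetical batch script (the dataset path, training command and output names are placeholders, not part of the cluster documentation):

```shell
#!/bin/bash
#SBATCH --time=02:00:00

# Stage input data onto the node's fast local disk (source path is illustrative).
cp -r "$SCRATCH/my_dataset" "$SLURM_TMPDIR/"

# Work against the local copy and write intermediate checkpoints locally.
python train.py --data "$SLURM_TMPDIR/my_dataset" --checkpoints "$SLURM_TMPDIR/ckpt"

# Copy results back before the job ends: $SLURM_TMPDIR is cleared afterwards.
cp -r "$SLURM_TMPDIR/ckpt" "$SCRATCH/"
```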

projects

projects can be used for collaborative projects. It aims to ease the sharing of data between users working on a long-term project.

Quotas are enabled on projects for both disk capacity (blocks) and number of files (inodes). The limits for blocks and inodes are respectively 1TiB and 1 million per group.

Note

It is possible to request higher quota limits if the project requires it. File a request to Mila’s IT support.

$ARCHIVE

The purpose of $ARCHIVE is to store data other than datasets that has to be kept long-term (e.g. generated samples, logs, data relevant for paper submission).

$ARCHIVE is only available on the login nodes and CPU-only nodes. Because this file system is tuned for large files, it is recommended to archive your directories. For example, to archive the results of an experiment in $SCRATCH/my_experiment_results/, run the commands below from a login node:

cd $SCRATCH
tar cJf $ARCHIVE/my_experiment_results.tar.xz --xattrs my_experiment_results
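To restore the archive from the example above, the inverse operation from a login node would be:

```shell
# Extract the archive back into $SCRATCH, preserving extended attributes.
tar xJf "$ARCHIVE/my_experiment_results.tar.xz" --xattrs -C "$SCRATCH"
```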

Disk capacity quotas are enabled on $ARCHIVE. The soft limit per user is 5TB and the hard limit is 5.1TB, with a grace time of 7 days. This means that one can use more than 5TB for up to 7 days before the file system enforces the quota; however, it is never possible to use more than 5.1TB. The command to check the quota usage from a login node is df:

df -h $ARCHIVE

Note

There is NO backup of this file system.

datasets

datasets contains curated datasets for the benefit of the Mila community. To request the addition of a dataset or a preprocessed dataset you think could benefit the research of others, you can fill in the datasets form. Datasets can also be browsed on the web: Mila Datasets

Datasets in datasets/restricted are restricted and require an explicit request to gain access. Please submit a support ticket mentioning the dataset’s access group (e.g. scannet_users), your cluster username and the approval of the group owner. You can find the dataset’s access group by listing the content of /network/datasets/restricted with the ls command.

Those datasets are mirrored to the Alliance clusters in ~/projects/rrg-bengioy-ad/data/curated/ if they follow Digital Research Alliance of Canada’s good practices on data. To list the local datasets on an Alliance cluster, you can execute the following command:

ssh [CLUSTER_LOGIN] -C "projects/rrg-bengioy-ad/data/curated/list_datasets_cc.sh"

weights

weights contains curated model weights for the benefit of the Mila community. To request the addition of weights you think could benefit the research of others, you can fill in the weights form.

Weights in weights/restricted are restricted and require an explicit request to gain access. Please submit a support ticket mentioning the weights’ access group (e.g. NAME_OF_A_RESTRICTED_MODEL_WEIGHTS_users), your cluster username and the approval of the group owner. You can find the weights’ access group by listing the content of /network/weights/restricted with the ls command.