Singularity is a software container system designed to facilitate portability and reproducibility of high performance computing (HPC) workflows. It performs a function similar to docker, but with HPC in mind. It is compatible with existing docker containers, and provides tools for building new containers from recipe files or ad-hoc commands.
Building a container is like creating a new environment, except that containers are much more powerful since they are self-contained systems. With singularity, there are two ways to build containers.
The first way is to do it by yourself. It's like getting a new Linux laptop when you don't really know what you need: if you see that something is missing, you install it. Here you can get a vanilla container with Ubuntu, called a sandbox, log in and install each package by yourself. This procedure can take time but will allow you to understand how things work and what you need. It is recommended if you need to figure out how things will be compiled or if you want to install packages on the fly. We'll refer to this procedure as singularity sandboxes.
The second way is for when you already know what you want: you write a list of everything you need, you send it to singularity and it will install everything for you. Those lists are called singularity recipes.
First way: Build and use a sandbox
You might ask yourself: on which machine should I build a container?
First of all, you need to choose where you'll build your container. This operation requires memory and high cpu usage.
- (Recommended for beginners) If you need to use apt-get, you should build the container on your laptop with sudo privileges. You'll only need to install singularity on your laptop. Windows/Mac users can look
  there_ and Ubuntu/Debian users can directly use:

  .. _there: https://www.sylabs.io/guides/3.0/user-guide/installation.html#install-on-windows-or-mac

  .. prompt:: bash $

     sudo apt-get install singularity-container

- If you can't install singularity on your laptop and you don't need apt-get, you can reserve a cpu node on the mila cluster to build your container.
  In this case, in order to avoid too much I/O over the network, you should define the singularity cache locally.

- If you can't install singularity on your laptop and you want to use apt-get, you can use
  singularity-hub_ to build your containers and read Recipe_section.

.. _singularity-hub: https://www.singularity-hub.org/
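For the second option above, the cache location can be redirected with the SINGULARITY_CACHEDIR environment variable. The lines below are a minimal sketch: the subfolder name singularity_cache and the /tmp fallback are assumptions made for illustration, not part of the original instructions.

```bash
# Keep the singularity cache on the node-local disk instead of the network
# filesystem. $SLURM_TMPDIR is set by SLURM inside a job; /tmp is only a
# fallback (an assumption) so the snippet also works outside a job.
export SINGULARITY_CACHEDIR="${SLURM_TMPDIR:-/tmp}/singularity_cache"
mkdir -p "$SINGULARITY_CACHEDIR"
```

Run this in the shell of the reserved node before pulling or building, so that downloaded layers land in the local folder.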
Download containers from the web
Fortunately, you may not need to create containers from scratch, as many have already been built for the most common deep learning software.
You can find most of them on dockerhub_.
.. _dockerhub: https://hub.docker.com/
tip: (Optional) You can also pull containers from nvidia cloud, see nvidia.
Go to dockerhub_ and select the container you want to pull.
For example, you may want to get the latest pytorch version with gpu support (replace runtime by devel if you need the full CUDA toolkit) or the latest tensorflow.
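A hedged sketch of the pull step follows. The pytorch tag matches the image name used later in this section; the tensorflow tag and the helper name pull_image are assumptions chosen for illustration, so check dockerhub for the tag you actually need.

```bash
# Image references on Docker Hub (tags are examples and may have changed).
PYTORCH_IMAGE="docker://pytorch/pytorch:1.0.1-cuda10.0-cudnn7-runtime"
TENSORFLOW_IMAGE="docker://tensorflow/tensorflow:latest-gpu"

pull_image() {
    # Download a Docker image and convert it to a singularity image file
    # in the current directory.
    singularity pull "$1"
}

# Usage:
#   pull_image "$PYTORCH_IMAGE"
#   pull_image "$TENSORFLOW_IMAGE"
```

The name of the resulting file depends on the singularity version, so list the current directory after the pull to see what was created.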
Currently, the pulled image pytorch.simg or tensorflow.simg is read-only, meaning that you won't be able to install anything in it.
From now on, pytorch will be taken as the example. If you use tensorflow, simply replace every occurrence of pytorch by tensorflow.
How to add or install stuff in a container
The first step is to transform your read-only container pytorch-1.0.1-cuda10.0-cudnn7-runtime.simg into a writable version that will allow you to add packages.
tip: If you want to use apt-get, you have to put sudo ahead of the following commands.
Use singularity build with the --sandbox option to create a writable image in the folder pytorch, then use singularity shell with the --writable option to log inside the container.
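The two steps can be sketched as follows. The helper names make_sandbox and enter_sandbox are hypothetical, and the -H option (which sets the container's home) is included on the assumption that you want your host home visible under /home.

```bash
SIMG="pytorch-1.0.1-cuda10.0-cudnn7-runtime.simg"   # read-only image pulled earlier
SANDBOX="pytorch"                                    # writable folder to create

make_sandbox() {
    # Unpack the read-only image into a writable directory tree.
    singularity build --sandbox "$SANDBOX" "$SIMG"
}

enter_sandbox() {
    # Open a shell in the sandbox; --writable makes changes persist.
    singularity shell --writable -H "$HOME:/home" "$SANDBOX"
}

# Usage:
#   make_sandbox
#   enter_sandbox
```

If you need apt-get inside the sandbox, prefix both singularity calls with sudo, as the tip above says.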
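Once you get into the container, you can use pip to install anything you need (or apt-get if you built the container with sudo).
Singularity mounts your home folder inside the container, so anything installed in the container's home would end up in your real home folder. You should install your stuff in /usr/local instead.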
Creating useful directories
One of the benefits of containers is that you'll be able to use them across different clusters. However, the dataset and experiment folder locations can be different on each cluster. In order to be invariant to those locations, we will create some useful mount points inside the container: /dataset, /tmp_log and /final_log.
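A minimal sketch of this step, assuming the sandbox folder is named pytorch as above. Because a sandbox is an ordinary directory on the host, creating the folders from the host is equivalent to running mkdir /dataset /tmp_log /final_log in a writable shell inside the container.

```bash
SANDBOX="pytorch"   # sandbox folder created earlier (assumed name)

# Create the three mount points at the root of the container's filesystem.
# Prefix with sudo if the sandbox was built with sudo.
mkdir -p "$SANDBOX/dataset" "$SANDBOX/tmp_log" "$SANDBOX/final_log"
```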
From now on, you won't need to worry about where to pick up your dataset when you write your code. Your dataset will always be in /dataset, independently of the cluster you are using.
Testing
If you have some code that you want to test before finalizing your container, you have two choices. You can either log into your container with singularity shell and run python code inside it, or you can execute your command directly with singularity exec.
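Both choices can be sketched as below. The helper names are hypothetical, and the --nv flag (which exposes the Nvidia drivers) is included on the assumption that you are testing on a machine with a GPU.

```bash
SANDBOX="pytorch"   # sandbox folder created earlier (assumed name)

test_interactively() {
    # Open a shell in the container, then run python by hand inside it.
    singularity shell --nv -H "$HOME:/home" "$SANDBOX"
}

test_script() {
    # Run a single script directly, e.g. test_script my_code.py
    singularity exec --nv -H "$HOME:/home" "$SANDBOX" python "$1"
}
```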
Creating a new image from the sandbox
Once everything you need is installed inside the container, you need to convert it back to a read-only singularity image with singularity build.
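As a sketch, singularity build takes the output image first and the source (here the sandbox folder) second. The helper name and the output file name pytorch_final.simg are assumptions for illustration.

```bash
build_final_image() {
    # $1: name of the read-only image to create, $2: sandbox folder
    singularity build "$1" "$2"
}

# Usage:
#   build_final_image pytorch_final.simg pytorch
```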
Second way: Use recipes
A singularity recipe is a file including specifics about installation software, environment variables, files to add, and container metadata. It is a starting point for designing any custom container. Instead of pulling a container and installing your packages manually, you can specify in this file the packages you want and then build your container from this file.
Here is a toy example of a singularity recipe installing some stuff:
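As a hedged illustration, a recipe of this kind could look like the sketch below, written to a file with a shell here-document. The file name Singularity.toy, the base image and the package choices are assumptions for illustration only.

```bash
cat > Singularity.toy <<'EOF'
# Header: which base system to start from.
Bootstrap: docker
From: pytorch/pytorch:1.0.1-cuda10.0-cudnn7-runtime

# Commands executed inside the container at build time.
%post
    apt-get update
    apt-get install -y wget unzip
    apt-get clean
    pip install tqdm
    # Mount points used later on the clusters.
    mkdir -p /dataset /tmp_log /final_log

# Variables sourced each time the container runs.
%environment
    export SHELL=/bin/bash
EOF
```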
A recipe file contains two parts: the header and the sections. In the header, you specify which base system you want to
use; it can be any docker or singularity container. In the sections, you can list the things you want to install in the subsection
post, or list the environment variables you need to source at each runtime in the subsection environment. For a more detailed
description, please look at the `singularity documentation`_.
.. _singularity documentation: https://www.sylabs.io/guides/2.6/user-guide/container_recipes.html#container-recipes
In order to build a singularity container from a singularity recipe file, pass the recipe file to singularity build.
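A minimal sketch of the build call, assuming it runs on a machine where you have sudo privileges. The helper name build_from_recipe is hypothetical.

```bash
build_from_recipe() {
    # $1: name of the image to create, $2: recipe file
    sudo singularity build "$1" "$2"
}

# Usage:
#   build_from_recipe <NAME_CONTAINER> <YOUR_RECIPE_FILE>
```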
Build recipe on singularity hub
Singularity hub allows users to build containers from recipes directly on singularity-hub's cloud, meaning that you no longer need to build containers by yourself.
You need to register on singularity-hub_ and link your singularity-hub account to your github account, then add your recipe file to a github repository that singularity-hub can access.
At this point, robots from singularity-hub will build the container for you. You will be able to download your container from the website or directly with singularity pull.
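A sketch of the download step, assuming the container is addressed by the shub:// URI scheme followed by the github user and repository names. The helper name pull_from_hub is hypothetical.

```bash
pull_from_hub() {
    # $1: github user name, $2: repository name (both placeholders)
    singularity pull "shub://$1/$2"
}

# Usage:
#   pull_from_hub <YOUR_GITHUB_USERNAME> <YOUR_REPOSITORY>
```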
Example: Recipe with openai gym, mujoco and miniworld
Here is an example of how you can use a singularity recipe to install complex environments such as openai gym, mujoco and miniworld on a pytorch based container.
In order to use mujoco, you'll need to copy the key stored on the mila cluster in /ai/apps/mujoco/license/mjkey.txt to your current directory.
Here is the same recipe but written for TensorFlow.
Keep in mind that those environment variables are sourced at runtime and not at build time. This is why you should also define them in the %post section, since they are required to install mujoco.
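The point about runtime versus build time can be sketched with a stripped-down recipe. This is not the full recipe: the destination path /opt/mjkey.txt and the file name Singularity.mujoco are hypothetical, and the actual installation commands are left out. MUJOCO_PY_MJKEY_PATH is the variable mujoco-py reads to locate the license key.

```bash
cat > Singularity.mujoco <<'EOF'
Bootstrap: docker
From: pytorch/pytorch:1.0.1-cuda10.0-cudnn7-runtime

# Copy the license key from the current directory into the container.
%files
    mjkey.txt /opt/mjkey.txt

# Sourced every time the container runs.
%environment
    export MUJOCO_PY_MJKEY_PATH=/opt/mjkey.txt

# %environment is not sourced during the build, so the variable is repeated here.
%post
    export MUJOCO_PY_MJKEY_PATH=/opt/mjkey.txt
    # ... installation commands that need the variable go here ...
EOF
```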
Using containers on clusters
On every cluster with SLURM, datasets and intermediate results should go in $SLURM_TMPDIR while the final experiment results should go in $SCRATCH.
In order to use the container you built, you need to copy it to the cluster you want to use.
Then reserve a node with srun/sbatch, copy the container and your dataset to the node given by slurm (i.e. in $SLURM_TMPDIR) and execute the code <YOUR_CODE> within the container <YOUR_CONTAINER> with singularity exec.
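A sketch of such a call is given below, wrapped in a hypothetical helper named run_in_container. It assumes the container has already been copied to $SLURM_TMPDIR and that $SCRATCH is defined.

```bash
run_in_container() {
    # $1: container image located in $SLURM_TMPDIR
    # remaining arguments: the command to run inside the container
    image="$1"; shift
    singularity exec --nv \
        -H "$HOME:/home" \
        -B "$SLURM_TMPDIR:/dataset/" \
        -B "$SLURM_TMPDIR:/tmp_log/" \
        -B "$SCRATCH:/final_log/" \
        "$SLURM_TMPDIR/$image" "$@"
}

# Usage:
#   run_in_container <YOUR_CONTAINER> python <YOUR_CODE>
```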
Remember that /dataset, /tmp_log and /final_log were created in the previous section. Each time we use singularity, we
explicitly tell it, with the option -B, to mount $SLURM_TMPDIR on the cluster's node in the folder /dataset inside the container, such that
each dataset downloaded by pytorch in /dataset will be available in $SLURM_TMPDIR.
This allows us to have code and scripts that are invariant to the cluster environment. The option -H specifies what will be the container's home. For example,
if you have your code in $HOME/Project12345/Version35/, you can specify -H $HOME/Project12345/Version35:/home, and the container will only have access to
the code inside Version35.
If you want to run multiple commands inside the container, you can chain them in a single singularity exec call.
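One way to do this, shown here as a sketch, is to hand a command string to bash -c inside the container. The helper name run_many_in_container is hypothetical.

```bash
run_many_in_container() {
    # $1: container image located in $SLURM_TMPDIR
    # $2: a single string holding the commands, interpreted by bash -c
    singularity exec --nv \
        -H "$HOME:/home" \
        -B "$SLURM_TMPDIR:/dataset/" \
        -B "$SLURM_TMPDIR:/tmp_log/" \
        -B "$SCRATCH:/final_log/" \
        "$SLURM_TMPDIR/$1" bash -c "$2"
}

# Usage:
#   run_many_in_container <YOUR_CONTAINER> 'pip install <PACKAGE> && python <YOUR_CODE>'
```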
Example: Interactive case (srun/salloc)
Once you get an interactive session with slurm, copy <YOUR_CONTAINER> and <YOUR_DATASET> to $SLURM_TMPDIR.
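The copy step can be sketched with plain cp. The helper name stage_to_node is hypothetical, and other tools such as rsync would work equally well.

```bash
stage_to_node() {
    # $1: path to the container image, $2: path to the dataset folder
    cp "$1" "$SLURM_TMPDIR/"
    cp -r "$2" "$SLURM_TMPDIR/"
}

# Usage:
#   stage_to_node <YOUR_CONTAINER> <YOUR_DATASET>
```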
Then use singularity shell to get a shell inside the container, or use singularity exec to execute <YOUR_CODE>.
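Since both commands take the same mount options, they can be sketched with one hypothetical helper, container_cmd, whose first argument selects shell or exec.

```bash
container_cmd() {
    # $1: "shell" or "exec"
    # $2: container image located in $SLURM_TMPDIR
    # remaining arguments: command to run (only used with exec)
    mode="$1"; image="$2"; shift 2
    singularity "$mode" --nv \
        -H "$HOME:/home" \
        -B "$SLURM_TMPDIR:/dataset/" \
        -B "$SLURM_TMPDIR:/tmp_log/" \
        -B "$SCRATCH:/final_log/" \
        "$SLURM_TMPDIR/$image" "$@"
}

# Usage:
#   container_cmd shell <YOUR_CONTAINER>
#   container_cmd exec  <YOUR_CONTAINER> python <YOUR_CODE>
```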
You can also create an alias to make your life easier. This will allow you to run any code by prefixing the command with the alias.
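A sketch of such an alias follows. The alias name my_env and the CONTAINER variable are assumptions for illustration; the single quotes delay variable expansion until the alias is used.

```bash
# Name of the container image copied to $SLURM_TMPDIR (placeholder value).
CONTAINER="my_container.simg"

# Variables inside the single quotes are expanded when the alias is used.
alias my_env='singularity exec --nv -H $HOME:/home -B $SLURM_TMPDIR:/dataset/ -B $SLURM_TMPDIR:/tmp_log/ -B $SCRATCH:/final_log/ $SLURM_TMPDIR/$CONTAINER'

# Usage (in an interactive shell):
#   my_env python <YOUR_CODE>
```

Aliases are expanded in interactive shells by default. Inside a bash script you would need shopt -s expand_aliases, or a shell function instead.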
Example: sbatch case
You can also create an sbatch script that performs the same steps.
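A hedged sketch of such a script is written to job.sh below. The resource requests are arbitrary examples, and the script assumes the container and dataset are stored in $SCRATCH before the job starts. The placeholders in angle brackets must be replaced before submitting.

```bash
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=6     # example: ask for 6 CPUs
#SBATCH --gres=gpu:1          # example: ask for 1 GPU
#SBATCH --mem=10G             # example: ask for 10 GB of RAM
#SBATCH --time=0:10:00        # example: run for at most 10 minutes

# 1. Copy the container and the dataset to the compute node.
cp "$SCRATCH/<YOUR_CONTAINER>" "$SLURM_TMPDIR"
cp -r "$SCRATCH/<YOUR_DATASET>" "$SLURM_TMPDIR"

# 2. Execute the code inside the container.
singularity exec --nv \
    -H "$HOME:/home" \
    -B "$SLURM_TMPDIR:/dataset/" \
    -B "$SLURM_TMPDIR:/tmp_log/" \
    -B "$SCRATCH:/final_log/" \
    "$SLURM_TMPDIR/<YOUR_CONTAINER>" \
    python "<YOUR_CODE>"
EOF

# Submit with: sbatch job.sh
```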
Issue with PyBullet and OpenGL libraries
If you are running certain gym environments that require pyglet, you may encounter a problem when running your singularity instance with the Nvidia drivers using the --nv flag. This happens because the --nv flag also provides the OpenGL libraries.
If you don't experience those problems with pyglet, you probably don't need to address this. Otherwise, you can resolve them by running apt-get install -y libosmesa6-dev mesa-utils mesa-utils-extra libgl1-mesa-glx, and then making sure that your LD_LIBRARY_PATH points to those libraries before the ones in /.singularity.d/libs.
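The path ordering can be sketched as below. The directory /usr/lib/x86_64-linux-gnu is an assumption for an Ubuntu x86_64 base image; check where the Mesa packages actually installed their libraries in your container.

```bash
# Directory holding the Mesa libraries (assumed location, see above).
MESA_LIB_DIR="/usr/lib/x86_64-linux-gnu"

# Prepend it so it is searched before /.singularity.d/libs.
export LD_LIBRARY_PATH="$MESA_LIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

In a recipe, the export line would go in the %environment section so that it is applied each time the container runs.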
Mila cluster
On the Mila cluster, $SCRATCH is not yet defined. You should add the
experiment results you want to keep in /network/scratch/<u>/<username>/. In
order to use the sbatch script above and to match other cluster environments'
names, you can define $SCRATCH as an alias for
/network/scratch/<u>/<username> by exporting it as an environment variable.
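A minimal sketch of that definition follows. It assumes that <u> stands for the first letter of your username; if that is not how your scratch folder is named, write the path out by hand instead.

```bash
# Current username, falling back to id if $USER is unset.
user="${USER:-$(id -un)}"
# Assumption: <u> is the first letter of the username.
first_letter="$(printf '%s' "$user" | cut -c1)"

export SCRATCH="/network/scratch/$first_letter/$user"
```

Adding these lines to your shell startup file would make $SCRATCH available in every session.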
Then, you can follow the general procedure explained above.
Digital Research Alliance of Canada
Using singularity on Digital Research Alliance of Canada (formerly Compute Canada) clusters is similar, except that
you need to add Yoshua's account name and load singularity in your sbatch script.
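A hedged sketch of such a script is written to job_drac.sh below. The account name is left as a placeholder, the resource requests are arbitrary examples, and the script assumes the container and dataset are stored in $SCRATCH.

```bash
cat > job_drac.sh <<'EOF'
#!/bin/bash
#SBATCH --account=<ACCOUNT_NAME>   # placeholder: the account name to charge
#SBATCH --cpus-per-task=6          # example: ask for 6 CPUs
#SBATCH --gres=gpu:1               # example: ask for 1 GPU
#SBATCH --mem=32G                  # example: ask for 32 GB of RAM
#SBATCH --time=0:10:00             # example: run for at most 10 minutes

# 1. Load singularity.
module load singularity

# 2. Copy the container and the dataset to the compute node.
cp "$SCRATCH/<YOUR_CONTAINER>" "$SLURM_TMPDIR"
cp -r "$SCRATCH/<YOUR_DATASET>" "$SLURM_TMPDIR"

# 3. Execute the code inside the container.
singularity exec --nv \
    -H "$HOME:/home" \
    -B "$SLURM_TMPDIR:/dataset/" \
    -B "$SLURM_TMPDIR:/tmp_log/" \
    -B "$SCRATCH:/final_log/" \
    "$SLURM_TMPDIR/<YOUR_CONTAINER>" \
    python "<YOUR_CODE>"
EOF

# Submit with: sbatch job_drac.sh
```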