Portability concerns and solutions
When working on a software project, it is important to be aware of all the software and libraries the project relies on, and to list them explicitly and under version control so that they can easily be installed and made available on different systems. The upsides are significant:
Easily install and run on the cluster
Ease of collaboration
Better reproducibility
To achieve this, try to always keep in mind the following aspects:
Versions: For each dependency, make sure you have some record of the specific version you are using during development. That way, you will later be able to reproduce the original environment, which you know to be compatible. Indeed, the more time passes, the more likely it is that newer versions of some dependency have introduced breaking changes. The pip freeze command can create such a record for Python dependencies.
Isolation: Ideally, each of your software projects should be isolated from the others: updating the environment for project A should not update the environment for project B. That way, you can freely install and upgrade software and libraries for the former without worrying about breaking the latter (which you might not notice until weeks later, the next time you work on project B!). Isolation is easy to achieve with UV, Python virtual environments and, as a last resort, containers.
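For Python dependencies, the usual way to produce this record is pip freeze > requirements.txt. To show what such a record contains, here is a minimal sketch (our own illustration, not how pip implements it) that builds a similar list of name==version pins with the standard-library importlib.metadata module, assuming Python 3.8 or newer. Unlike pip freeze, it does not special-case editable or VCS installs, so prefer the real command in practice; freeze_lines is a hypothetical name.

```python
"""Minimal sketch of a `pip freeze`-style version record (illustrative only)."""
from importlib import metadata


def freeze_lines():
    """Return sorted 'name==version' pins for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that have no name in their metadata
            pins.add(f"{name}=={dist.version}")
    # Case-insensitive sort so the record is stable and easy to diff in git.
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # One pin per line, printed to stdout so it can be redirected to a file.
    print("\n".join(freeze_lines()))
```

Run with the environment's python, the output can be redirected to a file and committed alongside your code, which is the kind of record the Versions point asks for.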
Virtual environments
A virtual environment in Python is a local, isolated environment in which you can install or uninstall Python packages without interfering with the global environment (or other virtual environments). It usually lives in a directory (location varies depending on whether you use venv, conda or poetry). In order to use a virtual environment, you have to activate it. Activating an environment essentially sets environment variables in your shell so that:
python points to the right Python version for that environment (different virtual environments can use different versions of Python!)
python looks for packages in the virtual environment
pip install installs packages into the virtual environment
Any shell commands installed via pip install are made available
To run experiments within a virtual environment, you can simply activate it
in the script given to sbatch.
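To check that a job (or your shell) really picked up the intended environment, a small sketch like the one below can be run with the environment's python, for instance at the start of the script given to sbatch. It relies on the fact that venv-style environments (including those created by UV) set sys.prefix to the environment directory, while sys.base_prefix keeps pointing at the base Python installation. Conda environments are full installations and are not detected by this check. The function names are hypothetical.

```python
"""Report which environment the running interpreter belongs to (illustrative sketch)."""
import os
import sys


def in_virtualenv():
    """True when running inside a venv-style environment (e.g. venv, virtualenv, uv)."""
    # A venv sets sys.prefix to the environment directory, while sys.base_prefix
    # keeps pointing at the Python installation the environment was created from.
    return sys.prefix != sys.base_prefix


def describe_environment():
    return {
        "python": sys.executable,  # which interpreter `python` resolved to
        "prefix": sys.prefix,      # where installed packages are looked up
        "in_virtualenv": in_virtualenv(),
        # venv's activate script exports VIRTUAL_ENV; usually unset otherwise
        "VIRTUAL_ENV": os.environ.get("VIRTUAL_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If in_virtualenv is False inside a job that was supposed to activate an environment, the activation line in the sbatch script is the first thing to check.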
UV
In many cases, where your dependencies are Python packages, we highly recommend using UV, a modern package manager for Python.
In addition to providing the same features as pip, it also manages Python installations and virtual environments, and makes your environments easier to reproduce and reuse across compute clusters.
Note
At the time of writing, UV is not available as a module on the Mila or DRAC clusters. To use it, you first need to install it by running this command on a cluster login node:
curl -LsSf https://astral.sh/uv/install.sh | sh
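| | Pip/virtualenv command | UV pip equivalent | UV project command (recommended) |
|---|---|---|---|
| Create your virtualenv | `python -m venv .venv` | `uv venv` | `uv init` |
| Activate the virtualenv | `source .venv/bin/activate` | (same) | (same, but often unnecessary) |
| Install a package | activate venv then `pip install <package>` | `uv pip install <package>` | `uv add <package>` |
| Run a command (ex. `python main.py`) | activate venv then `python main.py` | activate venv then `python main.py` | `uv run python main.py` |
| Where are dependencies declared? | Maybe in a `requirements.txt` | Maybe in a `requirements.txt` | `pyproject.toml` |
| Easy to change Python versions? | No | somewhat | Yes: `uv python pin <version>` |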
While you can use UV as a drop-in replacement for pip, we recommend adopting a project-based workflow:
- Use uv init to create a new project. A pyproject.toml file will be created; this is where your dependencies are listed.
uv init --python=3.12
- Use uv add to add (and uv remove to remove) dependencies to your project. This will update the pyproject.toml file and the virtual environment.
uv add torch
- Use uv run to run commands, for example uv run python train.py. This will automatically do the following:
  - Create or update the virtualenv (with the correct Python version) if necessary, based on the dependencies in pyproject.toml.
  - Activate the virtualenv.
  - Run the command you provided, e.g. python train.py.
uv run python main.py
Pip/Virtualenv
Pip is the most widely used package manager for Python, and each cluster provides several Python versions through modules that come with pip. To install new packages, you first have to create a personal space in which to store them. The usual solution (and the recommended one on Digital Research Alliance of Canada clusters) is to use virtual environments, although UV is now the recommended way to manage Python installations, virtual environments and dependencies.
Note
We recommend using UV to manage your Python virtual environments instead of doing it manually. The UV section above gives an overview of how to install and use it.
First, load the Python module you want to use:
module load python/3.8
Then, create a virtual environment in your home directory:
python -m venv $HOME/<env>
Where <env> is the name of your environment. Finally, activate the environment:
source $HOME/<env>/bin/activate
You can now install any Python package you wish using the pip command, e.g. PyTorch:
pip install torch torchvision
Or TensorFlow (the separate tensorflow-gpu package is deprecated; GPU support is provided by the main package):
pip install tensorflow
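The python -m venv step can also be done from Python with the standard-library venv module. The sketch below is an illustration, assuming a Linux cluster (where environments use a bin/ layout); create_env and env_layout are hypothetical helper names. It shows the files that make up an environment: pyvenv.cfg, which records the base Python, plus the bin/python interpreter and the bin/activate script used above.

```python
"""Sketch: what `python -m venv <env>` creates, using the stdlib venv module."""
import os
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment, roughly like `python -m venv env_dir`."""
    env_dir = Path(env_dir).expanduser()
    # `python -m venv` uses symlinks by default on non-Windows systems.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    return env_dir


def env_layout(env_dir):
    """Report whether the key files of a POSIX virtual environment exist."""
    env_dir = Path(env_dir)
    return {
        "pyvenv.cfg": (env_dir / "pyvenv.cfg").exists(),           # records the base Python
        "bin/python": (env_dir / "bin" / "python").exists(),       # the env's interpreter
        "bin/activate": (env_dir / "bin" / "activate").exists(),   # used by `source .../bin/activate`
    }
```

For example, create_env("~/my_env") (a hypothetical environment name) behaves much like python -m venv $HOME/my_env, after which source $HOME/my_env/bin/activate works as described above.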
Conda
Another solution for Python is to use Miniconda or Anaconda, which are also available through the module
command. (The use of Conda is not recommended on Digital Research Alliance of
Canada clusters because custom-built packages are available for pip.)
module load miniconda/3
[=== Module miniconda/3 loaded ===]
To enable conda environment functions, first use:
To create an environment using a specific Python version, you may write:
conda create -n <env> python=3.9
Where <env> is the name of your environment. You can now activate it by doing:
conda activate <env>
You are now ready to install any Python package you want in this environment. For instance, to install PyTorch, you can find the Conda command for any version you want on PyTorch's website, e.g.:
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
If you make a lot of environments and install/uninstall a lot of packages, it can be good to periodically clean up Conda’s cache:
conda clean -it
Mamba
When installing new packages with conda install, conda uses a built-in
dependency solver for solving the dependency graph of all packages (and their
versions) requested such that package dependency conflicts are avoided.
In some cases, especially when there are many packages already installed in a conda environment, conda’s built-in dependency solver can struggle to solve the dependency graph, taking several to tens of minutes, and sometimes never solving. In these cases, it is recommended to try libmamba.
To install and set the libmamba solver, run the following commands:
# Install Miniconda
# (you cannot use the preinstalled Anaconda/Miniconda, as installing libmamba
# requires ownership of the Anaconda/Miniconda install directory)
wget https://repo.anaconda.com/miniconda/Miniconda3-py310_22.11.1-1-Linux-x86_64.sh
bash Miniconda3-py310_22.11.1-1-Linux-x86_64.sh
# Install libmamba
conda install -n base conda-libmamba-solver
By default, conda uses the built-in solver when installing packages, even after
installing other solvers. To try libmamba once, add --solver=libmamba in
your `conda install` command. For example:
conda install tensorflow --solver=libmamba
You can set libmamba as the default solver by adding solver: libmamba
to your .condarc configuration file located under your $HOME directory.
You can create it if it doesn’t exist. You can also run:
conda config --set solver libmamba
Using Modules
A lot of software, such as Python and Conda, is already compiled and available on
the cluster through the module command and its sub-commands. In particular,
if you wish to use Python 3.7 you can simply do:
module load python/3.7
The module command
For a list of available modules, simply use:
module avail
-------------------------------------------------------------------------------------------------------------- Global Aliases ---------------------------------------------------------------------------------------------------------------
cuda/10.0 -> cudatoolkit/10.0 cuda/9.2 -> cudatoolkit/9.2 pytorch/1.4.1 -> python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.4.1 tensorflow/1.15 -> python/3.7/tensorflow/1.15
cuda/10.1 -> cudatoolkit/10.1 mujoco-py -> python/3.7/mujoco-py/2.0 pytorch/1.5.0 -> python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.5.0 tensorflow/2.2 -> python/3.7/tensorflow/2.2
cuda/10.2 -> cudatoolkit/10.2 mujoco-py/2.0 -> python/3.7/mujoco-py/2.0 pytorch/1.5.1 -> python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.5.1
cuda/11.0 -> cudatoolkit/11.0 pytorch -> python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.5.1 tensorflow -> python/3.7/tensorflow/2.2
cuda/9.0 -> cudatoolkit/9.0 pytorch/1.4.0 -> python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.4.0 tensorflow-cpu/1.15 -> python/3.7/tensorflow/1.15
-------------------------------------------------------------------------------------------------- /cvmfs/config.mila.quebec/modules/Core ---------------------------------------------------------------------------------------------------
Mila (S,L) anaconda/3 (D) go/1.13.5 miniconda/2 mujoco/1.50 python/2.7 python/3.6 python/3.8 singularity/3.0.3 singularity/3.2.1 singularity/3.5.3 (D)
anaconda/2 go/1.12.4 go/1.14 (D) miniconda/3 (D) mujoco/2.0 (D) python/3.5 python/3.7 (D) singularity/2.6.1 singularity/3.1.1 singularity/3.4.2
------------------------------------------------------------------------------------------------ /cvmfs/config.mila.quebec/modules/Compiler -------------------------------------------------------------------------------------------------
python/3.7/mujoco-py/2.0
-------------------------------------------------------------------------------------------------- /cvmfs/config.mila.quebec/modules/Cuda ---------------------------------------------------------------------------------------------------
cuda/10.0/cudnn/7.3 cuda/10.0/nccl/2.4 cuda/10.1/nccl/2.4 cuda/11.0/nccl/2.7 cuda/9.0/nccl/2.4 cudatoolkit/9.0 cudatoolkit/10.1 cudnn/7.6/cuda/10.0/tensorrt/7.0
cuda/10.0/cudnn/7.5 cuda/10.1/cudnn/7.5 cuda/10.2/cudnn/7.6 cuda/9.0/cudnn/7.3 cuda/9.2/cudnn/7.6 cudatoolkit/9.2 cudatoolkit/10.2 cudnn/7.6/cuda/10.1/tensorrt/7.0
cuda/10.0/cudnn/7.6 (D) cuda/10.1/cudnn/7.6 (D) cuda/10.2/nccl/2.7 cuda/9.0/cudnn/7.5 (D) cuda/9.2/nccl/2.4 cudatoolkit/10.0 cudatoolkit/11.0 (D) cudnn/7.6/cuda/9.0/tensorrt/7.0
------------------------------------------------------------------------------------------------ /cvmfs/config.mila.quebec/modules/Pytorch --------------------------------------------------------------------------------------------------
python/3.7/cuda/10.1/cudnn/7.6/pytorch/1.4.1 python/3.7/cuda/10.1/cudnn/7.6/pytorch/1.5.1 (D) python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.5.0
python/3.7/cuda/10.1/cudnn/7.6/pytorch/1.5.0 python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.4.1 python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.5.1 (D)
----------------------------------------------------------------------------------------------- /cvmfs/config.mila.quebec/modules/Tensorflow ------------------------------------------------------------------------------------------------
python/3.7/tensorflow/1.15 python/3.7/tensorflow/2.0 python/3.7/tensorflow/2.2 (D)
Modules can be loaded using the load command:
module load <module>
To search for a module or a software, use the command spider:
module spider search_term
For example, by default, python2 refers to the OS-shipped installation of Python 2.7, and python3 to Python 3.6.
If you want to use Python 3.7, you can type:
module load python/3.7
Available Software
Modules are divided into the following main sections:
| Section | Description |
|---|---|
| Core | Base interpreter and software (Python, Go, etc.) |
| Compiler | Interpreter-dependent software (see the note below) |
| Cuda | Toolkits, cuDNN and related libraries |
| Pytorch/Tensorflow | PyTorch/TF built with a specific CUDA/cuDNN version for Mila’s GPUs (see the related paragraph) |
Note
Nested modules (../../..) usually depend on other software or modules loaded alongside the main module. You do not need to load the dependent software yourself: the naming scheme allows the dependent module(s) to be detected automatically.
For example, loading cudnn/7.6/cuda/9.0/tensorrt/7.0 will load cudnn/7.6 and
cuda/9.0 alongside it.
python/3.X is a special dependency which can be provided by either
python/3.X or anaconda/3, and it is not loaded automatically so that
users can pick their preferred flavor.
Default package location
By default, Python looks in your user site-packages directory first and in the
packages provided by a module last, so as not to interfere with your own
installation. If you want to skip packages installed in your user site-packages
folder (in your /home directory), you have to start Python with the -s flag.
To check which package is loaded at import, you can print package.__file__
to get the full path of the package.
Example:
module load pytorch/1.5.0
python -c 'import torch;print(torch.__file__)'
/home/mila/my_home/.local/lib/python3.7/site-packages/torch/__init__.py <== package from your own site-packages
Now with the -s flag:
module load pytorch/1.5.0
python -s -c 'import torch;print(torch.__file__)'
/cvmfs/ai.mila.quebec/apps/x86_64/debian/pytorch/python3.7-cuda10.1-cudnn7.6-v1.5.0/lib/python3.7/site-packages/torch/__init__.py
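The same checks can be made from inside Python. The sketch below (helper names are hypothetical) reports the user site-packages directory and whether it is enabled, and returns the file a module was imported from, like printing torch.__file__ above. Note that the user site is disabled not only under python -s but also inside a venv created without --system-site-packages.

```python
"""Sketch: inspect the user site-packages setting and a module's origin."""
import importlib
import site
import sys


def user_site_status():
    """Describe the user site-packages directory (e.g. ~/.local/lib/pythonX.Y/site-packages)."""
    return {
        "user_site": site.getusersitepackages(),
        # False under `python -s`, with PYTHONNOUSERSITE set, or inside a venv
        # created without --system-site-packages
        "enabled": bool(site.ENABLE_USER_SITE),
        "no_user_site_flag": bool(sys.flags.no_user_site),  # True under `python -s`
    }


def module_origin(name):
    """Return the file a module is loaded from, like printing `torch.__file__`."""
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)  # built-in modules have no __file__


if __name__ == "__main__":
    print(user_site_status())
    print(module_origin("json"))
```

Running it once with python and once with python -s shows the enabled and no_user_site_flag entries change, which is the switch that decides whether your own site-packages or the module's packages win.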
On using containers
Another option for creating portable code is to use containers (see Using containers on clusters).
Containers are a popular approach to deploying applications by packaging many of the required dependencies together. The most popular tool for this is Docker, but Docker cannot be used on the Mila cluster (nor on the other Digital Research Alliance of Canada clusters).
One popular mechanism for containerisation on a computational cluster is called Podman. This is the recommended approach for running containers on the Mila cluster. See section Using containers on clusters for more details.