Portability concerns and solutions

When working on a software project, it is important to be aware of all the software and libraries the project relies on and to list them explicitly and under a version control system in such a way that they can easily be installed and made available on different systems. The upsides are significant:

  • Easily install and run on the cluster

  • Ease of collaboration

  • Better reproducibility

To achieve this, try to always keep in mind the following aspects:

  • Versions: For each dependency, make sure you have some record of the specific version you are using during development. That way, in the future, you will be able to reproduce the original environment which you know to be compatible. Indeed, the more time passes, the more likely it is that newer versions of some dependency have breaking changes. The pip freeze command can create such a record for Python dependencies.

  • Isolation: Ideally, each of your software projects should be isolated from the others. What this means is that updating the environment for project A should not update the environment for project B. That way, you can freely install and upgrade software and libraries for the former without worrying about breaking the latter (which you might not notice until weeks later, the next time you work on project B!) Isolation can be made easy using UV, as well as Python Virtual environments and, as a last resort, Containers.

Virtual environments

A virtual environment in Python is a local, isolated environment in which you can install or uninstall Python packages without interfering with the global environment (or other virtual environments). It usually lives in a directory (location varies depending on whether you use venv, conda or poetry). In order to use a virtual environment, you have to activate it. Activating an environment essentially sets environment variables in your shell so that:

  • python points to the right Python version for that environment (different virtual environments can use different versions of Python!)

  • python looks for packages in the virtual environment

  • pip install installs packages into the virtual environment

  • Any shell commands installed via pip install are made available

To run experiments within a virtual environment, you can simply activate it in the script given to sbatch.

UV

In many cases, where your dependencies are Python packages, we highly recommend using UV, a modern package manager for Python.

In addition to all the same features as pip, it also manages Python installations, virtual environments, and makes your environments easier to reproduce and reuse across compute clusters.

Note

UV is not currently available as a module on the Mila or DRAC clusters at the time of writing. To use it, you first need to install it using this command on a cluster login node:

curl -LsSf https://astral.sh/uv/install.sh | sh

Pip/virtualenv command

UV pip equivalent

UV project command (recommended)

Create your virtualenv

module load python/3.10 then python -m venv

uv venv

uv init and uv sync

Activate the virtualenv

. .venv/bin/activate

(same)

(same, but often unnecessary)

Install a package

activate venv then pip install

uv pip install

uv add

Run a command (ex. python main.py)

module load python, then . <venv>/bin/activate, then python main.py

. <venv>/bin/activate, then python main.py

uv run python main.py

Where are dependencies declared?

Maybe in a requirements.txt, setup.py or pyproject.toml

Maybe in a requirements.txt, setup.py or pyproject.toml

pyproject.toml

Easy to change Python versions?

No

somewhat

Yes: uv python pin <version> or uv sync --python <version>

While you can use UV as a drop-in replacement for pip, we recommend adopting a project-based workflow:

  • Use uv init to create a new project. A pyproject.toml file will be created. This is where your dependencies are listed.

    uv init --python=3.12
    
  • Use uv add to add (and uv remove to remove) dependencies to your project. This will update the pyproject.toml file and update the virtual environment.

    uv add torch
    
  • Use uv run to run commands, for example uv run python train.py. This will automatically do the following:
    1. Create or update the virtualenv (with the correct Python version) if necessary, based the dependencies in pyproject.toml.

    2. Activates the virtualenv.

    3. Runs the command you provided, e.g. python train.py.

    uv run python main.py
    

Pip/Virtualenv

Pip is the most widely used package manager for Python and each cluster provides several Python versions through the associated module which comes with pip. In order to install new packages, you will first have to create a personal space for them to be stored. The usual solution (as it is the recommended solution on Digital Research Alliance of Canada clusters) is to use virtual environments, although UV is now the recommended way to manage Python installations, virtual environments and dependencies.

Note

We recommend you use UV to manage your Python virtual environments instead of doing it manually. The next section will give an overview of how to install it and use it.

First, load the Python module you want to use:

module load python/3.8

Then, create a virtual environment in your home directory:

python -m venv $HOME/<env>

Where <env> is the name of your environment. Finally, activate the environment:

source $HOME/<env>/bin/activate

You can now install any Python package you wish using the pip command, e.g. pytorch:

pip install torch torchvision

Or Tensorflow:

pip install tensorflow-gpu

Conda

Another solution for Python is to use miniconda or anaconda which are also available through the module command: (the use of Conda is not recommended for Digital Research Alliance of Canada clusters due to the availability of custom-built packages for pip)

module load miniconda/3
=== Module miniconda/3 loaded ===]
o enable conda environment functions, first use:

To create an environment (see here for details) using a specific Python version, you may write:

conda create -n <env> python=3.9

Where <env> is the name of your environment. You can now activate it by doing:

conda activate <env>

You are now ready to install any Python package you want in this environment. For instance, to install PyTorch, you can find the Conda command of any version you want on pytorch’s website, e.g:

conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

If you make a lot of environments and install/uninstall a lot of packages, it can be good to periodically clean up Conda’s cache:

conda clean -it

Mamba

When installing new packages with conda install, conda uses a built-in dependency solver for solving the dependency graph of all packages (and their versions) requested such that package dependency conflicts are avoided.

In some cases, especially when there are many packages already installed in a conda environment, conda’s built-in dependency solver can struggle to solve the dependency graph, taking several to tens of minutes, and sometimes never solving. In these cases, it is recommended to try libmamba.

To install and set the libmamba solver, run the following commands:

# Install miniconda
# (you can not use the preinstalled anaconda/miniconda as installing libmamba
#  requires ownership over the anaconda/miniconda install directory)
wget https://repo.anaconda.com/miniconda/Miniconda3-py310_22.11.1-1-Linux-x86_64.sh
bash Miniconda3-py310_22.11.1-1-Linux-x86_64.sh

# Install libmamba
conda install -n base conda-libmamba-solver

By default, conda uses the built-in solver when installing packages, even after installing other solvers. To try libmamba once, add --solver=libmamba in your `conda install` command. For example:

conda install tensorflow --solver=libmamba

You can set libmamba as the default solver by adding solver: libmamba to your .condarc configuration file located under your $HOME directory. You can create it if it doesn’t exist. You can also run:

conda config --set solver libmamba

Using Modules

A lot of software, such as Python and Conda, is already compiled and available on the cluster through the module command and its sub-commands. In particular, if you wish to use Python 3.7 you can simply do:

module load python/3.7

The module command

For a list of available modules, simply use:

module avail
-------------------------------------------------------------------------------------------------------------- Global Aliases ---------------------------------------------------------------------------------------------------------------
  cuda/10.0 -> cudatoolkit/10.0    cuda/9.2      -> cudatoolkit/9.2                                 pytorch/1.4.1       -> python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.4.1    tensorflow/1.15 -> python/3.7/tensorflow/1.15
  cuda/10.1 -> cudatoolkit/10.1    mujoco-py     -> python/3.7/mujoco-py/2.0                        pytorch/1.5.0       -> python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.5.0    tensorflow/2.2  -> python/3.7/tensorflow/2.2
  cuda/10.2 -> cudatoolkit/10.2    mujoco-py/2.0 -> python/3.7/mujoco-py/2.0                        pytorch/1.5.1       -> python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.5.1
  cuda/11.0 -> cudatoolkit/11.0    pytorch       -> python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.5.1    tensorflow          -> python/3.7/tensorflow/2.2
  cuda/9.0  -> cudatoolkit/9.0     pytorch/1.4.0 -> python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.4.0    tensorflow-cpu/1.15 -> python/3.7/tensorflow/1.15

-------------------------------------------------------------------------------------------------- /cvmfs/config.mila.quebec/modules/Core ---------------------------------------------------------------------------------------------------
  Mila       (S,L)    anaconda/3 (D)    go/1.13.5        miniconda/2        mujoco/1.50        python/2.7    python/3.6        python/3.8           singularity/3.0.3    singularity/3.2.1    singularity/3.5.3 (D)
  anaconda/2          go/1.12.4         go/1.14   (D)    miniconda/3 (D)    mujoco/2.0  (D)    python/3.5    python/3.7 (D)    singularity/2.6.1    singularity/3.1.1    singularity/3.4.2

------------------------------------------------------------------------------------------------ /cvmfs/config.mila.quebec/modules/Compiler -------------------------------------------------------------------------------------------------
  python/3.7/mujoco-py/2.0

-------------------------------------------------------------------------------------------------- /cvmfs/config.mila.quebec/modules/Cuda ---------------------------------------------------------------------------------------------------
  cuda/10.0/cudnn/7.3        cuda/10.0/nccl/2.4         cuda/10.1/nccl/2.4     cuda/11.0/nccl/2.7        cuda/9.0/nccl/2.4     cudatoolkit/9.0     cudatoolkit/10.1        cudnn/7.6/cuda/10.0/tensorrt/7.0
  cuda/10.0/cudnn/7.5        cuda/10.1/cudnn/7.5        cuda/10.2/cudnn/7.6    cuda/9.0/cudnn/7.3        cuda/9.2/cudnn/7.6    cudatoolkit/9.2     cudatoolkit/10.2        cudnn/7.6/cuda/10.1/tensorrt/7.0
  cuda/10.0/cudnn/7.6 (D)    cuda/10.1/cudnn/7.6 (D)    cuda/10.2/nccl/2.7     cuda/9.0/cudnn/7.5 (D)    cuda/9.2/nccl/2.4     cudatoolkit/10.0    cudatoolkit/11.0 (D)    cudnn/7.6/cuda/9.0/tensorrt/7.0

------------------------------------------------------------------------------------------------ /cvmfs/config.mila.quebec/modules/Pytorch --------------------------------------------------------------------------------------------------
  python/3.7/cuda/10.1/cudnn/7.6/pytorch/1.4.1    python/3.7/cuda/10.1/cudnn/7.6/pytorch/1.5.1 (D)    python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.5.0
  python/3.7/cuda/10.1/cudnn/7.6/pytorch/1.5.0    python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.4.1        python/3.7/cuda/10.2/cudnn/7.6/pytorch/1.5.1 (D)

----------------------------------------------------------------------------------------------- /cvmfs/config.mila.quebec/modules/Tensorflow ------------------------------------------------------------------------------------------------
  python/3.7/tensorflow/1.15    python/3.7/tensorflow/2.0    python/3.7/tensorflow/2.2 (D)

Modules can be loaded using the load command:

module load <module>

To search for a module or a software, use the command spider:

module spider search_term

E.g.: by default, python2 will refer to the os-shipped installation of python2.7 and python3 to python3.6. If you want to use python3.7 you can type:

module load python3.7

Available Software

Modules are divided in 5 main sections:

Section

Description

Core

Base interpreter and software (Python, go, etc…)

Compiler

Interpreter-dependent software (see the note below)

Cuda

Toolkits, cudnn and related libraries

Pytorch/Tensorflow

Pytorch/TF built with a specific Cuda/Cudnn version for Mila’s GPUs (see the related paragraph)

Note

Modules which are nested (../../..) usually depend on other software/module loaded alongside the main module. No need to load the dependent software, the complex naming scheme allows an automatic detection of the dependent module(s):

i.e.: Loading cudnn/7.6/cuda/9.0/tensorrt/7.0 will load cudnn/7.6 and cuda/9.0 alongside

python/3.X is a particular dependency which can be served through python/3.X or anaconda/3 and is not automatically loaded to let the user pick his favorite flavor.

Default package location

Python by default uses the user site package first and packages provided by module last to not interfere with your installation. If you want to skip packages installed in your site-packages folder (in your /home directory), you have to start Python with the -s flag.

To check which package is loaded at import, you can print package.__file__ to get the full path of the package.

Example:

module load pytorch/1.5.0
python -c 'import torch;print(torch.__file__)'
home/mila/my_home/.local/lib/python3.7/site-packages/torch/__init__.py   <== package from your own site-package

Now with the -s flag:

module load pytorch/1.5.0
python -s -c 'import torch;print(torch.__file__)'
cvmfs/ai.mila.quebec/apps/x86_64/debian/pytorch/python3.7-cuda10.1-cudnn7.6-v1.5.0/lib/python3.7/site-packages/torch/__init__.py'

On using containers

Another option for creating portable code is Using containers on clusters.

Containers are a popular approach at deploying applications by packaging a lot of the required dependencies together. The most popular tool for this is Docker, but Docker cannot be used on the Mila cluster (nor the other clusters from Digital Research Alliance of Canada).

One popular mechanism for containerisation on a computational cluster is called Podman. This is the recommended approach for running containers on the Mila cluster. See section Using containers on clusters for more details.