Distributed Training
Note
About these examples:
This section contains some minimal examples of how to run jobs on the Mila cluster. Each example is self-contained and can be run as-is directly on the cluster without error. Each example has the following structure:
job.sh: SLURM sbatch script. Can be launched with sbatch job.sh.
main.py: Example Python script.
Some examples are displayed as a difference with respect to a “base” example. For instance, the multi-gpu example is shown as a difference with respect to the single-gpu example.
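To make the job.sh / main.py layout described above more concrete, here is a minimal sketch of what a main.py could look like. It is not one of the actual examples from this section: the function name describe_slurm_task is hypothetical, and the sketch only assumes that the script is started inside a SLURM job, where SLURM exports environment variables such as SLURM_JOB_ID, SLURM_PROCID, SLURM_NTASKS and SLURMD_NODENAME. Outside of SLURM it falls back to single-process defaults.

```python
import os


def describe_slurm_task(env=None):
    """Summarize the SLURM context of the current process.

    `describe_slurm_task` is a hypothetical helper used for illustration only.
    It reads standard SLURM environment variables and falls back to
    single-process defaults when they are not set.
    """
    env = os.environ if env is None else env
    return {
        "job_id": env.get("SLURM_JOB_ID", "not-in-slurm"),
        # Global index of this task within the job (0 when run on its own).
        "rank": int(env.get("SLURM_PROCID", "0")),
        # Total number of tasks in the job (1 when run on its own).
        "world_size": int(env.get("SLURM_NTASKS", "1")),
        "node": env.get("SLURMD_NODENAME", "unknown"),
    }


def main():
    info = describe_slurm_task()
    print(
        f"[job {info['job_id']}] task {info['rank']} of "
        f"{info['world_size']} on node {info['node']}"
    )


if __name__ == "__main__":
    main()
```

In this sketch, job.sh would simply invoke python main.py, and the whole job would be submitted with sbatch job.sh. The rank and world size read here are the kind of values a distributed training script typically needs before setting up communication between processes.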