Advanced SLURM usage and Multiple GPU jobs

Handling preemption

On the Mila cluster, jobs can preempt one another depending on their priority (unkillable > high > low) (see the Slurm documentation).

The default preemption mechanism is to kill and re-queue the job automatically without any notice. To allow a different preemption mechanism, every partition has been duplicated (i.e. has the same characteristics as its counterpart) to allow a 120-second grace period before killing your job, without requeueing it automatically. Those partitions are identified by the -grace suffix (main-grace, long-grace, main-cpu-grace, long-cpu-grace).
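For example, submitting to one of these partitions only requires naming it in the job script. A minimal sketch using the main-grace partition listed above (my_script is a placeholder for your own program):

#!/bin/bash
#SBATCH --partition=main-grace
#SBATCH --ntasks=1

# The workload; if preempted, it gets a 120-second grace period before SIGKILL
python3 my_script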

When using a partition with a grace period, a series of signals, first SIGCONT and SIGTERM, then SIGKILL, will be sent to the SLURM job. It's good practice to catch those signals using the Linux trap command to properly terminate the job and save whatever is necessary to restart it. On each cluster, you'll be allowed a grace period before SLURM actually kills your job (SIGKILL).

The easiest way to handle preemption is by trapping the SIGTERM signal:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH ....

exit_script() {
    echo "Preemption signal, saving myself"
    trap - SIGTERM # clear the trap
    # Optional: sends SIGTERM to child/sub processes
    kill -- -$$
}

trap exit_script SIGTERM

# The main script part
python3 my_script

Note

Requeuing:
The Slurm scheduler on the cluster does not allow combining a grace period
with automatic requeueing, so on the -grace partitions your job will be
cancelled at the end of the grace period.
To requeue it automatically, you can simply add the sbatch command inside
your exit_script function, as sketched below.
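A minimal sketch of that idea (the ~/my_job.sh path is a placeholder for wherever your submission script actually lives; it is not defined elsewhere on this page):

exit_script() {
    echo "Preemption signal, saving and requeuing myself"
    trap - SIGTERM # clear the trap
    # Resubmit this job; replace ~/my_job.sh with the real path of your script
    sbatch ~/my_job.sh
    # Optional: sends SIGTERM to child/sub processes
    kill -- -$$
}

trap exit_script SIGTERM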

Packing jobs

Sharing a GPU between processes

srun, when used in a batch job, is responsible for starting tasks on the allocated resources (see the srun documentation). Example SLURM batch script:

#!/bin/bash
#SBATCH --ntasks-per-node=2
#SBATCH --output=myjob_output_wrapper.out
#SBATCH --ntasks=2
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4
#SBATCH --mem=18G
srun -l --output=myjob_output_%t.out python script args

This will run Python 2 times, each process with 4 CPUs and the same arguments. The --output=myjob_output_%t.out option will create 2 output files, appending the task id (%t) to the filename, plus 1 global log file (myjob_output_wrapper.out) for everything happening outside the srun command.

Knowing that, if you want to pass 2 different arguments to the Python program, you can use a multi-prog configuration file: srun -l --multi-prog silly.conf

0  python script firstarg
1  python script secondarg

Or by specifying a range of tasks

0-1  python script %t

Here %t is the task id, which your Python script will parse. Note the -l option on the srun command: it prepends each output line with the task id (0:, 1:).
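Putting this together, a minimal sketch of the earlier batch script using the multi-prog file (assuming silly.conf sits in the submission directory) could look like:

#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4
#SBATCH --mem=18G
#SBATCH --output=myjob_output_wrapper.out

# Each task runs the line of silly.conf matching its task id;
# -l prepends the task id (0:, 1:) to every output line.
srun -l --output=myjob_output_%t.out --multi-prog silly.conf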

Sharing a node with multiple GPUs, 1 process/GPU

On the Digital Research Alliance of Canada clusters, several nodes, especially nodes with large GPUs (P100), are reserved for jobs requesting whole nodes, so packing multiple processes into a single job lets you leverage these faster GPUs.

If you want different tasks to access different GPUs in a single allocation, you need to create an allocation requesting a whole node and use srun with a subset of those resources (1 GPU per step).

Keep in mind that every resource not specified on the srun command will inherit the global allocation specification, so you need to split each resource into a subset (except --cpus-per-task, which is a per-task requirement).

Each srun represents a job step (%s).

Example for a GPU node with 24 cores, 4 GPUs and 128G of RAM, requesting 1 task per GPU:

#!/bin/bash
#SBATCH --nodes=1-1
#SBATCH --ntasks-per-node=4
#SBATCH --output=myjob_output_wrapper.out
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=6
srun --gres=gpu:1 -n1 --mem=30G -l --output=%j-step-%s.out --exclusive python script args1 &
srun --gres=gpu:1 -n1 --mem=30G -l --output=%j-step-%s.out --exclusive python script args2 &
srun --gres=gpu:1 -n1 --mem=30G -l --output=%j-step-%s.out --exclusive python script args3 &
srun --gres=gpu:1 -n1 --mem=30G -l --output=%j-step-%s.out --exclusive python script args4 &
wait

This will create 4 output files:

  • JOBID-step-0.out

  • JOBID-step-1.out

  • JOBID-step-2.out

  • JOBID-step-3.out

Sharing a node with multiple GPUs & multiple processes/GPU

Combining both previous sections, we can create a script requesting a whole node with four GPUs, allocating 1 GPU per srun and sharing each GPU between multiple processes.

Example, still with a 24-core/4-GPU/128G-RAM node, requesting 2 tasks per GPU:

#!/bin/bash
#SBATCH --nodes=1-1
#SBATCH --ntasks-per-node=8
#SBATCH --output=myjob_output_wrapper.out
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=3
srun --gres=gpu:1 -n2 --mem=30G -l --output=%j-step-%s-task-%t.out --exclusive --multi-prog silly.conf &
srun --gres=gpu:1 -n2 --mem=30G -l --output=%j-step-%s-task-%t.out --exclusive --multi-prog silly.conf &
srun --gres=gpu:1 -n2 --mem=30G -l --output=%j-step-%s-task-%t.out --exclusive --multi-prog silly.conf &
srun --gres=gpu:1 -n2 --mem=30G -l --output=%j-step-%s-task-%t.out --exclusive --multi-prog silly.conf &
wait

--exclusive is important: it ensures that subsequent steps/srun commands bind to different CPUs.

This will produce 8 output files, 2 for each step:

  • JOBID-step-0-task-0.out

  • JOBID-step-0-task-1.out

  • JOBID-step-1-task-0.out

  • JOBID-step-1-task-1.out

  • JOBID-step-2-task-0.out

  • JOBID-step-2-task-1.out

  • JOBID-step-3-task-0.out

  • JOBID-step-3-task-1.out

Running nvidia-smi in silly.conf and parsing the output, we can see 4 GPUs allocated and 2 tasks per GPU (a minimal sketch of such a silly.conf follows the output):

cat JOBID-step-* | grep Tesla
0: |   0  Tesla P100-PCIE...  On   | 00000000:04:00.0 Off |                    0 |
1: |   0  Tesla P100-PCIE...  On   | 00000000:04:00.0 Off |                    0 |
0: |   0  Tesla P100-PCIE...  On   | 00000000:83:00.0 Off |                    0 |
1: |   0  Tesla P100-PCIE...  On   | 00000000:83:00.0 Off |                    0 |
0: |   0  Tesla P100-PCIE...  On   | 00000000:82:00.0 Off |                    0 |
1: |   0  Tesla P100-PCIE...  On   | 00000000:82:00.0 Off |                    0 |
0: |   0  Tesla P100-PCIE...  On   | 00000000:03:00.0 Off |                    0 |
1: |   0  Tesla P100-PCIE...  On   | 00000000:03:00.0 Off |                    0 |
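For reference, a minimal silly.conf producing this kind of output (a sketch, not necessarily the exact file used above) simply runs nvidia-smi for both tasks of each step:

# silly.conf: tasks 0 and 1 both run nvidia-smi on their step's GPU
0-1  nvidia-smi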