Node profile description

Name

GPU

CPUs

Sockets

Cores/Socket

Threads/Core

Memory (GB)

TmpDisk (TB)

Arch

Slurm Features

Model

Mem

#

GPU Arch and Memory

GPU Compute Nodes

cn-a[001-011]

RTX8000

48

8

40

2

20

1

384

3.6

x86_64

turing,48gb

cn-b[001-005]

V100

32

8

40

2

20

1

384

3.6

x86_64

volta,nvlink,32gb

cn-c[001-040]

RTX8000

48

8

64

2

32

1

384

3

x86_64

turing,48gb

cn-g[001-029]

A100

80

4

64

2

32

1

1024

7

x86_64

ampere,nvlink,80gb

cn-i001

A100

80

4

64

2

32

1

1024

3.6

x86_64

ampere,80gb

cn-j001

A6000

48

8

64

2

32

1

1024

3.6

x86_64

ampere,48gb

cn-k[001-004]

A100

40

4

48

2

24

1

512

3.6

x86_64

ampere,nvlink,40gb

cn-l[001-091]

L40S

48

4

48

2

24

1

1024

7

x86_64

lovelace,48gb

cn-n[001-002]

H100

80

8

192

2

96

1

2048

35

x86_64

hopper,nvlink,80gb

DGX Systems

cn-d[001-002]

A100

40

8

128

2

64

1

1024

14

x86_64

ampere,nvlink,dgx,40gb

cn-d[003-004]

A100

80

8

128

2

64

1

2048

28

x86_64

ampere,nvlink,dgx,80gb

cn-e[002-003]

V100

32

8

40

2

20

1

512

7

x86_64

volta,nvlink,dgx,32gb

CPU Compute Nodes

cn-f[001-004]

32

1

32

1

256

10

x86_64

rome

cn-h[001-004]

64

2

32

1

768

7

x86_64

milan

cn-m[001-004]

96

2

48

1

1024

7

x86_64

sapphire

Special nodes and outliers

DGX A100

DGX A100 nodes are NVIDIA appliances with 8 NVIDIA A100 Tensor Core GPUs. Each GPU has either 40 GB or 80 GB of memory, for a total of 320 GB or 640 GB per appliance. The GPUs are interconnected via 6 NVSwitches which allow for 600 GB/s point-to-point bandwidth (unidirectional) and a full bisection bandwidth of 4.8 TB/s (bidirectional). See the table above for the specifications of each appliance.

In order to run jobs on a DGX A100 with 40GB GPUs, add the flags below to your Slurm commands:

--gres=gpu:a100:<number> --constraint="dgx&ampere"

In order to run jobs on a DGX A100 with 80GB GPUs, add the flags below to your Slurm commands:

--gres=gpu:a100l:<number> --constraint="dgx&ampere"