Create an AMI

You have two options for building a machine image. The first is the Deep Learning AMI. This is the easier option, as pre-built versions exist for popular frameworks such as TensorFlow, PyTorch, and MXNet.

The second is the parallelcluster-efa-gpu-preflight repository. This option gives you more control over the software stack, such as custom CUDA versions and container support via Pyxis and Enroot.

Here is an example of each image. Pick one and proceed to either 1 - Deep Learning AMI or 2 - Custom Packer AMI (Optional).
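If you go with the Deep Learning AMI, you can look up the latest image ID with the AWS CLI. This is a sketch only; the name filter pattern below is an assumption and should be adjusted to the framework, OS version, and region you need:

```shell
# Sketch: find the most recent available Deep Learning AMI matching a name pattern.
# The "Deep Learning AMI GPU PyTorch*" filter is an assumed example, not the only option.
aws ec2 describe-images \
  --owners amazon \
  --filters "Name=name,Values=Deep Learning AMI GPU PyTorch * (Ubuntu 20.04) *" \
            "Name=state,Values=available" \
  --query 'reverse(sort_by(Images, &CreationDate))[0].{ImageId: ImageId, Name: Name}' \
  --output json
```

The returned ImageId can then be referenced in your cluster configuration.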

1 - Deep Learning AMI

For example, the Ubuntu 20.04 Deep Learning AMI provides the following software stack:

  • Supported EC2 Instances: P5, P4de, P4d, P3, G5, G3, G4dn
  • Operating System: Ubuntu 20.04
  • Compute Architecture: x86
  • Conda environment framework and Python versions:
    • python3: Python 3.9
  • NVIDIA Driver: 535.54.03
  • NVIDIA CUDA 11 stack:
    • CUDA, NCCL, and cuDNN installation path: /usr/local/cuda-xx.x/
  • EFA Installer: 1.19.0
  • AWS OFI NCCL: 1.5.0-aws
    • System location: /usr/local/cuda-xx.x/efa
    • This is used to run the NCCL tests located at /usr/local/cuda-11.8/efa/test-cuda-xx.x/

The PyTorch conda environment also ships with a dynamically linked AWS OFI NCCL plugin as the aws-ofi-nccl-dlc conda package, and PyTorch will use that package instead of the system AWS OFI NCCL.
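A quick way to confirm this is to list the conda package and watch NCCL's startup logs. A minimal sketch, assuming the DLAMI's PyTorch conda environment is active (the training script name is a placeholder):

```shell
# List the conda-packaged plugin mentioned above (assumes the PyTorch env is active).
conda list aws-ofi-nccl-dlc

# With debug logging enabled, NCCL reports which network plugin it loaded;
# look for "NET/OFI" lines in the job output. your_training_script.py is hypothetical.
NCCL_DEBUG=INFO torchrun --nproc_per_node=8 your_training_script.py
```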

  • NCCL Tests Location: /usr/local/cuda-xx.x/efa/test-cuda-xx.x/
  • EBS volume type: gp3
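The bundled NCCL tests are handy for validating EFA connectivity once the cluster is up. A minimal sketch, assuming Open MPI across two 8-GPU nodes; substitute your CUDA version for xx.x, since the exact path depends on the AMI:

```shell
# Run the prebuilt all_reduce_perf across 2 nodes x 8 GPUs (16 ranks total).
# Flags: -b/-e set the min/max message sizes, -f the size multiplier, -g GPUs per rank.
mpirun -np 16 -N 8 \
  -x FI_PROVIDER=efa \
  -x NCCL_DEBUG=INFO \
  /usr/local/cuda-xx.x/efa/test-cuda-xx.x/all_reduce_perf -b 8 -e 1G -f 2 -g 1
```

Healthy EFA output shows the OFI plugin in the NCCL INFO lines and bus bandwidth scaling with message size.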

2 - Custom Packer AMI

The parallelcluster-efa-gpu-preflight repository includes the following software stack.