Running an AI workload on a GPU machine requires setting up kernel drivers and user-space libraries from GPU vendors such as AMD and NVIDIA. Once the driver and software are installed, using AI frameworks such as PyTorch and TensorFlow requires the correct framework build for the GPU target. Usually, AI applications run on top of popular AI frameworks, which hide these tedious installation steps. This article highlights the importance of the hardware, driver, software, and frameworks for running AI applications or workloads.
This article deals with the Linux operating system, the ROCm software stack for AMD GPUs, the CUDA software stack for NVIDIA GPUs, and PyTorch as the AI framework. Docker plays a critical part in bringing up the entire stack, allowing various workloads to be launched in parallel.
The diagram above represents the AI software stack on a node with 8 AMD GPUs.
The hardware layer consists of a node with the usual CPU, memory, etc., plus the GPU devices. A node can have a single GPU device, but larger AI models require a lot of GPU memory to load, so it is common to use more than one GPU per node. The GPUs are interconnected by XGMI (AMD) or NVLink (NVIDIA). A cluster can have several such nodes, and GPUs on one node can interact with GPUs on another node; this inter-node interconnect is typically InfiniBand or Ethernet/RoCE. The GPU interconnect to be used depends on the underlying GPU hardware.
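On an AMD node, you can inspect how the GPUs on the node are linked (a minimal check, assuming the ROCm user-space tools are installed):

# Show the link type and hop count between GPU pairs (e.g., XGMI)
rocm-smi --showtopo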
Installing the Kernel Driver
At the software layer, the AMD GPU driver or NVIDIA GPU driver needs to be installed. It is not uncommon to install the entire ROCm or CUDA software package, which includes the kernel driver, on the host OS. Since we are going to use a Docker container to launch the AI workload, the user-space ROCm or CUDA software is redundant on the host OS; however, it lets us verify with the user-space tools whether the underlying kernel driver works properly.
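For example, on an Ubuntu 22.04 host with an AMD GPU, a driver-only install might look like the sketch below. The package version in the URL is an assumption; check AMD's installation docs for the current one.

# Download and install the amdgpu-install helper (version is a placeholder)
wget https://repo.radeon.com/amdgpu-install/6.1/ubuntu/jammy/amdgpu-install_6.1.60100-1_all.deb
sudo apt install ./amdgpu-install_6.1.60100-1_all.deb
# Install only the kernel driver (dkms), skipping the user-space stack
sudo amdgpu-install --usecase=dkms
# Verify the driver module loaded
lsmod | grep amdgpu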
Launching a ROCm- or CUDA-Based Docker Container
Once the GPU drivers are installed, ROCm- or CUDA-based Docker images can be used for AMD and NVIDIA GPU nodes, respectively.
Various Linux-flavor Docker images are released periodically by AMD and NVIDIA. This is one of the advantages of Dockerized applications over applications running on the native OS: we can have an Ubuntu 22.04 host OS with GPU drivers installed and then launch CentOS- or Ubuntu 20.04-based Docker containers with different ROCm versions in parallel, as sketched below.
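A minimal sketch, assuming the image tags below exist (check the rocm repositories on Docker Hub for current tags):

# Two containers with different user-space ROCm versions, sharing one host kernel driver
docker run -d --device /dev/kfd --device /dev/dri rocm/dev-ubuntu-20.04:5.7 sleep infinity
docker run -d --device /dev/kfd --device /dev/dri rocm/dev-centos-7:5.7 sleep infinity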
Launching a ROCm-Based Docker Container
ROCm Docker images are available on Docker Hub; check for the dev-ubuntu-22.04 repository.
docker run -it --rm --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined rocm/dev-ubuntu-22.04
The above command maps all the GPU devices into the container. You can also expose specific GPUs, as sketched below (more info at "Running ROCm Docker containers").
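One way to do this (an assumption based on the standard ROCm Docker setup) is to pass a single render node instead of the whole /dev/dri directory; the node number varies per system:

# List the render nodes, then expose only one of them to the container
ls /dev/dri
docker run -it --rm --device /dev/kfd --device /dev/dri/renderD128 --security-opt seccomp=unconfined rocm/dev-ubuntu-22.04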
Once the container is running, check whether the GPUs are listed.
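Inside the container, either of the ROCm user-space tools should enumerate the GPUs:

rocm-smi              # per-GPU utilization, temperature, and memory
rocminfo | grep gfx   # GPU agents and their architecture (e.g., gfx90a)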
You can download the PyTorch source and build it for AMD GPUs (more instructions are on GitHub), or you can run any workload that has ROCm support.
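The build roughly follows the sketch below; treat it as an outline and refer to the PyTorch README for the authoritative steps.

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
pip3 install -r requirements.txt
# Translate the CUDA sources to HIP for the ROCm backend
python3 tools/amd_build/build_amd.py
python3 setup.py develop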
Launching a ROCm-Based PyTorch Docker Container
If PyTorch does not need to be built from source (in most cases it does not), one can directly download a ROCm-based PyTorch Docker image. Just make sure the ROCm kernel driver is installed, and then launch the PyTorch-based container.
PyTorch Docker images with ROCm support can be found on Docker Hub under rocm/pytorch.
docker run -it --rm --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined rocm/pytorch
Once the container is running, check whether the GPUs are listed, as described earlier.
Let's try a few code snippets from the PyTorch framework to check the GPUs, the ROCm/HIP version, etc.
root@node:/var/lib/jenkins# python3
Python 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0] on linux
Sort "help", "copyright", "credits" or "license" for extra info.
>>> import torch
>>> torch.__version__
'2.1.2+git70dfd51'
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
8
>>> torch.version.hip
'6.1.40091-a8dbc0c19'
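As a final sanity check (a small addition, not part of the original session), run a computation on the GPU; note that PyTorch's ROCm backend reuses the "cuda" device name:

>>> x = torch.rand(1024, 1024, device="cuda")  # "cuda" maps to ROCm/HIP here
>>> (x @ x).device
device(type='cuda', index=0)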
Conclusion
In conclusion, this article highlights the importance of software stack compatibility with the underlying GPU hardware. A wrong choice of software stack for a particular GPU type might lead to the workload running on the default device (i.e., the CPU), thereby underutilizing the compute power of the GPU.
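A defensive pattern, sketched here, makes that failure loud instead of silent:

import torch

# Fail fast rather than silently falling back to the CPU
if not torch.cuda.is_available():
    raise RuntimeError("GPU backend unavailable -- check driver and framework build")
device = torch.device("cuda")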
Happy GPU programming!