AMD ROCm

This page documents how to deploy an Enterprise Linux (EL) server with AMD Instinct GPUs using the ROCm HPC and AI software stack. See also the Wikipedia article on ROCm.

ROCm documentation

Compilers and libraries

  • HIP is a C++ Runtime API and Kernel Language that allows developers to create portable applications for AMD and NVIDIA GPUs from single source code.

  • Numba is a just-in-time compiler for Python that works best on code that uses NumPy arrays and functions, and loops.

  • clMathLibraries

  • clBLAS is a software library containing BLAS functions written in OpenCL.

Software installation

The Deploy ROCm on Linux guide documents installation on RHEL. See also the Quick_Start guide. You must first enable the EPEL repository. Then install the kernel header and development packages:

dnf install kernel-headers kernel-devel
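
The DKMS driver is built against the running kernel, so it may help to request the header and development packages for exactly that kernel release. The following is a minimal sketch, not taken from the guides above; the helper name kernel_pkgs is hypothetical, and it assumes that packages matching the running kernel release are still available in the enabled repositories.

```shell
# Hypothetical helper: print the kernel package names that match a
# given kernel release, one per line.
kernel_pkgs() {
    release="$1"
    printf '%s\n' "kernel-headers-${release}" "kernel-devel-${release}"
}

# Usage (as root), for the currently running kernel:
#   dnf install $(kernel_pkgs "$(uname -r)")
```

If the running kernel is older than the newest one in the repositories, updating and rebooting first is usually simpler than pinning the versions.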

The UNIX groups video and render must exist, and every ROCm user must be added to both of them. For the current user:

sudo usermod -a -G render,video $LOGNAME

The new group membership takes effect at the user's next login.
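
To check whether a user already belongs to both groups, something like the following sketch can be used. The helper name missing_groups is hypothetical; it only compares a space-separated list of group names, such as the output of id -nG, against the required groups.

```shell
# Hypothetical helper: print each required group that is absent from
# the space-separated group list given as the first argument.
missing_groups() {
    have=" $1 "
    shift
    for g in "$@"; do
        case "$have" in
            *" $g "*) ;;                  # group is present
            *) printf '%s\n' "$g" ;;      # group is missing
        esac
    done
}

# Usage: show which of the two groups the current user still lacks
#   missing_groups "$(id -nG)" render video
```

Empty output means that the user is a member of both groups.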

The section Setting Permissions for Groups states that a file /etc/adduser.conf should be created. However, this file belongs to the Debian adduser tool and is not used on EL systems.

Install the ROCm RPMs as documented in Installation via Package manager. This enables the Yum repositories amdgpu.repo and rocm.repo for installation.

Now install:

dnf install amdgpu-dkms

and reboot the server.
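
After the reboot it can be worth confirming that the amdgpu kernel module was actually loaded. The sketch below is an illustration, not part of the ROCm guides; the helper name module_loaded is hypothetical and it simply looks for the module name in the first column of lsmod output.

```shell
# Hypothetical helper: succeed if the named module appears in the
# lsmod-style listing read from standard input.
module_loaded() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }'
}

# Usage after the reboot:
#   lsmod | module_loaded amdgpu && echo "amdgpu driver is loaded"
#   dkms status    # shows the build state of DKMS-managed modules
```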

It may be a good idea to install multiple versions of the ROCm stack side by side on a system, for example:

dnf install rocm-hip-sdk5.7.2 rocm-hip-sdk5.6.1

See also the section Post-install Actions and Verification Process in the Linux_installation guide.

The software is installed under the /opt directory. Some useful commands are:

  • AMD ROCm System Management Interface (SMI) command:

    /opt/rocm-5.7.2/bin/rocm-smi
    

The directory name depends on the installed versions of ROCm.
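
When several ROCm versions are installed, the versioned directories can be listed instead of hard-coding one of them. This is a minimal sketch, assuming the /opt/rocm-<version> layout described above; the helper name list_rocm_smi is hypothetical.

```shell
# Hypothetical helper: print the path of every executable rocm-smi
# found in rocm-<version> directories below a base directory
# (default: /opt).
list_rocm_smi() {
    base="${1:-/opt}"
    for d in "$base"/rocm-*; do
        if [ -x "$d/bin/rocm-smi" ]; then
            printf '%s\n' "$d/bin/rocm-smi"
        fi
    done
    return 0
}

# Usage: run rocm-smi from each installed ROCm version
#   for smi in $(list_rocm_smi); do "$smi"; done
```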

Install ROCm runtimes

The Quick_Start guide shows how to install the rocm-hip-libraries meta-package on EL8:

dnf install rocm-hip-libraries

Install the ROC_tracer library and its development package:

dnf install roctracer roctracer-devel