AMD ROCm 

This page documents how to deploy an Enterprise Linux (EL) server with AMD Instinct GPUs using the ROCm HPC and AI software stack. See also the Wikipedia article on ROCm.

Compilers and libraries 

HIP is a C++ Runtime API and Kernel Language that allows developers to create portable applications for AMD and NVIDIA GPUs from single source code.
Numba is a just-in-time compiler for Python that works best on code that uses NumPy arrays and functions, and loops.
clMathLibraries: clBLAS is a software library containing BLAS functions written in OpenCL.

Software installation 

The ROCm_installation_for_Linux guide documents installation on RHEL. See also the Quick_Start guide.

Note: The ROCm compatibility matrix at https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html shows compatibility across successive major and minor releases, and lists version requirements for a number of software packages. For example, OpenMPI requires these communication framework versions to work with ROCm 6.2:

UCC >=1.3.0
UCX >=1.15.0
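The minimum versions above can be checked mechanically. The following is a minimal sketch, assuming GNU sort with its -V (version sort) option is available; the helper name version_ge and the "installed" version numbers are hypothetical placeholders, not values reported by any real system.

```shell
#!/bin/sh
# version_ge A B: succeed when version A is at least version B.
# Version sort matters here: a plain string compare would rank 1.9.0 above 1.15.0.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum versions quoted above for OpenMPI with ROCm 6.2.
UCX_MIN=1.15.0
UCC_MIN=1.3.0

# Hypothetical installed versions; replace them with the real ones,
# for example taken from the output of "ucx_info -v".
ucx_installed=1.16.0
ucc_installed=1.2.0

if version_ge "$ucx_installed" "$UCX_MIN"; then
    echo "UCX $ucx_installed: OK"
else
    echo "UCX $ucx_installed: too old, need >= $UCX_MIN"
fi

if version_ge "$ucc_installed" "$UCC_MIN"; then
    echo "UCC $ucc_installed: OK"
else
    echo "UCC $ucc_installed: too old, need >= $UCC_MIN"
fi
```

With the placeholder values shown, the UCX check passes and the UCC check reports a version that is too old.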

Install packages 

First enable the EPEL repository, then install the kernel packages:

dnf install kernel-headers kernel-devel

Install the ROCm RPMs as documented in Installation via native package manager. This enables the Yum repositories amdgpu.repo and rocm.repo for installation.

Now install:

dnf install amdgpu-dkms

and reboot the server.

Then install ROCm:

dnf install rocm

Finally follow the Post-installation instructions:

tee --append /etc/ld.so.conf.d/rocm.conf <<EOF
/opt/rocm/lib
/opt/rocm/lib64
EOF
ldconfig
export PATH=$PATH:/opt/rocm-6.2.2/bin
dkms status
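Because tee --append adds the two library paths every time it is run, repeating the post-installation step leaves duplicate lines in rocm.conf. A possible way to make the step repeatable is sketched below; the helper name append_once is hypothetical, and the demonstration writes to a scratch file rather than to /etc/ld.so.conf.d/rocm.conf.

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
# grep -x matches whole lines only, so /opt/rocm/lib64 does not hide /opt/rocm/lib.
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Demonstration on a scratch file. On a real system the target would be
# /etc/ld.so.conf.d/rocm.conf (written as root), followed by ldconfig.
conf=$(mktemp)
for dir in /opt/rocm/lib64 /opt/rocm/lib; do
    append_once "$conf" "$dir"
    append_once "$conf" "$dir"   # the second call changes nothing
done
cat "$conf"
rm -f "$conf"
```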

Verify the ROCm installation:

/opt/rocm-6.2.2/bin/rocminfo
/opt/rocm-6.2.2/bin/clinfo
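Before running the verification commands it can help to confirm that they exist at the expected location. This is only a sketch: the helper name missing_tools is hypothetical, and the directory /opt/rocm-6.2.2/bin is taken from the commands above and will differ for other ROCm versions.

```shell
#!/bin/sh
# missing_tools BINDIR TOOL...: print each TOOL that is not an executable
# file in BINDIR. The body runs in a subshell so no variables leak out.
missing_tools() (
    bindir=$1
    shift
    for tool in "$@"; do
        [ -x "$bindir/$tool" ] || printf '%s\n' "$tool"
    done
)

bindir=/opt/rocm-6.2.2/bin
missing=$(missing_tools "$bindir" rocminfo clinfo)
if [ -z "$missing" ]; then
    echo "rocminfo and clinfo are present in $bindir"
else
    echo "Not found in $bindir:" $missing
fi
```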

The software is installed under the /opt directory. Some useful commands are:

AMD ROCm System Management Interface (SMI) command:
```
/opt/rocm-X.X.X/bin/rocm-smi
```

The directory name depends on the installed ROCm version.
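Since the versioned directory name changes with each release, scripts may prefer to discover it instead of hard-coding it. The sketch below assumes the directories follow the rocm-X.X.X naming shown above and that GNU sort -V is available; the helper name latest_rocm_dir is hypothetical.

```shell
#!/bin/sh
# latest_rocm_dir BASE: print the highest-versioned BASE/rocm-X.Y.Z directory,
# or nothing when no such directory exists.
latest_rocm_dir() {
    for d in "$1"/rocm-[0-9]*; do
        [ -d "$d" ] && printf '%s\n' "$d"
    done | sort -V | tail -n 1
}

rocm_dir=$(latest_rocm_dir /opt)
if [ -n "$rocm_dir" ]; then
    echo "Using $rocm_dir"
    PATH=$PATH:$rocm_dir/bin
    export PATH
else
    echo "No versioned ROCm directory found under /opt"
fi
```

Version sort is used so that, for example, a directory rocm-6.10.0 would rank above rocm-6.2.2.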

Add UNIX groups render and video 

The UNIX groups render and video are required, and all ROCm users must be added to them. Add the current user with:

sudo usermod -a -G render,video $LOGNAME

Repeat this for every other ROCm user. Note: If a modules user is used for building modules, this user must also be added to those groups.
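Group membership can be verified with a small check such as the one below. It is a sketch only: the helper name missing_rocm_groups is hypothetical, and it simply compares a space-separated group list, as printed by id -nG, against the two group names required above.

```shell
#!/bin/sh
# missing_rocm_groups "GROUP LIST": print each of render and video that is
# absent from the space-separated list given as the first argument.
missing_rocm_groups() {
    for g in render video; do
        case " $1 " in
            *" $g "*) ;;                 # whole-word match: group is present
            *) printf '%s\n' "$g" ;;
        esac
    done
}

user=$(id -un)
# "id -nG USER" reads the group database, so it reflects a recent usermod.
missing=$(missing_rocm_groups "$(id -nG "$user")")
if [ -z "$missing" ]; then
    echo "$user is in both render and video"
else
    echo "$user is missing:" $missing
fi
```

A session that was already open when usermod ran keeps its old group list; the new membership takes effect at the next login.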

On some Linux distributions (unfortunately not EL8) you can configure all new users to have these groups by appending to the file /etc/default/useradd:

ADD_EXTRA_GROUPS=1
EXTRA_GROUPS="video render"

The section Setting Permissions for Groups states that a file /etc/adduser.conf should be created. However, such a file is not used by EL Linux installations.

Install ROCm runtimes 

The Quick_Start guide shows how to install the rocm-hip-libraries meta-package on EL8:

dnf install rocm-hip-libraries

Install the ROC_tracer library:

dnf install roctracer roctracer-devel
