AMD ROCm
This page documents how to deploy an EL Linux server with AMD Instinct GPUs using the ROCm HPC and AI software stack. See also this Wikipedia article.
ROCm documentation
Compilers and libraries
HIP is a C++ Runtime API and Kernel Language that allows developers to create portable applications for AMD and NVIDIA GPUs from single source code.
Numba is a just-in-time compiler for Python that works best on code that uses NumPy arrays and functions, and loops.
clBLAS is a software library containing BLAS functions written in OpenCL.
Software installation
The ROCm_installation_for_Linux guide documents installation on RHEL. See also the Quick_Start guide. You must first enable the EPEL repository. Install kernel packages:
dnf install kernel-headers kernel-devel
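DKMS builds kernel modules against the kernel-devel tree, so the installed kernel-devel package generally needs to match the running kernel. The following is a minimal sketch of such a check, assuming the rpm command is available. The helper name matching_devel_pkg is hypothetical and not part of ROCm.

```shell
# Build the kernel-devel package name for a given kernel release string.
# matching_devel_pkg is a hypothetical helper used only for this sketch.
matching_devel_pkg() {
    echo "kernel-devel-$1"
}

# Compare against the running kernel reported by uname -r.
pkg=$(matching_devel_pkg "$(uname -r)")
if rpm -q "$pkg" >/dev/null 2>&1; then
    echo "$pkg is installed"
else
    echo "$pkg is missing; try: dnf install $pkg"
fi
```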
Two new UNIX groups, video and render, should be created, and ROCm users must be added to them:
sudo usermod -a -G render,video $LOGNAME
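Group changes made with usermod only take effect in new login sessions, so it can be useful to verify membership afterwards. The sketch below assumes a POSIX shell. The function name missing_groups is hypothetical and introduced only for illustration.

```shell
# Print the required groups that are absent from a space-separated group list.
# Usage: missing_groups "<groups the user has>" <required group>...
missing_groups() {
    have=" $1 "
    shift
    missing=""
    for g in "$@"; do
        case "$have" in
            *" $g "*) ;;                                # already a member
            *) missing="${missing:+$missing }$g" ;;     # record the missing group
        esac
    done
    echo "$missing"
}

# Check the current user against the two groups named above.
result=$(missing_groups "$(id -nG)" render video)
if [ -n "$result" ]; then
    echo "Not a member of: $result (log out and in again after usermod)"
else
    echo "Member of render and video"
fi
```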
The section Setting Permissions for Groups states that a file /etc/adduser.conf should be created. However, such a file is not used by EL Linux installations.
Install ROCm RPMs as documented in Installation via Package manager. The Yum repositories amdgpu.repo and rocm.repo will be enabled for installation.
Now install:
dnf install amdgpu-dkms
and reboot the server.
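After the reboot, one way to confirm that the driver came up is to look for the amdgpu module in the lsmod output. This is a minimal sketch, assuming lsmod and awk are available. The helper name module_listed is hypothetical.

```shell
# Succeed if the given module name appears in the first column of
# lsmod-style output read from standard input.
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

if lsmod | module_listed amdgpu; then
    echo "amdgpu kernel module is loaded"
else
    echo "amdgpu kernel module is not loaded; check the output of 'dkms status'"
fi
```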
It may be a good idea to install multiple versions of the ROCm stack on a system, for example:
dnf install rocm-hip-sdk5.7.2 rocm-hip-sdk5.6.1
Read also the Linux_installation section Post-install Actions and Verification Process.
The software is installed under the /opt directory.
Some useful commands are:
AMD ROCm System Management Interface (SMI) command:
/opt/rocm-5.7.2/bin/rocm-smi
The directory name depends on the installed versions of ROCm.
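Because the directory name varies with the installed versions, a small loop can locate every rocm-smi binary rather than hard-coding one path. The sketch below assumes the /opt/rocm-VERSION/bin layout shown above. The function name find_rocm_smi and its optional base-directory argument are hypothetical.

```shell
# Print the path of each executable rocm-smi found under versioned ROCm
# directories. The base directory defaults to /opt.
find_rocm_smi() {
    base="${1:-/opt}"
    for smi in "$base"/rocm-*/bin/rocm-smi; do
        # An unmatched glob stays literal and fails the -x test.
        [ -x "$smi" ] && echo "$smi"
    done
    return 0
}

find_rocm_smi
```

Each printed path can be run directly to query the GPUs with that ROCm version.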
Install ROCm runtimes
The Quick_Start guide shows how to install the rocm-hip-libraries meta-package on EL8:
dnf install rocm-hip-libraries
Install the ROC_tracer library:
dnf install roctracer roctracer-devel
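To confirm that the runtime packages named above actually landed, rpm can be queried for each one. This is a rough sketch, assuming rpm is the package database in use. The helper name not_installed is hypothetical.

```shell
# Print every package from the argument list that rpm does not report
# as installed. Prints nothing when all packages are present.
not_installed() {
    for pkg in "$@"; do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
    return 0
}

not_installed rocm-hip-libraries roctracer roctracer-devel
```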