========================
AMD ROCm
========================

.. Contents::

This page documents how to deploy an Enterprise Linux (EL) server with AMD_ Instinct_ GPUs
using the ROCm_ HPC and AI software stack.
See also the `Wikipedia article on ROCm <https://en.wikipedia.org/wiki/ROCm>`_.

.. _AMD: https://www.amd.com
.. _Instinct: https://www.amd.com/en/graphics/instinct-server-accelerators
.. _ROCm: https://www.amd.com/en/graphics/servers-solutions-rocm
.. _ROCm_for_HPC: https://www.amd.com/en/graphics/servers-solutions-rocm-hpc

ROCm documentation
======================

* AMD_ ROCm_Documentation_.

* `HowTo Material <https://rocmdocs.amd.com/en/latest/how_to/all.html>`_.
* GPU-Enabled_MPI_.

.. _ROCm_Documentation: https://rocmdocs.amd.com/en/latest/
.. _GPU-Enabled_MPI: https://rocmdocs.amd.com/en/latest/how_to/gpu_aware_mpi.html

Compilers and libraries
========================

* HIP_ is a C++ runtime API and kernel language that allows developers to create portable applications for AMD and NVIDIA GPUs from a single source code base.

* Numba_ is a just-in-time compiler for Python that works best on code that uses NumPy arrays and functions, and loops.

* clMathLibraries_ is a collection of OpenCL math libraries.
* clBLAS_ is a software library containing BLAS functions written in OpenCL.

.. _HIP: https://github.com/ROCm-Developer-Tools/HIP
.. _Numba: https://numba.readthedocs.io/en/stable/user/5minguide.html
.. _clMathLibraries: https://github.com/clMathLibraries/
.. _clBLAS: https://github.com/clMathLibraries/clBLAS

Software installation
=========================

The `Deploy ROCm on Linux <https://rocm.docs.amd.com/en/latest/deploy/linux/>`_
guide documents installation on RHEL.
See also the Quick_Start_ guide.
You must first enable the EPEL_ repository.
Install kernel packages::

  dnf install kernel-headers kernel-devel
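
To build the ``amdgpu`` module for the running kernel, DKMS needs ``kernel-devel`` files of exactly that version (``uname -r``); a plain ``dnf install kernel-devel`` may instead fetch a newer version than the one booted.
The following is a minimal sketch, not part of ROCm (the function name ``kernel_devel_ready`` is made up for this example), that checks whether ``/lib/modules/<release>/build`` resolves to a directory, which on EL is the case once the matching ``kernel-devel`` is installed.
If the check fails, reboot into the newest kernel or install the matching package version::

  # Hypothetical helper: succeed if kernel build files exist for RELEASE
  # (default: the running kernel as reported by uname -r).
  kernel_devel_ready() {
      local release="${1:-$(uname -r)}"
      # -d follows the symlink, so a dangling "build" link counts as missing
      [ -d "/lib/modules/$release/build" ]
  }

  if kernel_devel_ready; then
      echo "kernel-devel found for running kernel $(uname -r)"
  else
      echo "no kernel-devel for running kernel $(uname -r)" >&2
  fi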

ROCm_ users must be members of the UNIX groups ``render`` and ``video``.
Add the current user to both groups with::

  sudo usermod -a -G render,video $LOGNAME

The new group membership takes effect at the user's next login.
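
Membership can be checked right away: ``id -nG USER`` reads the group database, so it already reflects the ``usermod`` change even before the user logs in again.
A minimal POSIX shell sketch (``missing_rocm_groups`` is a hypothetical helper name) that reports which of the two groups are still missing::

  # Hypothetical helper: print each ROCm device group (render, video)
  # that does not appear among the group names passed as arguments.
  missing_rocm_groups() {
      local g
      for g in render video; do
          # Pad with spaces so that "video" cannot match a group like "videos"
          case " $* " in
              *" $g "*) ;;
              *) printf '%s\n' "$g" ;;
          esac
      done
  }

  user="${LOGNAME:-$(id -un)}"
  # $(id -nG ...) is left unquoted on purpose: one argument per group name
  missing=$(missing_rocm_groups $(id -nG "$user"))
  if [ -n "$missing" ]; then
      echo "$user is not in:" $missing
  else
      echo "$user is in render and video"
  fi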

The section `Setting Permissions for Groups <https://rocm.docs.amd.com/en/latest/deploy/linux/prerequisites.html#setting-permissions-for-groups>`_
states that a file ``/etc/adduser.conf`` should be created so that new users are added to these groups automatically.
However, that file is read by the Debian/Ubuntu ``adduser`` tool and is **not** used by EL Linux installations.

Install the ROCm_ RPMs as documented in
`Installation via Package manager <https://rocm.docs.amd.com/en/latest/deploy/linux/os-native/index.html>`_.
This enables the Yum repositories ``amdgpu.repo`` and ``rocm.repo`` for the installation.

Now install the ``amdgpu`` kernel driver, which is built by DKMS::

  dnf install amdgpu-dkms

and reboot the server.
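
After the reboot, the driver can be sanity-checked before any ROCm tools are used.
The sketch below is not an AMD tool (``rocm_driver_check`` is a hypothetical name); it assumes that a working driver shows up as the ``amdgpu`` entry under ``/sys/module``, the ``/dev/kfd`` compute device, and at least one DRM render node under ``/dev/dri``, which are the device files ROCm applications open::

  # Hypothetical post-reboot check: prints one line per item and returns
  # the number of missing items (0 means everything was found).
  rocm_driver_check() {
      local failures=0 path node render_node=
      # Take the first DRM render node (e.g. /dev/dri/renderD128); if the glob
      # matches nothing, the unexpanded pattern remains and fails the -e test.
      for node in /dev/dri/renderD*; do
          render_node=$node
          break
      done
      for path in /sys/module/amdgpu /dev/kfd "$render_node"; do
          if [ -e "$path" ]; then
              echo "ok:      $path"
          else
              echo "MISSING: $path"
              failures=$((failures + 1))
          fi
      done
      return "$failures"
  }

  rocm_driver_check || echo "$? item(s) missing"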

It may be a good idea to make a
`Multi-version <https://rocm.docs.amd.com/en/latest/deploy/linux/install_overview.html#installation-types>`_
installation, with several versions of the ROCm stack side by side on one system, for example::

  dnf install rocm-hip-sdk5.7.2 rocm-hip-sdk5.6.1
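
Each version is installed in its own ``/opt/rocm-<version>`` directory.
The following sketch (``list_rocm_versions`` is a hypothetical name) lists the installed versions, assuming that naming scheme; it relies on GNU ``sort -V`` so that, for example, 5.10.0 sorts after 5.7.2::

  # Hypothetical helper: print the ROCm versions found as PREFIX/rocm-X.Y.Z
  # directories (PREFIX defaults to /opt), oldest first.
  list_rocm_versions() {
      local prefix="${1:-/opt}" dir name
      # The trailing slash makes the glob match directories only
      for dir in "$prefix"/rocm-*/; do
          [ -d "$dir" ] || continue      # glob matched nothing
          name=${dir%/}                  # drop the trailing slash
          name=${name##*/}               # keep only "rocm-X.Y.Z"
          printf '%s\n' "${name#rocm-}"
      done | sort -V
  }

  list_rocm_versions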

See also the section *Post-install Actions and Verification Process* of the Linux_installation_ guide.

The software is installed under the ``/opt`` directory, for example in ``/opt/rocm-5.7.2`` for ROCm 5.7.2.
Useful commands include:

* the AMD ROCm System Management Interface (SMI) command::

    /opt/rocm-5.7.2/bin/rocm-smi

The directory name depends on the installed version of ROCm_.
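
Scripts can avoid hard-coding the version by picking ``rocm-smi`` from the newest installed version.
A minimal sketch, assuming the ``/opt/rocm-<version>/bin`` layout shown above and GNU ``sort -V`` (``find_rocm_smi`` is a hypothetical name)::

  # Hypothetical helper: print the rocm-smi path from the newest
  # PREFIX/rocm-X.Y.Z/bin directory (PREFIX defaults to /opt).
  find_rocm_smi() {
      local prefix="${1:-/opt}" smi
      for smi in "$prefix"/rocm-*/bin/rocm-smi; do
          [ -x "$smi" ] && printf '%s\n' "$smi"
      done | sort -V | tail -n 1      # version sort: 5.10.0 beats 5.7.2
  }

  smi=$(find_rocm_smi)
  if [ -n "$smi" ]; then
      "$smi"                          # run the selected rocm-smi
  else
      echo "rocm-smi not found under /opt" >&2
  fi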

.. _Linux_installation: https://rocm.docs.amd.com/en/latest/deploy/linux/os-native/install.html
.. _Quick_Start: https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html
.. _EPEL: https://docs.fedoraproject.org/en-US/epel/

Install ROCm runtimes
---------------------------

The Quick_Start_ guide shows how to install the ``rocm-hip-libraries`` meta-package on EL8::

  dnf install rocm-hip-libraries 

Install the ROC_tracer_ library::

  dnf install roctracer roctracer-devel
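
Whether the packages from this section are present can be confirmed with ``rpm -q``.
A minimal sketch (``check_rocm_packages`` is a hypothetical name) that prints one line per package and returns the number of missing ones::

  # Hypothetical helper: report each package from this section as installed
  # or missing via rpm -q; returns the number of missing packages.
  check_rocm_packages() {
      local pkg missing=0
      for pkg in rocm-hip-libraries roctracer roctracer-devel; do
          if rpm -q "$pkg" > /dev/null 2>&1; then
              echo "installed: $pkg"
          else
              echo "missing:   $pkg"
              missing=$((missing + 1))
          fi
      done
      return "$missing"
  }

  check_rocm_packages || echo "$? package(s) missing"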

.. _ROC_tracer: https://rocm.docs.amd.com/projects/roctracer/en/latest/