Omni-Path network fabric by Cornelis Networks

The Niflheim cluster has a 100 Gbit/s Intel OmniPath high speed network fabric.

This page assumes a RHEL/CentOS 7 or 8 Linux system.

OmniPath software and documentation

In early 2021 the Intel Omni-Path product was taken over by the spin-out company CornelisNetworks.

Download of software and documentation is available from CornelisNetworks:

There is an Omni-Path User Group (OPUG) for public discussions about OmniPath.

OmniPath switches

Please see the Omni-Path switches page.

There is (apparently) a requirement that the OPA node software version and the switch firmware version must not differ by more than 1 version. For example, software version 10.9 is compatible with switch firmware 10.8, but older firmware releases (10.7, etc.) are not supported or tested.
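
The rule of thumb above can be written as a small check. This is only a sketch: the function name opa_minor_compatible is hypothetical and not part of the OPA software. It assumes version strings of the form MAJOR.MINOR[.x...] without leading zeros, and it encodes nothing more than the "at most one minor version apart" rule as stated here, which is not an official compatibility matrix.

```shell
#!/bin/bash
# opa_minor_compatible A B
# Hypothetical helper: succeeds when versions A and B have the same major
# number and their minor numbers differ by at most 1 (e.g. 10.9 and 10.8).
# Assumes plain decimal fields without leading zeros.
opa_minor_compatible() {
    local a_major="${1%%.*}" b_major="${2%%.*}"
    local a_rest="${1#*.}" b_rest="${2#*.}"
    local a_minor="${a_rest%%.*}" b_minor="${b_rest%%.*}"
    local diff
    [ "$a_major" = "$b_major" ] || return 1
    diff=$(( a_minor - b_minor ))
    # ${diff#-} drops a leading minus sign, i.e. takes the absolute value
    [ "${diff#-}" -le 1 ]
}
```

For example, opa_minor_compatible 10.9 10.8 succeeds, whereas opa_minor_compatible 10.9 10.7 fails.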

Hardware installation

Read the document Intel Omni-Path Host Fabric Interface Installation Guide, especially the section Hardware Installation.

The PCIe speed in the BIOS setup must be set to Auto, which should be the default (may vary with BIOS). For a PCIe Gen3 x16 adapter the PCIe bus speed should be 8 GT/s, whereas Gen2 speed would be only 5 GT/s. Older versions of this manual incorrectly required a Gen2 speed setting.

Please verify your adapter’s speed and PCIe width. This can be done from the Linux OS by ClusterShell:

clush -w <node-list> 'lspci -vvv -s 04:00.0 | grep LnkSta:' | dshbak -c

The output may look like:

LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

Here the PCIe device address is 04:00.0, which you can determine by:

lspci | grep Omni-Path
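
To flag nodes whose adapter has trained at a lower speed or width, the LnkSta line can be checked by a script. The following is a minimal sketch, not part of any OPA tool: the function name check_lnksta is hypothetical, and the defaults of 8GT/s and x16 are taken from the Gen3 x16 example above.

```shell
#!/bin/bash
# check_lnksta [SPEED] [WIDTH]
# Hypothetical helper: reads "lspci -vvv" output on stdin and reports whether
# each LnkSta line shows the expected speed and width (default 8GT/s, x16).
# Returns 0 if all links are as expected, 1 if any is degraded,
# and 2 if no LnkSta line was found.
check_lnksta() {
    local want_speed="${1:-8GT/s}" want_width="${2:-x16}"
    local line speed width rc=0 seen=0
    while IFS= read -r line; do
        case "$line" in
            *LnkSta:*) ;;
            *) continue ;;
        esac
        seen=1
        # Extract the token following "Speed " and "Width "
        speed=$(printf '%s\n' "$line" | sed -n 's/.*Speed \([^ ,]*\).*/\1/p')
        width=$(printf '%s\n' "$line" | sed -n 's/.*Width \([^ ,]*\).*/\1/p')
        if [ "$speed" = "$want_speed" ] && [ "$width" = "$want_width" ]; then
            echo "OK: $speed $width"
        else
            echo "DEGRADED: $speed $width (expected $want_speed $want_width)"
            rc=1
        fi
    done
    if [ "$seen" -eq 0 ]; then
        echo "no LnkSta line found"
        return 2
    fi
    return "$rc"
}
```

On a node it could be used as: lspci -vvv -s 04:00.0 | check_lnksta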

Software installation

OmniPath HFI adapter hardware is supported with OPA software 10.11 on compute nodes with the following Intel and AMD processors:

Please note that older processors are not supported. However, the OmniPath adapter may well work on older Xeon servers, even though it’s not officially supported by Intel. For example, we have tested OmniPath on an old Sandy Bridge server successfully.

Also note that OmniPath software versions must be identical on all compute nodes, or at most differ by 1 minor version (such as 10.6 and 10.5). We have not found documentation for this requirement.

The following software installation packages are available for an Intel® Omni-Path Fabric:

  • Intel Omni-Path Fabric Host Software: This is the basic installation package that installs the Intel® Omni-Path Fabric Host Software components needed to set up compute, I/O, and Service nodes with drivers, stacks, and basic tools for local configuration and monitoring.

  • Intel Omni-Path Fabric Suite (IFS) Software: This installation package provides special features and includes the Intel® Omni-Path Fabric Host Software package, along with the Intel® Omni-Path Fabric Suite FastFabric Toolset (FastFabric) and the Intel® Omni-Path Fabric Suite Fabric Manager (Fabric Manager).

  • Intel Omni-Path Fabric Suite Fabric Manager GUI (Fabric Manager GUI): This installation package provides a set of features for viewing and monitoring the fabric or multiple fabrics, and is installed on a computer outside of the fabric.

Operating Systems supported are listed in the Release Notes. Intel Omni-Path Architecture (OPA) host software is fully supported in Red Hat Enterprise Linux 7.6, see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/7.6_release_notes/#BZ1627126

Download the latest version of the Intel® Omni-Path Fabric Software (Including Intel® Omni-Path HFI Driver) from the OmniPath_software page:

  • IntelOPA-Basic.RHEL7*-x86_64.10.X.*.tgz for compute nodes.

  • IntelOPA-IFS.RHEL7*-x86_64.10.X.*.tgz for the management node.

Read the Intel® Omni-Path Fabric Software Installation Guide from the Publications page.

Cornelis Omni-Path software 10.11

The current Omni-Path software release from CornelisNetworks as of February 2022 is release 10.11.1.3.1 with support for RHEL 8.5 and 7.8.

NOTICE:

  • RHEL 8.5 (and clones thereof such as AlmaLinux and RockyLinux) fortunately includes all required kernel modules as well as some (almost) up-to-date OPA software!

  • Therefore there is no longer any requirement to install the kmod-ifs-kernel-updates kernel modules, which was necessary with RHEL/CentOS 7 and RHEL/CentOS 8 up to 8.3.

The RHEL 8.4 Release Notes introduced Omni-Path support:

  • Cornelis Omni-Path Architecture (OPA) Host Software

    Omni-Path Architecture (OPA) host software is fully supported in Red Hat Enterprise Linux 8.4. OPA provides Host Fabric Interface (HFI) hardware with initialization and setup for high performance data transfers (high bandwidth, high message rate, low latency) between compute and I/O nodes in a clustered environment.

    For instructions on installing Omni-Path Architecture, see: Cornelis Omni-Path Fabric Software Release Notes file. (BZ#1960412)

Cornelis OPA DNF repo

Download the latest (RHEL 8.5) Cornelis Omni-Path Fabric OPXS (Formerly IFS) tar-ball file from the Release Library page, for example CornelisOPX-OPXS.RHEL85-x86_64.10.11.1.3.1.tgz.

Create a DNF repository in some selected directory by unpacking the tar-ball, for example:

cd /home/opt
tar xzf CornelisOPX-OPXS.RHEL85-x86_64.10.11.1.3.1.tgz

Create a repo file /etc/yum.repos.d/CornelisOPX.repo pointing to the above directory:

[CornelisOPX-Common]
name=CornelisOPX Common pkgs
gpgcheck=0
baseurl=file:///home/opt/CornelisOPX-OPXS.RHEL85-x86_64.10.11.1.3.1/repos/OPA_PKGS
enabled=1

The baseurl may alternatively point to an https/http server in your network.

On all nodes using this repo, make sure to clean the DNF cache:

dnf clean all
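
When the tar-ball version changes, the repo file has to be rewritten with the new directory name. A minimal sketch of generating it is shown here; the function name write_opx_repo is hypothetical, and it assumes the repos/OPA_PKGS layout of the tar-ball shown above.

```shell
#!/bin/bash
# write_opx_repo DIR
# Hypothetical helper: prints a repo stanza whose baseurl points at the
# repos/OPA_PKGS subdirectory of the unpacked tar-ball directory DIR.
# DIR must be an absolute path so that the URL becomes file:///...
write_opx_repo() {
    local dir="$1"
    case "$dir" in
        /*) ;;
        *) echo "write_opx_repo: DIR must be an absolute path" >&2
           return 2 ;;
    esac
    cat <<EOF
[CornelisOPX-Common]
name=CornelisOPX Common pkgs
gpgcheck=0
baseurl=file://${dir}/repos/OPA_PKGS
enabled=1
EOF
}
```

The output can be redirected to the repo file, for example: write_opx_repo /home/opt/CornelisOPX-OPXS.RHEL85-x86_64.10.11.1.3.1 > /etc/yum.repos.d/CornelisOPX.repo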

OPA software installation on EL8.5

For RHEL/CentOS 8 (and EL8 clones) see the Release Notes Table 8 with a list of RHEL 8.5 Distribution RPMs.

Install the following sets of packages:

  • IB/OPA Centric:

    dnf install ibacm libibumad libibverbs libnl3 librdmacm opensm-libs perftest qperf rdma-core rdma-core-devel
    
  • System Centric:

    dnf install atlas bc coreutils expat expect gcc-gfortran glibc json-c libgfortran libgomp libquadmath libstdc++ libstdc++-devel ncurses-compat-libs
    
  • Other/Generally Installed:

    dnf install bash irqbalance kmod libgcc perl perl-Getopt-Long perl-PathTools perl-Socket pkgconfig systemd systemd-libs kernel-modules-extra
    
  • Build Requirements:

    dnf install autoconf automake bison elfutils-libelf-devel expat-devel flex gcc-c++.x86_64 kernel-abi-whitelists kernel-rpm-macros libnl3-devel libpfm libtool libuuid-devel ncurses-devel numactl-devel opensm-libs openssl-devel tcl-devel
    

Also install these EL8 RPMs:

dnf install libibverbs-utils

Finally install the CornelisOPX packages from the repo configured above:

dnf install hfi1-diagtools-sw hfi1-firmware hfi1-firmware_debug ifs-kernel-updates-devel libfabric libfabric-devel libfabric-psm2 libfabric-verbs libpsm2 libpsm2-compat libpsm2-devel opa-address-resolution opa-basic-tools opaconfig opa-libopamgt opa-libopamgt-devel opa-scripts opa-fastfabric opa-fm

NOTE: The hfidiags package contains a script requiring /usr/bin/python2. On EL8 it is recommended to not install hfidiags and thereby avoid pulling in the obsolete and insecure python2 packages.

There is no need to install the CornelisOPX kernel module because EL8 from EL8.4 and onwards contains OPA support in the kernel. However, if you wish to install the very latest kernel module, build the package from the source RPM (version numbers may differ):

rpm -ivh CornelisOPX-OFA_DELTA.RHEL*/SRPMS/ifs-kernel-updates*.src.rpm
rpmbuild -bb $HOME/rpmbuild/SPECS/ifs-kernel-updates.spec
dnf install $HOME/rpmbuild/RPMS/x86_64/kmod-ifs-kernel-updates-XXXX.x86_64.rpm

Omni-Path software 10.9 and later

The Intel® Omni-Path Fabric Software from Release 10.9 (and later, such as 10.10) has been redesigned for better software package installation, and the instructions are consequently updated.

Read the following Release 10.9 documents:

  • Release Notes

  • Omni-Path Fabric Software Installation Guide

For RHEL/CentOS 7 and 8 some base OS prerequisite packages must be installed on login and compute nodes, see the Intel® Omni-Path Software Release Notes, section OS RPMs Installation Prerequisites. The package list differs slightly with OS version.

For RHEL/CentOS 7 (EL7) see Table 9 RHEL 7.x Distribution RPMs. For RHEL/CentOS 8 (and EL8 clones) see Table 10 RHEL 8.x Distribution RPMs.

Install the following sets of packages:

  • IB/OPA Centric:

    EL7: yum install ibacm infinipath-psm libibumad libibverbs libnl3 librdmacm opensm-libs perftest qperf rdma-core rdma-core-devel
    EL8: dnf install ibacm libibumad libibverbs libnl3 librdmacm opensm-libs perftest qperf rdma-core rdma-core-devel
    

    For OpenMPI you may also require this package on EL7 systems:

    EL7: yum install infinipath-psm-devel
    
  • System Centric:

    EL7: yum install atlas bc coreutils expat expect gcc-gfortran glibc json-c libgfortran libgomp libquadmath libstdc++ libstdc++-devel
    EL8: dnf install atlas bc coreutils expat expect gcc-gfortran glibc json-c libgfortran libgomp libquadmath libstdc++ libstdc++-devel ncurses-compat-libs
    
  • Other/Generally Installed:

    EL7: yum install bash irqbalance kmod libgcc perl perl-Getopt-Long perl-PathTools perl-Socket pkgconfig python systemd systemd-libs
    EL8: dnf install bash irqbalance kmod libgcc perl perl-Getopt-Long perl-PathTools perl-Socket pkgconfig systemd systemd-libs kernel-modules-extra
    

    NOTE: The hfidiags package contains a script requiring /usr/bin/python2. On EL8 it is recommended to not install hfidiags and thereby avoid pulling in the obsolete python2 packages.

  • Build Requirements:

    EL7: yum install expat-devel json-c-devel libnl3-devel libpfm libuuid-devel numactl-devel opensm-libs openssl-devel sqlite-devel valgrind-devel
    EL8: dnf install autoconf automake bison elfutils-libelf-devel expat-devel flex gcc-c++.x86_64 kernel-abi-whitelists kernel-rpm-macros libnl3-devel libpfm libtool libuuid-devel ncurses-devel numactl-devel opensm-libs openssl-devel tcl-devel
    

WARNING: The running kernel version must also be identical to the newest kernel and kernel-devel RPMs installed on the system!

It is therefore strongly recommended NOT to update the kernel and kernel-devel RPMs until after the OPA software installation. The kernel-devel RPM must be installed during OS installation and must correspond to the running kernel during installation.
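
Before starting the installation, the kernel requirement in the warning above can be checked by a script. This is a sketch under assumptions: the function name running_is_newest is hypothetical, and it relies on GNU sort -V to order the version strings.

```shell
#!/bin/bash
# running_is_newest RUNNING
# Hypothetical helper: reads installed package versions (one per line) on
# stdin and succeeds only if the newest one equals RUNNING (from uname -r).
running_is_newest() {
    local running="$1" newest
    # Version-sort the list and keep the highest entry
    newest=$(sort -V | tail -n 1)
    if [ -n "$newest" ] && [ "$newest" = "$running" ]; then
        echo "OK: running kernel $running is the newest installed version"
    else
        echo "MISMATCH: running $running, newest installed ${newest:-none}"
        return 1
    fi
}
```

It could be run once for each of the kernel and kernel-devel packages, for example: rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-devel | running_is_newest "$(uname -r)"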

Installation with OPA Yum repository

For Intel Omni-Path software installation see the Intel® Omni-Path Fabric Software Installation Guide (Rev. 11.0, December 2018), Chapter 4.0 Install the Intel® Omni-Path Software.

See in particular Section 4.4 Install Using Linux Distribution Software Packages Provided by Intel.

If you have HPE servers with OPA adapters, you can install OPA software using the HPE_Intel_OPA SDR repository. Instructions are provided in the section Installing Intel_OPA packages. The contents of the Yum repository can be browsed at https://downloads.linux.hpe.com/SDR/repo/intel_opa/

It is strongly recommended to use the Yum package manager for installation instead, which has been possible since release 10.9.

Create the Yum repo file /etc/yum.repos.d/IntelOPA.repo (NOTE: Replace RHEL77… by the current version):

[IntelOPA-Common]
name=IntelOPA Common pkgs
gpgcheck=0
baseurl=file:///home/opt/IntelOPA-Basic.RHEL77-x86_64.10.10.1.0.36/repos/OPA_PKGS
enabled=1

where the baseurl points to where you have unpacked the OPA files (a directory or an http web-site). Please note that from IntelOPA 10.9.3 the repo files are now in the repos/OPA_PKGS/ subdirectory.

IMPORTANT: Since CentOS 7 provides an older OPA software version 10.9, you must clean the Yum cache in order to pick up the IntelOPA packages:

yum clean all

The OPA package list can be learned from Appendix B Intel® OPA Software Components to Packages Mapping. The above mentioned components correspond to these RPM packages:

Component     RPM package
opa_stack     opa-scripts kmod-ifs-kernel-updates
oftools       opa-basic-tools opa-address-resolution
intel_hfi     hfi1-firmware hfi1-firmware_debug libpsm2 libpsm2-devel libpsm2-compat libfabric libfabric-devel libfabric-psm libfabric-psm2 libfabric-verbs hfi1-diagtools-sw hfidiags
delta_ipoib   (none)

The complete installation list is now for EL7:

yum install opa-scripts kmod-ifs-kernel-updates
yum install opa-basic-tools opa-address-resolution
yum install hfi1-firmware hfi1-firmware_debug libpsm2 libpsm2-devel libpsm2-compat libfabric libfabric-devel libfabric-psm libfabric-psm2 libfabric-verbs hfi1-diagtools-sw hfidiags

and for EL8:

dnf install opa-scripts opa-basic-tools opa-address-resolution
dnf install hfi1-firmware hfi1-firmware_debug libpsm2 libpsm2-devel libpsm2-compat libfabric libfabric-devel libfabric-psm2 libfabric-verbs hfi1-diagtools-sw

NOTE: The hfidiags package requires python2 which is unwanted on EL8 systems.

A reboot is required after the OPA kernel module installation. It is OK to update the kernel and kernel-devel RPMs only after the OPA kernel module installation.

As with releases prior to 10.9, it is still possible to unpack the software tar-ball and install (only) the required OPA basic software (add -v or -vv to increase logging):

./INSTALL -i opa_stack -i intel_hfi -i delta_ipoib -i oftools

However, the above Yum installation method is preferred.

OPA kernel module gets replaced by “yum update”

The OPA kernel driver module delivered by the IntelOPA-Basic.RHEL77-x86_64.10.10.*.gz tar-ball is installed into the running kernel, and the installed driver version can be identified by:

# modinfo hfi1
filename:       /lib/modules/3.10.0-1062.9.1.el7.x86_64/extra/ifs-kernel-updates/hfi1.ko
version:        10.10.1.0
description:    Intel Omni-Path Architecture driver
license:        Dual BSD/GPL
retpoline:      Y
rhelversion:    7.7
srcversion:     CEDD1A5E5B37C71A40B745F
(lines deleted)

The important line is version: 10.10.1.0 which indicates that it is the Intel-supplied driver.

IMPORTANT: Even though the correct kernel module version is installed, this does not necessarily mean that it has been activated. You must compare the srcversion of the loaded module in /sys/module/hfi1/srcversion to the modinfo output to ensure that the identical source version is being used. This command is useful:

# echo -n "srcversion:     "; cat /sys/module/hfi1/srcversion; modinfo hfi1 | grep srcversion:
srcversion:     CEDD1A5E5B37C71A40B745F
srcversion:     CEDD1A5E5B37C71A40B745F

If the source versions differ, you have to reboot the node and compare the srcversion signatures again. See also How to find the version of a compiled kernel module?.
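
For use across many nodes, the comparison can be wrapped in a function that returns a status code. This is a minimal sketch: the function name hfi1_srcversion_status is hypothetical, and the two arguments are expected to be the srcversion strings of the loaded module and of the module file on disk.

```shell
#!/bin/bash
# hfi1_srcversion_status LOADED ONDISK
# Hypothetical helper: compares the srcversion of the loaded hfi1 module
# with that of the module file on disk.
# Returns 0 on match, 1 on mismatch, 2 if either value is empty.
hfi1_srcversion_status() {
    local loaded="$1" ondisk="$2"
    if [ -z "$loaded" ] || [ -z "$ondisk" ]; then
        echo "UNKNOWN: missing srcversion"
        return 2
    fi
    if [ "$loaded" = "$ondisk" ]; then
        echo "ACTIVE: loaded module matches installed module ($loaded)"
    else
        echo "REBOOT NEEDED: loaded $loaded, installed $ondisk"
        return 1
    fi
}
```

On a node it could be called as: hfi1_srcversion_status "$(cat /sys/module/hfi1/srcversion)" "$(modinfo -F srcversion hfi1)"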

If you subsequently update the Linux kernel with:

yum update

the stock CentOS 7 (or RHEL 7) kernel OPA driver module replaces the Intel module!

This can be seen by:

# modinfo hfi1
filename:       /lib/modules/3.10.0-1062.12.1.el7.x86_64/kernel/drivers/infiniband/hw/hfi1/hfi1.ko.xz
description:    Intel Omni-Path Architecture driver
license:        Dual BSD/GPL
firmware:       hfi1_pcie.fw
firmware:       hfi1_sbus.fw
firmware:       hfi1_fabric.fw
firmware:       hfi1_dc8051.fw
retpoline:      Y
rhelversion:    7.7
srcversion:     69D92ED8B2436C4B5EF0318
alias:          pci:v00008086d000024F1sv*sd*bc*sc*i*
alias:          pci:v00008086d000024F0sv*sd*bc*sc*i*
depends:        rdmavt,ib_core,i2c-algo-bit
intree:         Y
vermagic:       3.10.0-1062.12.1.el7.x86_64 SMP mod_unload modversions
signer:         CentOS Linux kernel signing key
sig_key:        A0:A1:B3:38:BF:C1:AB:19:5F:0B:AF:17:1B:44:90:D1:AF:85:5C:A7
sig_hashalgo:   sha256
(lines deleted)

where the version information is now absent, and the module signature is by CentOS. You should compare the srcversion signatures again as shown above.
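
The two modinfo outputs above differ in the filename path and in the presence of a version line, and this can be used to detect which driver is installed. The following is a sketch only: the function name classify_hfi1_module is hypothetical, and the path patterns are taken from the two examples shown on this page, so other releases may use different locations.

```shell
#!/bin/bash
# classify_hfi1_module
# Hypothetical helper: reads "modinfo hfi1" output on stdin and prints
#   "vendor <version>"  if the module lives under extra/ifs-kernel-updates/
#   "in-tree"           if it lives under kernel/drivers/infiniband/
#   "unknown"           otherwise (exit status 1)
classify_hfi1_module() {
    awk '
        $1 == "filename:" { filename = $2 }
        $1 == "version:"  { version = $2 }
        END {
            if (filename ~ /\/extra\/ifs-kernel-updates\//)
                print "vendor " (version == "" ? "unknown" : version)
            else if (filename ~ /\/kernel\/drivers\/infiniband\//)
                print "in-tree"
            else {
                print "unknown"
                exit 1
            }
        }
    '
}
```

Typical use would be: modinfo hfi1 | classify_hfi1_module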

Reinstall OPA kernel module after every kernel update

As explained above, when a kernel update is installed with yum update, the CentOS OPA driver replaces the Intel OPA kernel module driver. Intel Support has acknowledged (February 2020) that this is the expected behavior, and no fix will be made.

The proper solution to updating kernel module drivers would be to use DKMS (Dynamic Kernel Module Support), which the OPA software does not offer.

Therefore, every time your kernel is updated, it is recommended to unpack the OPA software tar-ball to the local root filesystem, for example:

cd /root
tar xzf <...>/IntelOPA-IFS.RHEL78-x86_64.10.11.0.0.577.tgz
cd /root/IntelOPA-IFS.RHEL78-x86_64.10.11.0.0.577

Then build and reinstall (only) the required OPA stack basic software (add -v or -vv to increase logging):

./INSTALL -i opa_stack

or alternatively just build the package from the source RPM (version numbers may differ):

rpm -ivh CornelisOPX-OFA_DELTA.RHEL*/SRPMS/ifs-kernel-updates*.src.rpm
rpmbuild -bb $HOME/rpmbuild/SPECS/ifs-kernel-updates.spec
yum install $HOME/rpmbuild/RPMS/x86_64/kmod-ifs-kernel-updates-XXXX.x86_64.rpm

Verify the OPA driver version (as above):

modinfo hfi1

A better procedure may be to run the ./INSTALL -i opa_stack on just a single OPA compute node which is running the latest Linux kernel. See the log-file /var/log/opa.log for the results.

The INSTALL script actually creates a useful kernel RPM package, for example:

/root/IntelOPA-Basic.RHEL77-x86_64.10.10.1.0.36/IntelOPA-OFA_DELTA.RHEL77-x86_64.10.10.1.0.37/RPMS/redhat-ES77/kmod-ifs-kernel-updates-3.10.0_1062.18.1.el7.x86_64-1919.x86_64.rpm

Copy the kmod-ifs-kernel-updates-3.10.0_1062.18.1.el7.x86_64-1919.x86_64.rpm file to your management server and distribute it to all OPA compute nodes running the same kernel.

On each compute node you just need to install this RPM:

yum install kmod-ifs-kernel-updates-3.10.0_1062.18.1.el7.x86_64-1919.x86_64.rpm

This actually also updates the boot initramfs file:

/boot/initramfs-3.10.0-1062.18.1.el7.x86_64.img

Verify as above the OPA kernel driver module:

modinfo hfi1

Using DKMS to install OPA kernel module automatically

WARNING: Experimental procedure - use at your own risk!

This is an experiment trying to use DKMS to build and install the OPA kernel module automatically:

  1. Unpack the IntelOPA drivers tar-ball to /tmp, for example:

    cd /tmp; tar xzf IntelOPA-IFS.RHEL77-x86_64.10.10.1.0.36.tgz
    
  2. Install the ifs-kernel-updates source RPM (destination is $HOME/rpmbuild):

    rpm -ivh CornelisOPX-OFA_DELTA.RHEL*/SRPMS/ifs-kernel-updates*.src.rpm
    

    Unpack the source tar-ball to /usr/src/:

    cd /usr/src
    tar xzf $HOME/rpmbuild/SOURCES/ifs-kernel-updates-3.10.0_1062.el7.x86_64.tgz
    
  3. Install the DKMS package from the EPEL repository:

    yum install epel-release
    yum install dkms
    

    Read the man dkms manual page about usage of the command.

  4. Go to the source directory:

    cd /usr/src/ifs-kernel-updates-3.10.0_1062.el7.x86_64
    
  5. Create the file dkms.conf in this directory with the contents for the 4 different drivers:

    PACKAGE_NAME="ifs-kernel-updates"
    PACKAGE_VERSION="3.10.0_1062.el7.x86_64"
    BUILT_MODULE_NAME[0]="hfi1"
    DEST_MODULE_LOCATION[0]="/kernel/drivers/infiniband/hw/hfi1"
    BUILT_MODULE_NAME[1]="ib_ipoib"
    DEST_MODULE_LOCATION[1]="/kernel/drivers/infiniband/ulp/ipoib"
    BUILT_MODULE_NAME[2]="ib_qib"
    DEST_MODULE_LOCATION[2]="/kernel/drivers/infiniband/hw/qib"
    BUILT_MODULE_NAME[3]="rdmavt"
    DEST_MODULE_LOCATION[3]="/kernel/drivers/infiniband/sw/rdmavt"
    AUTOINSTALL="yes"
    
  6. Add the kernel module given by PACKAGE_NAME and PACKAGE_VERSION to DKMS and verify it:

    dkms add ifs-kernel-updates/3.10.0_1062.el7.x86_64
    dkms status
    
  7. Build the modules under DKMS control:

    dkms build ifs-kernel-updates/3.10.0_1062.el7.x86_64
    

    Append --verbose to print details.

  8. Install the modules under DKMS control:

    dkms install ifs-kernel-updates/3.10.0_1062.el7.x86_64
    

Omni-Path firmware tools

The Intel® Omni-Path firmware_tools package under the downloads of Intel® Omni-Path Unified Extensible Firmware Interface (UEFI) Firmware contains an RPM package:

yum install hfi1-firmware-tools-10.9.2.0-3.x86_64.rpm

which contains a utility which can query versions of files stored in the HFI adapter EPROM:

hfi1_eprom -V

Usage of this command is documented in chapter 5 of the Intel® Omni-Path Fabric Software Installation Guide.

It may be necessary to update the UEFI firmware using this command:

hfi1_eprom -d all -u /usr/share/opa/bios_images/*

where the bios_images have been installed by the hfi1-uefi RPM package.

In addition the Intel® Omni-Path Thermal Management Module TMM_Firmware should be upgraded. See man opatmmtool.

Install the TMM_Firmware RPM package:

yum install hfi1-tmm-10.9.0.0-208.noarch.rpm

This RPM provides the firmware file:

/lib/firmware/updates/hfi1_smbus.fw

Check the TMM_Firmware version by:

opatmmtool fwversion
opatmmtool -f /lib/firmware/updates/hfi1_smbus.fw fileversion

Usage of this command is documented in chapter 15 of the Intel® Omni-Path Fabric Software Installation Guide.

Firmware upgrade operation:

opatmmtool -f /lib/firmware/updates/hfi1_smbus.fw update
opatmmtool reboot    # Non-disruptive
opatmmtool fwversion

Fabric Manager software 10.9

The Intel® Omni-Path Fabric Software Installation Guide section 4.4 Install Using Linux Distribution Software Packages Provided by Intel states:

  • An example of additional software that should be installed on Fabric Manager nodes is shown below:

    opa-fm opa-fm-debuginfo opa-fm-debuginfo opa-debuginfo opa-fastfabric opa-mpi-apps
    

See also the package list in Appendix B Intel® OPA Software Components to Packages Mapping.

IMPORTANT: Since CentOS 7.7 provides an older OPA software version 10.9, you must clean the Yum cache in order to pick up the IntelOPA packages:

yum clean all

It seems that this subset is sufficient for the OPA Fabric Manager:

yum install opa-fm opa-fastfabric

Omni-Path software 10.6, 10.7, 10.8

NOTICE: The software release 10.8 and older should now be considered as obsolete.

Install RHEL/CentOS 7 prerequisites

For RHEL/CentOS 7 the following base prerequisite packages must be installed on login and compute nodes. There are two distinct situations:

  1. The server contains only OmniPath adapters:

    yum install libibmad libibumad libibumad-devel libibverbs librdmacm libibcm libpfm.i686 ibacm qperf perftest rdma infinipath-psm infinipath-psm-devel libhfi1 expat elfutils-libelf-devel libstdc++-devel gcc-gfortran atlas tcl expect tcsh sysfsutils pciutils bc opensm-devel opensm-libs rpm-build redhat-rpm-config kernel-devel papi.i686
    
  2. The server contains both Mellanox Infiniband as well as OmniPath adapters. Go to the next section Mellanox OFED installation.

Mellanox OFED installation

In case the server contains both Mellanox Infiniband as well as OmniPath adapters, the required order of installation is:

  1. Install RHEL/CentOS 7 prerequisites:

    yum install expect tcl tk
    

    The mlnxofedinstall script will tell you if any prerequisites are missing.

  2. Install the Mellanox_OFED software before you install any OmniPath software. Read the Mellanox OFED for Linux User Manual and perform the software installation:

    mlnxofedinstall
    
  3. Install OmniPath software as described below.

Prevent yum update from overwriting Intel OPA packages

This section is relevant only for OPA software prior to 10.9.

As of Intel OPA software release 10.6 (late 2017), the Intel RPMs are still not installed with yum, but instead by a brain-dead operation:

rpm -i --force --nodeps <rpm list>

Since all CentOS/RHEL 7 packages are installed with yum, the yum database contains no record of the Intel OPA RPMs.

When you subsequently update the OS by:

yum update

the OPA version 10.3 RPMs from the CentOS/RHEL 7 updates will replace several of the previously installed Intel OPA RPMs (opa-*). This will of course wreak havoc on your OPA installation.

Until Intel has solved this problem, it is mandatory to exclude all OPA RPM updates from the distribution by appending these rules to /etc/yum.conf:

exclude=opa-* libpsm2* libfabric* hfi1*
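
In a Kickstart post-install script the rule should be appended only once. A minimal sketch of doing this idempotently follows; the function name add_opa_exclude is hypothetical, and it assumes that the configuration file contains only the [main] section, so that an appended line ends up in that section.

```shell
#!/bin/bash
# add_opa_exclude [CONF]
# Hypothetical helper: appends the OPA exclude rule to CONF (default
# /etc/yum.conf) unless an exclude= line is already present, in which case
# nothing is changed and the rules must be merged by hand.
add_opa_exclude() {
    local conf="${1:-/etc/yum.conf}"
    local rule='exclude=opa-* libpsm2* libfabric* hfi1*'
    if grep -q '^exclude=' "$conf"; then
        echo "add_opa_exclude: $conf already has an exclude= line" >&2
        return 1
    fi
    printf '%s\n' "$rule" >> "$conf"
}
```

Refusing to touch an existing exclude= line is a deliberate choice in this sketch, since a second exclude= line would not be merged with the first one automatically.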

Install OPA IFS software on manager node

Follow the Intel® Omni-Path Fabric Software Installation Guide chapter 4.0 Install the Intel® Omni-Path Software for installation details.

On IFS servers also the following are required:

yum install libibverbs-devel libibmad-devel librdmacm-devel ibacm-devel openssl-devel libuuid-devel expat-devel valgrind-devel

Unpack the IntelOPA-IFS.<DISTRO>-x86_64.10.<version>.tgz tar-ball and run the INSTALL script (add -v or -vv to increase logging), for example for RHEL/CentOS 7.3:

tar xf IntelOPA-IFS.RHEL73-x86_64.10.3.0.0.81.tgz
cd IntelOPA-IFS.RHEL73-x86_64.10.3.0.0.81
./INSTALL

Select all appropriate software components to be installed. The Fabric Manager node requires the FastFabric and OPA FM components besides the BASIC components, see the software installation guide chapter Upgrade from IntelOPA-Basic to IntelOPA-IFS.

If the manager node should run the OPA Fabric Manager service, make sure to enable this Intel OPA Autostart item:

OPA FM (opafm)

The opafm can also be started using Systemd services:

systemctl enable opafm
systemctl start opafm

Alternatively, just run the CLI version to install the basic software manually as shown below. Then install and enable opafm and fastfabric:

./INSTALL -i opafm -i fastfabric -E opafm

You must make sure this host’s Static hostname is set correctly (not just localhost.localdomain):

hostnamectl
hostnamectl set-hostname <hostname>.<domainname>

The node must be rebooted after the install to activate new kernel modules and set the correct hostname.

NOTE: It is important to permit the installation to update the file /etc/security/limits.conf with memory locking limits:

* hard memlock unlimited
* soft memlock unlimited

This file is read by PAM when users log in. However, system daemons started during the boot process do not use /etc/security/limits.conf, and the correct memory limits must be set inside the daemon startup scripts. This is especially important for batch job services.
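
Whether a running daemon actually has the unlimited memory locking limit can be read from its /proc limits file. This is a sketch under assumptions: the function name memlock_soft_limit is hypothetical, and it relies on the Linux /proc/<pid>/limits layout in which the soft limit is the fourth field of the "Max locked memory" line.

```shell
#!/bin/bash
# memlock_soft_limit [LIMITS_FILE]
# Hypothetical helper: prints the soft "Max locked memory" limit found in a
# /proc/<pid>/limits style file (default: the calling process).
# Exits non-zero if the line is not present.
memlock_soft_limit() {
    local limits_file="${1:-/proc/self/limits}"
    awk '/^Max locked memory/ { print $4; found = 1 }
         END { exit !found }' "$limits_file"
}
```

For a daemon with process ID PID, memlock_soft_limit /proc/PID/limits should print unlimited. For services started by Systemd, the limit is typically set with the LimitMEMLOCK=infinity directive in the service unit file.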

Install OPA software on all nodes using opafastfabric

NOTE: This uses Intel’s installation tools, but you may alternatively use the manual installation method described below.

Follow the Intel® Omni-Path Fabric Software Installation Guide chapter 7.0 Install Host Software on the Remaining Hosts Using the FastFabric TUI. Run this on the manager node and select Host Setup:

opafastfabric

Run the following menu items in this order:

3) Host Setup
2) Set Up Password-Less SSH/SCP
1) Verify Hosts Pingable

The good nodes are listed in the file /etc/sysconfig/opa/good.

The tar-ball IntelOPA-Basic.<DISTRO>-x86_64.10.<version>.tgz must be available on the Manager node for installation on the compute nodes. Now install the OPA software on all good nodes:

5) Install/Upgrade OPA Software
6) Configure IPoIB IP Address

At the end of the installation select to reboot the nodes:

8) Reboot Hosts

IPoIB device ib0 not present

We have seen an error when upgrading the OPA software stack from 10.2 to 10.3. The ib0 network interface is defined correctly in /etc/sysconfig/network-scripts/ifcfg-ib0, yet the ib0 network device doesn’t exist and an error is printed:

/etc/sysconfig/network-scripts/ifup-ib[3239]: Device ib0 does not seem to be present, delaying initialization.

The OPA software INSTALL file menu:

3) Reconfigure Driver Autostart
   3) OFA IP over IB   [Enable ]

will fix this error after a reboot.

This can also be done with the opaconfig command:

# opaconfig -E delta_ipoib
Configuring autostart for Selected installed OPA Drivers
Enabling autostart for OFA IP over IB
Done OPA Driver Autostart Configuration.

To verify ping over IPoIB connectivity, use a Manager node with the IFS software:

/usr/sbin/opahostadmin -f /etc/sysconfig/opa/allhosts ipoibping

Manual software installation on a single node

When individual compute nodes are installed from scratch, the OPA software must be installed from the command line in the Kickstart post-install scripts. The Intel OPA documentation does not describe this procedure, so we have to discover it by trial and error.

Start by reading the manual Intel® Omni-Path Fabric Software Installation Guide chapter 4.

The installation steps are:

  1. Copy the Basic tar-ball to the system root and unpack it:

    cp (some location)/IntelOPA-Basic.RHEL73-x86_64.10.3.0.0.81.tgz /root/
    tar xzf IntelOPA-Basic.RHEL73-x86_64.10.3.0.0.81.tgz
    cd IntelOPA-Basic.RHEL73-x86_64.10.3.0.0.81
    
  2. You can run the INSTALL TUI script to learn about menu items. Then install the basic software:

    ./INSTALL -i opa_stack -i intel_hfi -i delta_ipoib -i oftools
    

    The installation log will be in /var/log/opa.log.

  3. The PSM2 library libpsm2 is not installed by any of the above components, so install it manually:

    cd ./IntelOPA-OFED_DELTA.RHEL73-x86_64.10.3.0.0.82/RPMS/redhat-ES73
    yum install libpsm2-10.X*rpm libpsm2-devel*rpm
    

    It seems that the libpsm2-compat RPM is not needed because it conflicts with the required infinipath-psm RPM.

  4. The IPoIB network script /etc/sysconfig/network-scripts/ifcfg-ib0 must be edited manually, see the section IPoIB Configuration below.

NOTE: The INSTALL TUI script installs RPM packages not by using yum, but directly with the rpm command, for example as seen in /var/log/opa.log:

/bin/rpm -U --force --nodeps  ./IntelOPA-OFED_DELTA.RHEL73-x86_64.10.3.0.0.82/RPMS/redhat-ES73/kmod-ifs-kernel-updates-3.10.0_514.el7.x86_64-123.x86_64.rpm

It is not a good practice to install packages with --force --nodeps (forcing installation without checking for dependencies)! The RPMs installed will unfortunately not be logged to /var/log/yum.log as is the best practice.

Uninstallation of OPA software

To uninstall all OPA software use the INSTALL script option:

  • -u - uninstall all ULPs and drivers with default options

The command is:

./INSTALL -u

Installation of Fabric Manager GUI

For the GUI download the RPM package IntelOPA-FMGUI.linux-<VERSION>.noarch.rpm (or similar) and install with:

yum install IntelOPA-FMGUI.linux-10.3.0.0.60.noarch.rpm

Read the Intel Omni-Path Fabric Software Installation Guide chapter 14 Install Intel Omni-Path Fabric Suite Fabric Manager GUI. The file /etc/opa-fm/opafm.xml must be edited to enable running the GUI on localhost without SSL encryption:

<SslSecurityEnable>0</SslSecurityEnable>

Also enable the Fabric Executive (FE) component of the Fabric Manager:

<Start>1</Start> <!-- default FE startup for all instances -->

Furthermore, for redundant setups also configure the priorities described in the section Redundant Fabric Manager hosts below.

Then restart the Fabric Manager:

systemctl restart opafm

Now run the GUI (a Java application):

fmgui

Configure fmgui:

  • When running fmgui on the Fabric Manager server/node itself, enter localhost as the hostname.

  • If you use a remote server, enter its hostname. You should also enable SSL.

The remote FM GUI requires port 3245 to be open on the Fabric Manager node, so you may have to open it in the firewall (if any):

firewall-cmd --zone=public --add-port=3245/tcp --permanent
firewall-cmd --reload

The next step is:

  • Menu item Subnet, select Connect To and click the network name you defined above.

Read the Intel Omni-Path Fabric Suite Fabric Manager GUI User Guide.

OPA kernel modules

During the above installation the INSTALL script installs an RPM package with OPA kernel modules. In /var/log/opa.log this is logged as:

installing kmod-ifs-kernel-updates-3.10.0_514.el7.x86_64-123.x86_64...
  /bin/rpm -U --force --nodeps  ./IntelOPA-OFED_DELTA.RHEL73-x86_64.10.3.0.0.82/RPMS/redhat-ES73/kmod-ifs-kernel-updates-3.10.0_514.el7.x86_64-123.x86_64.rpm

The source RPM file is:

./IntelOPA-OFED_DELTA.RHEL73-x86_64.10.3.0.0.82/SRPMS/ifs-kernel-updates-3.10.0_514.el7.x86_64-123.src.rpm

The RPM contains the following files:

# rpm -ql kmod-ifs-kernel-updates
/etc/depmod.d/ifs-kernel-updates.conf
/lib/modules/3.10.0-514.el7.x86_64/extra/ifs-kernel-updates/hfi1.ko
/lib/modules/3.10.0-514.el7.x86_64/extra/ifs-kernel-updates/rdmavt.ko

The problem with this package is that the kernel modules do not get updated when you update the Linux kernel! We are awaiting Intel’s response to this problem. One good method would be to use Dynamic Kernel Module Support (DKMS).
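Until the modules are rebuilt automatically, a quick check after each kernel update can show whether the installed package still matches the running kernel. This is a hedged sketch: the function name kmod_matches_kernel is an assumption, and the path layout is taken from the file list shown above.

```shell
# Read a package file list (as printed by "rpm -ql kmod-ifs-kernel-updates")
# on stdin and succeed only if it contains hfi1.ko for kernel release $1.
kmod_matches_kernel() {
    grep -qF "/lib/modules/$1/extra/ifs-kernel-updates/hfi1.ko"
}

# Example call on a node:
#   rpm -ql kmod-ifs-kernel-updates | kmod_matches_kernel "$(uname -r)" \
#       || echo "WARNING: hfi1 module was not built for the running kernel"
```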

OPA configuration files

On the management node, the OPA configuration files are stored in this directory:

/etc/sysconfig/opa/

OPA srpd services

The service srpd (SCSI RDMA Protocol over InfiniBand) is not used on compute nodes, so turn it off:

systemctl stop srpd
systemctl disable srpd

Managing the OPA fabric

Read the Intel® Omni-Path Fabric Suite Fabric Manager User Guide.

Check the fabric

On each host you can verify the OPA HFI adapter revision by:

opahfirev

(installed by the opa-basic-tools RPM package).

Check the OPA link quality on a list of nodes using ClusterShell:

clush -w <node-list>  'opainfo  | grep Link' | dshbak -c

The opa-fastfabric RPM package (part of the IFS software package) contains a useful host checking script:

/usr/share/opa/samples/hostverify.sh    # From OPA software version 10.7

You may copy this from an IFS host to other hosts and run it. To see available options run:

hostverify.sh --help

Check the Fabric Manager (FM)

The OPA FM Fabric Manager was installed above on the Manager node. Manage the opafm service by:

systemctl status opafm
systemctl enable opafm
systemctl start opafm
systemctl restart opafm
systemctl stop opafm

The /usr/lib/opa-fm/bin/opafmctrl command allows the user to manage the instances of the FM that are running after the opafm service has been started.

The OPA FM configuration file is /etc/opa-fm/opafm.xml. Other OPA configuration files are in /etc/sysconfig/opa/.

Fabric Manager commands

See chapter 8 of the FM user guide. Install the software by:

yum install opa-fastfabric

Some useful commands are:

  • opafmconfigcheck: Parses and verifies the configuration file of a Fabric Manager (FM). Displays debugging and status information.

  • opafabricinfo: Provides a brief summary of the components in the fabric.

  • opatop: Fabric Performance Monitor menu to display performance, congestion, and error information about a fabric.

  • opareport: Provides powerful fabric analysis and reporting capabilities.

  • opafmcmd: Executes a command to a specific instance of the Fabric Manager (FM).

The opareport command displays information about nodes and links in the fabric; see the man page or the FM user guide. For example, to list the Master Subnet Manager host in the fabric:

opareport -F sm

To list also other subnet manager hosts, it is simpler to do:

opareport | tail

To display link problems:

opareport -o errors -o slowlinks
opareport --clear     # Clears the port counters

Requirement of setting static hostname

Unfortunately, the OPA driver by default uses the hostname localhost.localdomain instead of the node name obtained from DHCP.

Any hostname or SM Name fields from the opareport command are obtained as the host’s Static hostname, which by default is localhost.localdomain (see man hostnamectl and the file /etc/hostname).

This is rather inconvenient, so you must set the Static hostname to the correct hostname with one of these commands:

hostnamectl set-hostname <hostname>.<domainname>
hostnamectl set-hostname `hostname`

Then you have to reboot the system to reinitialize the OPA driver setup.
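To find nodes that still need this change, a small test on the static hostname can be run on each node. This is only a sketch; the function name is_default_hostname is an assumption, and it treats an empty or localhost static hostname as unconfigured.

```shell
# Succeed if the static hostname given as $1 is still an unconfigured
# default (empty, "localhost" or "localhost.localdomain").
is_default_hostname() {
    case "$1" in
        ""|localhost|localhost.localdomain) return 0 ;;
        *) return 1 ;;
    esac
}

# Example call on a node:
#   is_default_hostname "$(hostnamectl --static)" \
#       && echo "Static hostname is not set, run hostnamectl set-hostname"
```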

Redundant Fabric Manager hosts

You may want to run the FM on two hosts, an active Master and a Standby Slave FM.

The use of Redundant Fabric Manager hosts is described in the Intel ® Omni-Path Fabric Suite Fabric Manager User Guide section 4.1 Redundant FMs in a Fabric. See especially the section 4.1.2 Master FM Failover.

The configuration file /etc/opa-fm/opafm.xml must be consistent on all FM hosts, use the tool opafmconfigdiff to verify this.

NOTE: All FM hosts must be running exactly the same minor version of Omni-Path software.

The FM hosts priority is configured in /etc/opa-fm/opafm.xml by the parameters:

<!-- Priority and Elevated Priority control failover for SM, PM. -->
<!-- Priority is used during initial negotiation, higher Priority wins. -->
<!-- ElevatedPriority is assumed by winning master, this can prevent -->
<!-- fallback when previous master comes back on line.  -->
<Priority>8</Priority> <!-- 0 to 15, higher wins -->
<ElevatedPriority>15</ElevatedPriority> <!-- 0 to 15, higher wins -->

The Priority settings must be:

  • For the master set Priority=8

  • For the slaves set Priority=1
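To compare the settings actually in effect on each FM host, the values can be extracted from the configuration file. The sketch below assumes the elements are written on a single line each, as in the excerpt above; the function name fm_priorities is an assumption. Note that opafm.xml may contain several such elements (for example per-component overrides), so all matches are printed in file order.

```shell
# Print every Priority and ElevatedPriority value found in an opafm.xml
# read from stdin, one "name=value" pair per line. XML comments that
# merely mention the word "Priority" are ignored.
fm_priorities() {
    sed -n -E 's/.*<(ElevatedPriority|Priority)>([0-9]+)<.*/\1=\2/p'
}

# Example call on an FM host:
#   fm_priorities < /etc/opa-fm/opafm.xml
```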

There are some points to note in the documentation:

  • The first opafm service running on a host will be the master.

  • When several hosts/switches run opafm, an election will decide the master.

  • Any switches running a FM instance will have a lower priority and yield to a host-based master.

  • If the master’s opafm is stopped, one of the standby slaves will become the new master after some timeout.

  • One can flexibly add and remove opafm hosts, as long as there is one host/switch that will be the master.

Check the FM servers by:

opareport | tail

which should show a status similar to:

2 Connected SMs in Fabric:
State       GUID               Name
Standby     0x00117501010963d1 server2 hfi1_0
Master      0x0011750101097827 server1 hfi1_0

Inactive secondary fabric manager

Note: The secondary server must have a status of Standby, for example:

$ opareport | tail
...
2 Connected SMs in Fabric:
State       GUID               Name
Standby     0x00117501010963d1 server2 hfi1_0
Master      0x0011750101097827 server1 hfi1_0

An Inactive state indicates an error condition which you must resolve.

Typically, this is due to a version mismatch between the OPA software on the master and standby nodes. This may occur, for example, if you inadvertently install the OS distro RPM packages instead of the Intel OPA packages. Check the OPA FM package versions by:

rpm -q opa-fm opa-fastfabric
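The state check can also be scripted, for example for a monitoring job. The following sketch parses the table format shown above; the function name check_sm_states is an assumption, and only the three states mentioned on this page (Master, Standby, Inactive) are counted.

```shell
# Read the "Connected SMs" table from "opareport | tail" on stdin.
# Prints a one-line summary and fails unless there is exactly one Master
# and no Inactive subnet manager. Table rows are recognised by a
# hexadecimal GUID in the second column.
check_sm_states() {
    awk '
        $2 ~ /^0x[0-9a-fA-F]+$/ { n[$1]++ }
        END {
            printf "Master=%d Standby=%d Inactive=%d\n",
                   n["Master"], n["Standby"], n["Inactive"]
            exit !(n["Master"] == 1 && n["Inactive"] == 0)
        }'
}

# Example call on a host with opa-fastfabric installed:
#   opareport | tail | check_sm_states || echo "Check the Fabric Managers"
```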

OPA cable errors

See https://www.intel.com/content/www/us/en/support/articles/000029514/network-and-io/fabric-products.html

Create a fabric log file:

opacapture -d 3 <filename>

After the command has completed, attach the tgz file to your troubleshooting case.

Intel PSM2 Sample Program

To verify the basic functionality of the OmniPath network, copy the Intel® PSM2 Sample Program code from the PDF documentation file Intel® Performance Scaled Messaging 2 (PSM2) Programmer’s Guide in Intel’s End User Publications web page. We attach the file psm2-demo.c for convenience.

Make sure the PSM2 packages have been installed:

rpm -q libpsm2 libpsm2-devel

and compile the code:

gcc psm2-demo.c -o psm2-demo -lpsm2

Now run two instances (server and client) on the same or different nodes:

./psm2-demo -s  # Server
./psm2-demo     # Client

If you get an error, see the Memory limits section below.

OpenMPI configuration

Optimized performance with OPA requires the PSM2 interface, see https://www.open-mpi.org/faq/?category=building#build-p2p. Search in the Intel documentation (link at the top of this page) for the document entitled Intel® Performance Scaled Messaging 2 (PSM2) Programmer’s Guide.

Intel ® Performance Scaled Messaging 2 (PSM2) is only available on RHEL/CentOS 7.2 or later, see https://github.com/01org/opa-psm2/blob/master/README which states:

Building PSM2 is possible on RHEL 7.2 as it ships with hfi1 kernel driver.

On CentOS 7 you must have these prerequisite packages, which are installed as above by the Intel OPA software:

rpm -q libpsm2 libpsm2-devel

If you get OpenMPI runtime errors like:

mca: base: components_open: component pml / cm open function failed

then you may need to install also these packages before building OpenMPI:

yum install infinipath-psm infinipath-psm-devel

see [OMPI users] Issue about cm PML and rocks 6.2 infiniband.

Omni-Path and OpenMPI version 4.1.4 (and possibly later versions) need special care:

The workaround is to set this environment variable:

export OMPI_MCA_btl=^openib,ofi

Build OpenMPI on RHEL/CentOS 7.2 (or later) with the configuration flags:

--with-psm2=/usr # Build support for the PSM 2 library (starting with the v1.10 series).

Note however in the RHEL7.2 Release Notes the following section:

  • PSM2 MTL disabled to avoid conflicts between PSM and PSM2 APIs:

    The new libpsm2 package provides the PSM2 API for use with Intel Omni-Path devices, which overlaps with the Performance Scaled Messaging (PSM) API installed by the infinipath-psm package for use with Truescale devices. The API overlap results in undefined behavior when a process links to libraries provided by both packages. This problem affects Open MPI if the set of its enabled MCA modules includes the psm2 Matching Transport Layer (MTL) and one or more modules that directly or indirectly depend on the libpsm_infinipath.so.1 library from the infinipath-psm package.

The older PSM library is not available on CentOS 7:

--with-psm=<dir> # Build support for the PSM library.

Intel OpenMPI

The IntelOPA-Basic.RHEL73-x86_64.10.3.0.0.81 package contains Intel’s builds of OpenMPI using the GCC compiler. Install the hfi versions of RPMs to use OmniPath, for example:

cd IntelOPA-OFED_DELTA.RHEL73-x86_64.10.3.0.0.82/RPMS/redhat-ES73
yum install openmpi_gcc_hfi-1.10.4-9.x86_64.rpm mpi-selector-1.0.3-1.x86_64.rpm mpitests_openmpi_gcc_hfi-3.2-930.x86_64.rpm

To use the Intel OpenMPI see the Intel Omni-Path Fabric Performance Tuning User Guide chapter 5 MPI Performance:

  • Load the environment variables:

    source /usr/mpi/gcc/openmpi-1.10.4-hfi/bin/mpivars.sh
    
  • Use the options in your mpirun command to specify the use of PSM2 with OpenMPI:

    mpirun -mca pml cm -mca mtl psm2 ...
    

Using OpenMPI with OmniPath

First make the correct version of OpenMPI available to your applications. If you use software modules (see the EasyBuild software for environment modules page) load the appropriate module, for example:

 # module load foss
 # module list
 Currently Loaded Modules:
 1) EasyBuild/3.0.1
 2) GCCcore/5.4.0
 3) binutils/2.26-GCCcore-5.4.0
 4) GCC/5.4.0-2.26
 5) numactl/2.0.11-GCC-5.4.0-2.26
 6) hwloc/1.11.3-GCC-5.4.0-2.26
 7) OpenMPI/1.10.3-GCC-5.4.0-2.26
 8) OpenBLAS/0.2.18-GCC-5.4.0-2.26-LAPACK-3.6.1
 9) gompi/2016b
10) FFTW/3.3.4-gompi-2016b
11) ScaLAPACK/2.0.2-gompi-2016b-OpenBLAS-0.2.18-LAPACK-3.6.1
12) foss/2016b

Now verify that the psm2 component has been built into OpenMPI:

# ompi_info | grep psm2
  MCA mtl: psm2 (MCA v2.0.0, API v2.0.0, Component v1.10.3)
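In build or test scripts the same check can be made to fail loudly when the component is missing. This is a small sketch with an assumed function name, has_psm2_mtl; it matches the psm2 MTL line only, so an OpenMPI build with just the older psm MTL does not pass.

```shell
# Read "ompi_info" output on stdin and succeed only if the psm2 MTL
# component is listed (a plain "psm" MTL does not count).
has_psm2_mtl() {
    grep -Eq 'MCA +mtl: +psm2( |$)'
}

# Example call after loading the OpenMPI module:
#   ompi_info | has_psm2_mtl || echo "OpenMPI was built without PSM2"
```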

MPI performance tuning

The Intel Omni-Path Fabric Performance Tuning User Guide discusses MPI tuning in chapter 5 MPI Performance.

  • Use the options in your mpirun command to specify the use of PSM2 with OpenMPI:

    mpirun -mca pml cm -mca mtl psm2 ...
    

OpenMPI tests

The Intel RPM mpitests_openmpi_gcc_hfi contains a number of MPI testing codes in the /usr/mpi/gcc/openmpi-1.10.4-hfi/tests subdirectories, for example:

  • intel/deviation - MPI bandwidth and latency deviations from Intel MPI Benchmarks (IMB).

  • osu_benchmarks-3.1.1/osu_bibw - Bidirectional Bandwidth Test from OSU_benchmarks.

Memory limits

The Omni-Path network fabric requires all user processes to have unlimited locked memory. For normal users starting a shell, this is configured in /etc/security/limits.conf by adding the lines:

* hard memlock unlimited
* soft memlock unlimited

This file is read by PAM when users log in. However, system daemons started during the boot process do not use /etc/security/limits.conf, and the correct memory limits must be set inside the daemon startup scripts. This is especially important for batch job services.

Users may verify the correct locked memory limits by the command:

# ulimit -l
unlimited
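Since limits.conf sets a soft and a hard limit, both can be checked in one go. The snippet below is a sketch with an assumed helper name, memlock_ok; it only reports the limits of the current shell and does not change anything.

```shell
# Succeed only if the locked-memory limit given as $1 is "unlimited".
memlock_ok() {
    [ "$1" = "unlimited" ]
}

# Report the soft (S) and hard (H) locked-memory limits of this shell.
for kind in S H; do
    value=$(ulimit -"$kind" -l 2>/dev/null) || value=unknown
    if memlock_ok "$value"; then
        echo "memlock ($kind): ok"
    else
        echo "memlock ($kind): $value, expected unlimited"
    fi
done
```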

If the locked memory limit is too low, a rather strange error will be printed by the PSM2 library:

PSM2 can't open hfi unit: -1 (err=23)
PSM2 was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.
  Error: Failure in initializing endpoint
hfi_userinit: mmap of rcvhdrq at dabbad0004030000 failed: Resource temporarily unavailable

There will be system syslog messages as well like:

psm2-demo: (hfi/PSM)[4982]: PSM2 can't open hfi unit: -1 (err=23)
kernel: cache_from_obj: Wrong slab cache. kmalloc-64(382:step_batch) but object is from kmem_cache_node

In the libpsm2 source code the error originates from the function hfi_userinit() in the file libpsm2-10.*/opa/opa_proto.c.

Slurm configuration

MPI jobs and other tasks using the Omni-Path network fabric must have unlimited locked memory, see above. For slurmd running under systemd the limits are configured in /usr/lib/systemd/system/slurmd.service as:

LimitNOFILE=51200
LimitMEMLOCK=infinity
LimitSTACK=infinity

Limits defined in /etc/security/limits.conf or /etc/security/limits.d/*.conf are not effective for systemd services, see https://access.redhat.com/solutions/1257953, so any limits must be defined in the service file, see man systemd.exec.

To ensure that job tasks running under Slurm have this configuration, verify the slurmd daemon’s limits by:

# grep locked /proc/$(pgrep -u 0 slurmd)/limits
Max locked memory         unlimited            unlimited            bytes
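For use in a node health check, the same test can return an exit status instead of printing the line. This sketch assumes the /proc limits format shown above; the function name locked_memory_unlimited is an assumption.

```shell
# Read a /proc/<pid>/limits file on stdin and succeed only if both the
# soft and the hard "Max locked memory" limits are "unlimited".
locked_memory_unlimited() {
    awk '
        /^Max locked memory/ {
            found = 1
            ok = ($4 == "unlimited" && $5 == "unlimited")
        }
        END { exit !(found && ok) }'
}

# Example call on a compute node; "pgrep -o" picks the oldest slurmd
# process in case several are running:
#   locked_memory_unlimited < /proc/$(pgrep -u 0 -o slurmd)/limits \
#       || echo "slurmd does not have unlimited locked memory"
```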

Also, the slurm.conf file must have this configuration:

PropagateResourceLimitsExcept=MEMLOCK

as explained in https://slurm.schedmd.com/faq.html#memlock.

The memory limit error with the Omni-Path network fabric was discussed in Slurm bug 3363.

IPoIB configuration

The role of IPoIB is to provide an IP network emulation layer on top of InfiniBand RDMA networks, see Understanding_InfiniBand_and_RDMA_technologies.

DNS hostnames: For the IPoIB convention, append -opa to the hostname.

To configure IPoIB on CentOS/RHEL 7 see:

For convenience we provide a script which will help you configure OmniPath and/or Mellanox Infiniband adapters on a CentOS/RHEL 7 system:

WARNING: You cannot use the instructions below if you have also installed the Mellanox_OFED distribution, since Mellanox_OFED replaces many RHEL/CentOS system utilities. However, the ibstat command is still the best way to display adapter information.

You must also configure /etc/rdma/rdma.conf as shown in Configuring_the_Base_RDMA_Subsystem. Suggested parameters (including NFS service) are:

IPOIB_LOAD=yes
SRP_LOAD=no
SRPT_LOAD=no
ISER_LOAD=no
ISERT_LOAD=no
RDS_LOAD=no
XPRTRDMA_LOAD=yes
SVCRDMA_LOAD=yes
FIXUP_MTRR_REGS=no
ARPTABLE_TUNING=yes

IPoIB devices

The IPoIB network devices must be configured carefully by hand, since this is not done automatically, and there are no standard device names.

If you use only a single Infiniband adapter and network interface, it will probably be named ib0, and you do not necessarily have to perform any device configuration. Then go to the IPoIB network configuration below.

However, if you have:

  • Multiple Infiniband and/or OmniPath adapters,

  • Multiple ports per adapter,

  • A wish to control the device names instead of the default ib0, ib1 etc.,

then you must configure the Infiniband devices carefully:

  • Install the prerequisite RPM:

    yum install infiniband-diags
    

    and then discover the link/infiniband hardware addresses by:

    ibstat
    

    You can also get the link/infiniband hardware address of all network interfaces by:

    ip link show
    
  • Select device names for the IPoIB devices, since there doesn’t seem to be any naming standard for these (for Ethernet there is a Consistent_Network_Device_Naming standard).

    The ibstat command lists adapter names:

    mlx4_0, mlx4_1 etc. for Mellanox adapters no. 0 and 1.
    hfi1_0, hfi1_1 etc. for Intel OmniPath adapters no. 0 and 1.
    

    The adapter ports may either be configured for Infiniband or for Ethernet, so it may be reasonable to name the IPoIB ports as XXXib0, XXXib1 etc., since there may be several adapters. The kernel’s internal device names ib0 etc. should not be reused in a manual configuration.

    Suggested interface names might be concatenating adapter and port names like:

    mlx4_0ib0
    mlx4_0ib1
    hfi1_0ib0
    
  • Edit the udev file /etc/udev/rules.d/70-persistent-ipoib.rules as explained in Usage_of_70-persistent-ipoib using the last 8 bytes of each link/infiniband hardware address. An example file may be:

    ACTION=="add", SUBSYSTEM=="net", DRIVERS=="?*", ATTR{type}=="32", ATTR{address}=="?*70:10:6f:ff:ff:a0:74:71", NAME="mlx4_0ib0"
    ACTION=="add", SUBSYSTEM=="net", DRIVERS=="?*", ATTR{type}=="32", ATTR{address}=="?*70:10:6f:ff:ff:a0:74:72", NAME="mlx4_0ib1"
    ACTION=="add", SUBSYSTEM=="net", DRIVERS=="?*", ATTR{type}=="32", ATTR{address}=="?*00:11:75:01:01:7a:ff:df", NAME="hfi1_0ib0"
    

    It is perfectly possible for OmniPath adapters to coexist with Mellanox adapters in this way.

  • You can force the IPoIB interfaces to be renamed without performing a reboot by removing the ib_ipoib kernel module and then reloading it as follows:

    rmmod ib_ipoib
    modprobe ib_ipoib
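Writing the udev rules by hand is error-prone, so the rule line can be generated from the hardware address. The following is a sketch only: the function name ipoib_udev_rule is an assumption, and the rule text simply follows the example file above. It keeps the last 8 of the 20 colon-separated bytes of the link/infiniband address.

```shell
# Print a udev rule that renames the IPoIB interface whose link/infiniband
# hardware address ends in the same 8 bytes as $1 to the name given in $2.
ipoib_udev_rule() {
    # Keep only the last 8 colon-separated bytes of the address.
    last8=$(printf '%s\n' "$1" | awk -F: '{
        s = $(NF-7)
        for (i = NF-6; i <= NF; i++) s = s ":" $i
        print s
    }')
    printf 'ACTION=="add", SUBSYSTEM=="net", DRIVERS=="?*", ATTR{type}=="32", ATTR{address}=="?*%s", NAME="%s"\n' \
        "$last8" "$2"
}

# Example call for the interface currently named ib0:
#   ipoib_udev_rule "$(cat /sys/class/net/ib0/address)" hfi1_0ib0
```

The output can be reviewed and then appended to /etc/udev/rules.d/70-persistent-ipoib.rules.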
    

IPoIB network configuration

When you have set up the Infiniband devices, the next step is to configure the IPoIB interfaces:

  • See Configure_IPoIB_Using_the_command_line about creating ifcfg files in /etc/sysconfig/network-scripts/. Notice these points:

    • The DEVICE field must match the custom name created in any udev renaming rules.

    • The NAME entry need not match the device name. If the GUI connection editor is started, the NAME field is what is used to present a name for this connection to the user.

    • The TYPE field must be InfiniBand in order for InfiniBand options to be processed properly.

    • CONNECTED_MODE is either yes or no, where yes will use connected mode and no will use datagram mode for communications, see https://www.kernel.org/doc/Documentation/infiniband/ipoib.txt. The value yes should be used for performance reasons.

  • An example ifcfg file ifcfg-OmniPath would be:

    NM_CONTROLLED=no
    CONNECTED_MODE=yes
    TYPE=InfiniBand
    BOOTPROTO=none
    IPADDR=10.4.128.107
    PREFIX=16
    DEFROUTE=no
    IPV4_FAILURE_FATAL=yes
    IPV6INIT=no
    NAME=OmniPath
    DEVICE=hfi1_0ib0
    ONBOOT=yes
    MTU=65520
    

    On EL8 we have used this configuration successfully:

    DEVICE=ib0
    TYPE='InfiniBand'
    BOOTPROTO=static
    IPADDR=10.4.135.3
    ONBOOT=yes
    CONNECTED_MODE=yes
    DEFROUTE=no
    IPV6INIT=no
    MTU=65520
    NETMASK=255.255.0.0
    PREFIX=16
    BROADCAST=10.4.255.255
    NETWORK=10.4.0.0
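The EL8 example above sets both PREFIX and NETMASK, and the two must describe the same network. As a hedged illustration of how they relate, the helper below (the name prefix_to_netmask is an assumption) converts an IPv4 prefix length into the dotted netmask.

```shell
# Convert an IPv4 prefix length (0 to 32) given as $1 into a dotted netmask.
prefix_to_netmask() {
    awk -v p="$1" 'BEGIN {
        for (i = 0; i < 4; i++) {
            # Number of network bits that fall into this octet (0 to 8).
            bits = (p >= 8) ? 8 : (p > 0 ? p : 0)
            p -= bits
            printf "%s%d", (i ? "." : ""), 256 - 2 ^ (8 - bits)
        }
        print ""
    }'
}

prefix_to_netmask 16    # prints 255.255.0.0, matching the example above
```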
    

With the above configurations in place you can restart the network service:

systemctl restart network

and display all network interfaces:

ifconfig -a

where the OmniPath and/or Infiniband interfaces should now be shown.

Monitoring IPoIB interfaces

Install these RPMs:

yum install libibverbs-utils infiniband-diags

Then you can list available Infiniband-like devices:

ibv_devices
ibv_devinfo

and see the device status:

ibstat

To display the OPA device ib0 IP address information on a list of nodes:

clush -w <node-list> '/sbin/ip -4 -o addr show label ib0' | sort

Performance tuning

Download the manual Intel® Omni-Path Performance Tuning User Guide. Chapter 2.0 BIOS Settings lists the recommended settings, which include:

  • CPU power and performance policy = Performance or Balanced performance.

  • Enhanced Intel SpeedStep Technology = Enabled.

  • Intel Turbo Boost Technology = Enabled.

  • Intel VT for Directed I/O (VT-d) = Disabled.

  • CPU C-State = Enabled.

  • Processor C3 = Disabled.

  • Processor C6 = Enabled.

  • IOU Non-posted Prefetch = Disabled (where available).

  • Cluster-on-Die = Disabled.

  • Early Snoop = Disabled.

  • Home Snoop = Enabled.

  • NUMA Optimized = Enabled.

  • MaxPayloadSize = Auto or 256B.