.. _Lenovo_SD665_V3:

========================
Lenovo SD665_V3 server
========================

.. Contents::

This page contains information about Lenovo SD665_V3_ servers deployed in our cluster.
The Lenovo ThinkSystem SD665_V3_ is a 2-socket ½U server that features the AMD EPYC 9004 "Genoa" family of processors.
The nodes are housed in the upgraded ThinkSystem DW612S_ enclosure.

.. _SD665_V3: https://lenovopress.lenovo.com/lp1612-lenovo-thinksystem-sd665-v3-server
.. _DW612S: https://pubs.lenovo.com/dw612s_neptune_enclosure/

NVIDIA InfiniBand Adapter (SharedIO)
=======================================

The SD665_V3_ has a water-cooled NVIDIA 2-Port PCIe Gen5 x16 InfiniBand Adapter (SharedIO)
`ThinkSystem NVIDIA ConnectX-7 NDR200 InfiniBand QSFP112 Adapters <https://lenovopress.lenovo.com/lp1693-thinksystem-nvidia-connectx-7-ndr200-infiniband-qsfp112-adapters>`_.
The adapter is located in the right-hand SD665_V3_ node and connects both servers in the tray.

There is important information regarding SharedIO for older SD650 servers in the article
`Considerations when using ThinkSystem SD650, SD650 V2, SD650 V3 and ConnectX-6 HDR, ConnectX-7 NDR SharedIO <https://support.lenovo.com/us/en/solutions/ht510888-thinksystem-sd650-and-connectx-6-hdr-sharedio-lenovo-servers-and-storage>`_.
The issues have apparently been resolved in the SD665_V3_ system.

Please note that several InfiniBand tools such as ``ibnetdiscover`` fail with an error message
when executed on the SD665_V3_ "auxiliary" (left-hand) node,
and you **must** execute such tools on the "primary" (right-hand) node
(private communication with a Lenovo support person).

Documentation and software
==========================

Lenovo provides SD665_V3_ information and downloads:

* https://pubs.lenovo.com/sd665-v3/

There is a `Product Home <https://datacentersupport.lenovo.com/us/en/products/servers/thinksystem/sd665-v3/7d9p>`_ page for downloads.

The `EasyBuild` software module `OpenMPI` seems to have issues with the Mellanox libraries.
Setting these variables may be a workaround::

  export OMPI_MCA_btl='^openib,ofi'
  export OMPI_MCA_mtl='^ofi'

Booting and BIOS configuration
==============================

See the :ref:`Lenovo_BIOS_settings` page.

See the :ref:`Lenovo_XClarity_BMC` page.

There is a document
`Lenovo ThinkSystem SR645 Recommended UEFI and OS settings for Lenovo Scalable Infrastructure (LeSI) <https://download.lenovo.com/servers/sr645_and_sr665_uefi_and_os_settings_v1.4.txt>`_
which recommends:

* For best performance set to **Maximum Performance** first, then set to **Custom Mode**

OFED software and drivers
-------------------------

The OpenFabrics Enterprise Distribution (OFED_) is open-source software for RDMA and kernel bypass applications,
as provided by the `OpenFabrics Alliance <http://en.wikipedia.org/wiki/OFED>`_.

Mellanox provides some information about Inbox_drivers_ from various OS vendors,
but it is not stated whether they can be used in place of the drivers from Mellanox described below.
Nvidia's `Red Hat Enterprise Linux (RHEL) Inbox Driver documentation <https://docs.nvidia.com/networking/display/rhel89/general+support>`_
has the statement::

  Warning
  ConnectX-7 is only supported as technical preview (i.e., the feature is not fully supported for production).

Since the SD665_V3_ nodes have ``ConnectX-7`` adapters, these adapters are **NOT SUPPORTED** by the inbox drivers at present!

.. _Inbox_drivers: https://network.nvidia.com/products/adapter-software/ethernet/inbox-drivers/

Install these prerequisite packages::

  dnf -y install libibverbs rdma libmlx4 libibverbs-utils infiniband-diags librdmacm librdmacm-utils ibacm
  dnf -y install tk gcc-gfortran kernel-modules-extra

For the Mellanox InfiniBand adapters it is recommended to download the .tar.gz file from
`Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) <https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/>`_.

Unpack the tar-ball and run the installer, for example::

  tar xzf MLNX_OFED_LINUX-24.01-0.3.3.1-rhel8.9-x86_64.tgz
  cd MLNX_OFED_LINUX-24.01-0.3.3.1-rhel8.9-x86_64
  ./mlnxofedinstall

The installer script has some options::

  ./mlnxofedinstall --help
  ./mlnxofedinstall -q         # Set quiet - no messages will be printed
  yes | ./mlnxofedinstall      # Answer yes to all questions

The installer attempts to make firmware updates, but we may experience this warning::

  Attempting to perform Firmware update...
  The firmware for this device is not distributed inside Mellanox driver: 42:00.0 (PSID: LNV0000000049)
  To obtain firmware for this device, please contact your HW vendor.
  Failed to update Firmware.

so it may be a good idea to add this flag and omit firmware updates::

  ./mlnxofedinstall --without-fw-update

Installation instructions are in the *User Manual* from the
`Mellanox documentation <https://docs.nvidia.com/networking/software/adapter-software/index.html#linux>`_.

Verify that the Mellanox driver RPMs have been installed and the ``openibd`` service started::

  rpm -qa | grep mlnx
  systemctl status openibd

If your kernel version does not match any of the offered pre-built RPMs,
you can add your kernel version by using the ``mlnx_add_kernel_support.sh`` script located inside the MLNX_OFED package.

**Notices**:

* On Red Hat and SLES distributions with an errata kernel installed there is no need to use the ``mlnx_add_kernel_support.sh`` script.
  The regular installation can be performed, and the weak-updates mechanism will create symbolic links to the MLNX_OFED kernel modules.

* OFED_ software includes kernel modules for the running kernel, and these must be rebuilt if the kernel is upgraded!

.. _OFED: https://www.openfabrics.org/index.php/resources/ofed-for-linux-ofed-for-windows/ofed-overview.html
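
The notice about rebuilding kernel modules can be checked after a kernel upgrade with a small script.
The following is only a minimal sketch and not part of the MLNX_OFED package:
the helper name ``module_origin`` is hypothetical, and the script assumes that in-tree ("inbox") modules live under
``/lib/modules/<kernel>/kernel/`` while add-on modules are installed in other subdirectories such as ``extra/`` or ``weak-updates/``::

  #!/bin/sh
  # Classify a kernel module file path (as printed by "modinfo -n").
  # Prints "inbox" for modules inside the kernel tree, "add-on" for modules
  # installed elsewhere under /lib/modules/<kernel>/, "unknown" otherwise.
  # NOTE: module_origin is a hypothetical helper, not an MLNX_OFED tool.
  module_origin() {
      path=$1
      rel=${path#/lib/modules/*/}     # strip the "/lib/modules/<kernel>/" prefix
      if [ "$rel" = "$path" ]; then   # prefix not present (or empty path)
          echo unknown
          return
      fi
      case $rel in
          kernel/*) echo inbox ;;
          *)        echo add-on ;;
      esac
  }

  # Report where the mlx5_core driver for the running kernel comes from.
  if command -v modinfo >/dev/null 2>&1; then
      file=$(modinfo -n mlx5_core 2>/dev/null) || file=""
      echo "kernel $(uname -r): mlx5_core is ${file:-not found} ($(module_origin "$file"))"
  fi

If the script reports ``inbox`` or ``unknown`` after a kernel upgrade,
the MLNX_OFED modules have probably not been carried over to the new kernel and may need to be rebuilt or reinstalled.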