Slurm installation and upgrading

Jump to our Slurm top-level page.

To get started with Slurm see the Slurm_Quick_Start Administrator Guide. See also CECI Slurm Quick Start Tutorial.

Hardware optimization for the slurmctld master server

SchedMD recommends that the slurmctld server should have only a few, but very fast CPU cores, in order to ensure the best responsiveness.

The file system for /var/spool/slurmctld/ should be mounted on the fastest possible disks (SSD or NVMe if possible).

Create global user accounts

There must be a uniform user and group name space (including UIDs and GIDs) across the cluster. It is not necessary to permit user logins to the control hosts (ControlMachine or BackupController), but the users and groups must be configured on those hosts. To restrict user login by SSH, use the AllowUsers parameter in /etc/ssh/sshd_config.
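
As an illustration (the user names here are placeholders for your site's administrators), /etc/ssh/sshd_config on the control hosts could contain a line like:

AllowUsers root adminuser1 adminuser2

Reload the sshd service afterwards (systemctl reload sshd).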

Slurm and MUNGE require consistent UID and GID across all servers and nodes in the cluster. Create the users/groups for slurm and munge, for example:

export MUNGEUSER=981
groupadd -g $MUNGEUSER munge
useradd  -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge  -s /sbin/nologin munge
export SlurmUSER=982
groupadd -g $SlurmUSER slurm
useradd  -m -c "Slurm workload manager" -d /var/lib/slurm -u $SlurmUSER -g slurm  -s /bin/bash slurm

and make sure that these same users are created identically on all nodes. This must be done prior to installing RPMs (which would create random UID/GID pairs if these users don't exist).
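
To check that the UIDs and GIDs really are identical everywhere, one quick sketch (assuming the ClusterShell clush tool and a configured node group, see the SLURM page) is:

clush -ba 'id munge; id slurm'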

MUNGE authentication service

The MUNGE authentication plugin identifies the user originating a message. You should read the Munge_installation guide and the Munge_wiki.

The MUNGE RPMs for RHEL7 are in the EPEL repository. First install the newest version of the epel-release RPM for EL7, for example:

CentOS: yum install epel-release
RHEL7:  yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

Install the MUNGE RPM packages from the EPEL repository:

yum install munge munge-libs munge-devel

The packages can also be downloaded directly, for example for local installation on compute nodes.

MUNGE configuration and testing

By default MUNGE uses an AES-128 cipher and a SHA-256 HMAC (Hash-based Message Authentication Code). Display these options with:

munge -C
munge -M

On the Head/Master node (only) create a secret key to be used globally on every node (see the Munge_installation guide):

dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key

Alternatively use this command (slow):

/usr/sbin/create-munge-key -r

NOTE: For a discussion of using /dev/random instead of /dev/urandom (pseudo-random) as recommended in the Munge_installation guide, see Myths about /dev/urandom.

Securely propagate /etc/munge/munge.key (e.g., via SSH) to all other hosts within the same security realm:

scp -p /etc/munge/munge.key hostXXX:/etc/munge/munge.key
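
For many hosts the copying can be scripted, for example with a simple loop (the host names are illustrative):

for host in node001 node002 node003
do
  scp -p /etc/munge/munge.key $host:/etc/munge/munge.key
done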

Make sure to set the correct ownership and mode on all nodes:

chown -R munge: /etc/munge/ /var/log/munge/
chmod 0700 /etc/munge/ /var/log/munge/

Then enable and start the MUNGE service on all nodes:

systemctl enable munge
systemctl start  munge

Run some tests as described in the Munge_installation guide:

munge -n
munge -n | unmunge          # Displays information about the MUNGE key
munge -n | ssh somehost unmunge
remunge

Build Slurm RPMs

Install Slurm prerequisites as well as several optional packages:

yum install rpm-build gcc openssl openssl-devel libssh2-devel pam-devel numactl numactl-devel hwloc hwloc-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel gtk2-devel man2html libibmad libibumad perl-Switch perl-ExtUtils-MakeMaker

Important: Install the MariaDB (a replacement for MySQL) packages before you build Slurm RPMs (otherwise some libraries will be missing):

yum install mariadb-server mariadb-devel

Get the source code from the Slurm_download page.

Set the version (currently 17.11.7-1) and build Slurm RPMs by:

export VER=17.11.7-1
rpmbuild -ta slurm-$VER.tar.bz2

The RPM packages will typically be in $HOME/rpmbuild/RPMS/x86_64/ and should be copied to all other nodes.
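
For example (the node list and target directory are illustrative), the RPMs can be distributed with ClusterShell:

clush -w node[001-099] mkdir -p /root/slurm-rpms
clush -w node[001-099] --copy $HOME/rpmbuild/RPMS/x86_64/slurm-*.rpm --dest /root/slurm-rpms/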

Installing RPMs

The RPMs to be installed on the head node, compute nodes, and slurmdbd node can vary by configuration, but here is a suggested starting point:

  • Head/Master Node (where the slurmctld daemon runs), Compute and Login nodes:

    export VER=17.02.10       # If you have Slurm version 17.02
    yum install slurm-$VER*rpm slurm-devel-$VER*rpm slurm-munge-$VER*rpm slurm-perlapi-$VER*rpm slurm-plugins-$VER*rpm slurm-torque-$VER*rpm

    With Slurm 17.11 the RPMs have been restructured, so omit the slurm-munge and slurm-plugins but add the new slurm-example-configs:

    export VER=17.11.7
    yum install slurm-$VER*rpm slurm-devel-$VER*rpm slurm-perlapi-$VER*rpm slurm-torque-$VER*rpm slurm-example-configs-$VER*rpm

    With Slurm 17.11 you have to explicitly enable the service:

    systemctl enable slurmctld

    The slurm-torque package could perhaps be omitted, but it does contain a useful /usr/bin/mpiexec wrapper script.

    If the database service will run on the Head/Master node, install some additional RPMs:

    export VER=17.02.10
    yum install slurm-slurmdbd-$VER*rpm slurm-sql-$VER*rpm slurm-plugins-$VER*rpm

    With Slurm 17.11:

    export VER=17.11.7
    yum install slurm-slurmdbd-$VER*rpm

    With Slurm 17.11 you have to explicitly enable the service:

    systemctl enable slurmdbd
  • On Compute nodes you may additionally install the slurm-pam_slurm RPM package to prevent rogue users from logging in:

    yum install slurm-pam_slurm-$VER*rpm

    With Slurm 17.11 you may also consider this RPM, which contains the PMI (libpmi) libraries:

    yum install slurm-libpmi-$VER*rpm

    With Slurm 17.11 you have to explicitly enable the service:

    systemctl enable slurmd
  • Database-only (slurmdbd service) Node:

    export VER=17.02.10
    yum install slurm-$VER*rpm slurm-devel-$VER*rpm slurm-munge-$VER*rpm slurm-plugins-$VER*rpm slurm-slurmdbd-$VER*rpm slurm-sql-$VER*rpm

    With Slurm 17.11:

    export VER=17.11.7
    yum install slurm-$VER*rpm slurm-devel-$VER*rpm slurm-slurmdbd-$VER*rpm

    With Slurm 17.11 you have to explicitly enable the service:

    systemctl enable slurmdbd

Study the configuration information in the Installation Guide.

Upgrading Slurm

New Slurm updates are released rather often. Follow the Upgrades instructions in the Slurm_Quick_Start page. Pay attention to these statements:

  • You may upgrade by at most 2 major versions; see the Upgrades page (a version check example is shown after this list):

    • Slurm daemons will support RPCs and state files from the two previous minor releases (e.g. a version 16.05.x SlurmDBD will support slurmctld daemons and commands with a version of 16.05.x, 15.08.x or 14.11.x).

    The word minor should read major here.

  • In other words, when changing the version to a higher release number (e.g. from 14.11.x to 15.08.x) always upgrade the slurmdbd daemon first.

  • Be mindful of your configured SlurmdTimeout and SlurmctldTimeout values.

  • The recommended upgrade order is as follows: ...
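
Before upgrading, it is useful to confirm the currently installed versions on the relevant hosts, for example:

slurmdbd -V
slurmctld -V
slurmd -V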

If you use a database, also make sure to:

  • Make a database dump (see Slurm_database) prior to the slurmdbd upgrade.

  • Start the slurmdbd daemon manually after the upgrade in order to avoid timeouts (reference needed). If you start the slurmdbd service with the systemctl command, it is very likely to exceed a systemd timeout and kill slurmdbd before the database conversion has been completed.

    The recommended way to perform the slurmdbd database upgrade is therefore:

    time slurmdbd -D -vvv

    See further info below.

This command can report current jobs that have been orphaned on the local cluster and are now runaway:

sacctmgr show runawayjobs

Upgrade of MySQL/MariaDB

If you restore a database dump (see Slurm_database) onto a different server running a newer MySQL/MariaDB version, for example upgrading MySQL 5.1 on CentOS 6 to MariaDB 5.5 on CentOS 7, there are some extra steps.

See Upgrading from MySQL to MariaDB about running the mysql_upgrade command whenever major (or even minor) version upgrades are made, or when migrating from MySQL to MariaDB:

mysql_upgrade

It may be necessary to restart the MySQL/MariaDB service or reboot the server after this upgrade.
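
On CentOS 7 the MariaDB service is named mariadb, so a restart (if needed) would be:

systemctl restart mariadb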

Make a dry run database upgrade

Optional: You can test the database upgrade procedure before doing the real upgrade.

In order to verify and time the slurmdbd database upgrade you may make a dry_run upgrade for testing before actual deployment.

Here is a suggested procedure:

  1. Drain a compute node running the current Slurm version and use it for testing the database.

    The following actions must be performed on the drained compute node.

    First stop the regular slurmd daemon on the compute node:

    systemctl stop slurmd
  2. Install the OLD slurmdbd database RPMs (the cluster's current version, say, NN.NN) as described above:

    VER=NN.NN
    yum install slurm-slurmdbd-$VER*rpm slurm-sql-$VER*rpm

    Information about building RPMs is in the Slurm_installation page. Note: From Slurm 17.11 the slurm-sql RPM no longer exists.

  3. Install the database RPM packages and configure the database EXACTLY as described in the Slurm_database page.
  4. Configure the MySQL/MariaDB database as described in the Slurm_database page.

    Copy the configuration files from the main server to the compute node:

    /etc/slurm/slurm.conf
    /etc/slurm/slurmdbd.conf

    Important: Edit these files to replace the main server name by localhost so that all further actions take place on the compute node, not the main server!

    Configure this in slurmdbd.conf:

    DbdHost=localhost
    StorageHost=localhost

    Configure this in slurm.conf:

    AccountingStorageHost=localhost
  5. Copy the latest database dump file (/root/mysql_dump, see Slurm_database) from the main server to the compute node. Load the dump file into the testing database:

    mysql -u root -p < /root/mysql_dump

    Verify the database contents on the compute node by making a new database dump and comparing it to the original dump (a sketch is shown after this list).

  6. Make sure that slurmdbd is running, and start it if necessary:

    systemctl status slurmdbd
    systemctl start slurmdbd

    Make a query to test slurmdbd:

    sacctmgr show user -s

    If all is well, stop the slurmdbd before the upgrade below:

    systemctl stop slurmdbd
  7. Update all Slurm RPMs to the new version (say, 17.11.7) built as shown above:

    export VER=17.11.7
    yum update slurm*$VER*.rpm
  8. Perform and time the actual database upgrade:

    time slurmdbd -D -vvv

    and wait for the output:

    slurmdbd: debug2: Everything rolled up

    and do a Control-C.

    Write down the timing information from the time command, since this will be the expected approximate time when you later perform the real upgrade.

    Now start the service as usual:

    systemctl start slurmdbd
  9. Make a query to test slurmdbd:

    sacctmgr show user -s

    and run some other tests to verify that slurmdbd is responding correctly.

  10. When all tests have been completed successfully, reinstall the compute node to its default installation.
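
Regarding the verification in step 5, a minimal sketch could be (assuming the default Slurm database name slurm_acct_db; differences in mysqldump options or timestamps may cause harmless diff output):

mysqldump -u root -p slurm_acct_db > /root/mysql_dump_test
diff /root/mysql_dump /root/mysql_dump_test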

Upgrading on CentOS 7

Let's assume that you have built the updated RPM packages for CentOS 7 and copied them to the current directory so you can use yum on the files directly.

Prerequisites before upgrading

If you have installed the pdsh tool, its pdsh-mod-slurm module may be linked against a specific library version (for example libslurm.so.30), and yum will then refuse to update the slurm* RPMs. You must first do:

yum remove pdsh-mod-slurm

and then later rebuild and install pdsh-mod-slurm, see the SLURM page.

Upgrade slurmdbd

The upgrading steps for the slurmdbd host are:

  1. Stop the slurmdbd service:

    systemctl stop slurmdbd
  2. Make a mysqldump of the MySQL/MariaDB database (see above).
  3. For Slurm 17.02, upgrade the database-related RPMs as well as the munge RPM (version $VER here):

    export VER=17.02.10
    yum update slurm-slurmdbd-$VER-*.rpm slurm-sql-$VER*.rpm slurm-munge-$VER*.rpm

    For Slurm 17.11 the RPM packages have been restructured (the slurm-sql and slurm-munge have disappeared), and you need to update all RPMs:

    export VER=17.11.7
    yum update slurm*$VER*.rpm
  4. Start the slurmdbd daemon manually:

    time slurmdbd -D -vvv

    The completion of the database conversion may be printed as:

    slurmdbd: debug2: Everything rolled up

    Then do a Control-C.

  5. Restart the slurmdbd service normally:

    systemctl start slurmdbd
Upgrade slurmctld

The upgrading steps for the slurmctld host are:

  1. Change the timeout values in slurm.conf to:

    SlurmctldTimeout=3600
    SlurmdTimeout=3600

    and copy /etc/slurm/slurm.conf to all nodes. Then reconfigure the running daemons and test the timeout values:

    scontrol reconfigure
    scontrol show config | grep Timeout
  2. Stop the slurmctld service:

    systemctl stop slurmctld
  3. Make a backup copy of the StateSaveLocation /var/spool/slurmctld directory:

    tar czvf $HOME/var.spool.slurmctld.tar.gz /var/spool/slurmctld
  4. Upgrade the RPMs:

    export VER=17.11.7
    yum update slurm*$VER-*.rpm

    For Slurm 17.11 there is a new RPM for slurmctld:

    yum install slurm-slurmctld-$VER-*.rpm
  5. Enable and restart the slurmctld service:

    systemctl enable slurmctld
    systemctl restart slurmctld
Install slurm-libpmi

In Slurm 17.11 the libpmi.so libraries have been moved to a new RPM. If you need it, install this one:

yum install slurm-libpmi-17.11.7*.rpm
Upgrade MPI applications

MPI applications such as OpenMPI may be linked against the /usr/lib64/libslurm.so library. In this context you must understand the remark in the Upgrades page:

The libslurm.so version is increased every major release.
So things like MPI libraries with Slurm integration should be recompiled.
Sometimes it works to just symlink the old .so name(s) to the new one, but this has no guarantee of working.

In the thread Need for recompiling openmpi built with --with-pmi? it has been found that:

It looks like it is the presence of lib64/libpmi2.la and lib64/libpmi.la that is the "culprit". They are installed by the slurm-devel RPM.
Openmpi uses GNU libtool for linking, which finds these files and follows their "dependency_libs" specification, thus linking directly to libslurm.so.

Slurm version 16.05 and later no longer installs the libpmi*.la files. This should mean that if your OpenMPI was built against Slurm 16.05 or later, there should be no problem (we think), but otherwise you probably must rebuild your MPI applications and install them again at the same time that you upgrade the slurmd on the compute nodes.

To check for the presence of the "bad" files, go to your software build host and search:

locate libpmi2.la
locate libpmi.la

TODO: Find a way to read relevant MPI libraries like this example:

readelf -d libmca_common_pmi.so
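
One possibility (the library path is illustrative and depends on your OpenMPI installation) is to search for plugins that are dynamically linked against libslurm:

find /usr/lib64/openmpi -name '*.so*' -exec sh -c 'ldd "$1" 2>/dev/null | grep -q libslurm && echo "$1"' _ {} \;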
Upgrade slurmd on nodes

First determine which Slurm version the nodes are running:

clush -bg <partition> slurmd -V         # Using ClusterShell
pdsh -g <partition> slurmd -V | dshbak  # Using PDSH

See the SLURM page about ClusterShell or PDSH.

For the compute nodes running slurmd the procedure could be:

  1. Drain all desired compute nodes in a <nodelist>:

    scontrol update NodeName=<nodelist> State=draining Reason="Upgrading slurmd"

    Nodes will change from the DRAINING to the DRAINED state as the jobs are completed. Check which nodes have become DRAINED:

    sinfo -t drained
  2. Stop the slurmd daemons on compute nodes:

    clush -bw <nodelist> systemctl stop slurmd
  3. Update the RPMs (here: version 17.11.7) on nodes:

    clush -bw <nodelist> 'yum -y update /some/path/slurm*17.11.7-*.rpm'

    and make sure to also install the new slurmd and contribs packages:

    clush -bw <nodelist> 'yum -y install /some/path/slurm-slurmd*17.11.7-*.rpm /some/path/slurm-contribs*17.11.7-*.rpm'

    From 17.02 and later slurm-contribs replaces the obsolete packages slurm-seff, slurm-sjobexit, slurm-sjstat.

    Important: From Slurm 17.11 you must explicitly enable the service:

    clush -bw <nodelist> systemctl enable slurmd
  4. For restarting slurmd there are two alternatives:

    1. Restart slurmd or simply reboot the nodes in the DRAINED state:

      clush -bw <nodelist> systemctl daemon-reload
      clush -bw <nodelist> systemctl restart slurmd

      or simply reboot:

      clush -bw <nodelist> shutdown -r now
    2. Reboot the nodes automatically as they become idle using the RebootProgram as configured in slurm.conf; see the scontrol reboot option and its explanation in the man-page:

      scontrol reboot [ASAP] [NodeList]

      The ASAP flag is available from Slurm 17.02 (see man scontrol for older versions).

  5. Return upgraded nodes to the IDLE state:

    scontrol update NodeName=<nodelist> State=resume

Finally, restore the timeout values in slurm.conf to their defaults:

SlurmctldTimeout=300
SlurmdTimeout=300

and copy /etc/slurm/slurm.conf to all nodes. Then reconfigure the running daemons:

scontrol reconfigure

Again, consult the Upgrades page before you start!

Removing /etc/init.d/slurm

Obsolete: This section only applies to Slurm 16.05 and older.

On systemd systems such as RHEL7/CentOS7 the old-style init-script /etc/init.d/slurm should be disabled; see the Slurm_configuration page. This should be done at the initial installation as well as after upgrading. With Slurm 17.02 and newer this issue (bug_3371) has been resolved.

The relevant commands are summarized as:

chkconfig --del slurm
rm -f /etc/init.d/slurm

Then enable and start services using systemctl.

Log file rotation

The Slurm log files may be stored in /var/log/slurm, and they may grow rapidly on a busy system. In particular, the slurmctld.log file on the controller machine may grow very big.

Therefore you probably want to configure logrotate to administer your log files. On RHEL and CentOS the logrotate configuration files are in the /etc/logrotate.d/ directory.

Manual configuration is required because the SchedMD RPM files do not contain the logrotate setup; see bug_3904 and bug_2215 (example logrotate scripts).

First install the relevant RPM:

yum install logrotate

The following script /etc/logrotate.d/slurmctld will rotate and compress the slurmctld log file on a weekly basis:

/var/log/slurm/slurmctld.log {
    weekly
    missingok
    notifempty
    sharedscripts
    create 0600 slurm slurm
    rotate 8
    compress
    postrotate
        /bin/systemctl reload slurmctld.service > /dev/null 2>/dev/null || true
    endscript
}

The slurmctld daemon (service) is reloaded in the process; see the recommendation in bug_3402. The reload action defined in /etc/systemd/system/slurmctld.service sends a HUP signal to the service.
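
If the installed unit file does not define a reload action, a typical definition (an assumption here; compare with bug_3402 and the unit file shipped with your Slurm version) can be added as a systemd drop-in file such as /etc/systemd/system/slurmctld.service.d/reload.conf:

[Service]
ExecReload=/bin/kill -HUP $MAINPID

Run systemctl daemon-reload after adding the drop-in file.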

For Init based Linuxes (such as CentOS 6) replace the reload command by:

service slurmctld reload

This configuration may be copied for the slurmd and slurmdbd daemons by changing the slurmctld string in the above.
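
For example (a simple sketch; review the generated files, and note that the slurmd file belongs on the compute nodes):

sed 's/slurmctld/slurmd/g'   /etc/logrotate.d/slurmctld > /etc/logrotate.d/slurmd
sed 's/slurmctld/slurmdbd/g' /etc/logrotate.d/slurmctld > /etc/logrotate.d/slurmdbd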

Warning: Do not run scontrol reconfig or restart slurmctld to rotate the log files, since this will incur a huge overhead.

This setup has been proposed in bug_3904, and there is also an example as an attachment in bug_2215. Furthermore, an example is given at the very end of the slurm.conf man-page (not a logical location).

Notice: From Slurm 17.11 there is a new signal SIGUSR2 whose sole function is to close and then reopen the log file, for example:

systemctl kill -s SIGUSR2 slurmctld.service

See the RELEASE_NOTES about SIGUSR2.
