Slurm database

This page describes Slurm database configuration on RHEL 7 or 8 (and clones) servers.

Jump to our top-level Slurm page: Slurm batch queueing system

Database documentation

See the accounting page and the Slurm_tutorials with Slurm Database Usage. Please note this statement:

MySQL or MariaDB is the preferred database.
To enable this database support one only needs to have the development package for the database they wish to use on the system.
Slurm uses the InnoDB storage engine in MySQL to make rollback possible.
This must be available on your MySQL installation or rollback will not work.

The following configuration is relevant only for the Database node (which may be the Head/Master node), but not the compute nodes.

Hardware optimization

SchedMD recommends having a separate database server, if possible. It may run on the same server as slurmctld, but this may impact performance.

Furthermore, running the database on a separate server gives you a strong advantage because you can upgrade slurmdbd without affecting slurmctld and thereby the cluster operations!

You should consider optimizing the database performance by mounting the MariaDB (MySQL) database directory on a dedicated high-speed file system:

/var/lib/mysql

Whether this is required depends on the number and frequency of jobs expected. A high-speed file system could be placed on a separate SSD SAS/SATA disk drive, or even better on a PCIe NVMe disk drive.

Such disks must be qualified for high-volume random small read/write operations relevant for databases, and should be built with the Non-Volatile Memory Express (NVMe) storage interface standard for reliability and performance. It seems that NVMe support was added to Linux kernel 3.3 (and later), so it should work with RHEL7/CentOS7. A new scalable block layer for high-performance SSD storage was added in kernel 3.13.

A disk size of 200 GB or 400 GB should be sufficient. Consider installing 2 disk drives and run them in a RAID-1 mirrored configuration. Example hardware could be the Intel SSD P3700 series or the Kingston E1000 series.
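As an illustration, moving the database directory onto a dedicated disk might look like this hedged sketch; the device name /dev/nvme0n1 and the choice of the XFS file system are assumptions which must be adapted to your hardware:

# Sketch: dedicate an NVMe disk (assumed device /dev/nvme0n1) to MariaDB
systemctl stop mariadb                        # Only if MariaDB is already installed and running
mkfs.xfs /dev/nvme0n1                         # Create an XFS file system on the new disk
mv /var/lib/mysql /var/lib/mysql.orig         # Preserve any existing database files
mkdir /var/lib/mysql
echo '/dev/nvme0n1 /var/lib/mysql xfs defaults,noatime 0 0' >> /etc/fstab
mount /var/lib/mysql
chown mysql: /var/lib/mysql                   # The mount point must be owned by the mysql user
cp -a /var/lib/mysql.orig/. /var/lib/mysql/   # Copy the database files onto the new disk
restorecon -R /var/lib/mysql                  # Restore SELinux file contexts (RHEL)
systemctl start mariadb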

Install slurmdbd package

Install the slurm database RPM on the database-only (slurmdbd service) node:

export VER=22.05.8-1  # Use the latest version
yum install slurm-$VER*rpm slurm-devel-$VER*rpm slurm-slurmdbd-$VER*rpm

Explicitly enable the service:

systemctl enable slurmdbd

Set up MariaDB database

The accounting page has a section named MySQL Configuration which should be studied first. Please note:

  • CentOS7/RHEL7 has replaced MySQL with the MariaDB database version 5.5.

  • CentOS8/RHEL8 contains the MariaDB database version 10.3.

Make sure the MariaDB packages were installed before you built the Slurm RPMs:

rpm -q mariadb-server mariadb-devel
rpm -ql slurm-slurmdbd | grep accounting_storage_mysql.so     # Must show location of this file

Otherwise you must install MariaDB packages:

yum install mariadb-server mariadb-devel

and rebuild all RPMs with mysql support as shown in Slurm installation and upgrading:

rpmbuild -ta slurm-$VER.tar.bz2 --with mysql

If you will use Ansible to manage the database, Ansible needs this Python package:

yum install MySQL-python    # CentOS7/RHEL7
dnf install python3-mysql   # CentOS8/RHEL8

Now start the MariaDB service:

systemctl start mariadb
systemctl enable mariadb
systemctl status mariadb

Make sure to configure the MariaDB database’s root password as instructed at first invocation of the mariadb service, or run this command:

/usr/bin/mysql_secure_installation

Select a suitable slurm user’s database password. Now follow the accounting page instructions (using -p to enter the database password):

# mysql -p
grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by 'some_pass' with grant option;  ### WARNING: replace some_pass by your own password
SHOW GRANTS;
SHOW VARIABLES LIKE 'have_innodb';
create database slurm_acct_db;
quit;

WARNING: Use the slurm database user’s password instead of some_pass.

Optional: If you would like to grant read-only (SELECT) access to the database, set up a readonly user with access from % (meaning any host):

# mysql -p
grant select on slurm_acct_db.* TO 'readonly'@'%' identified by 'some_pass';

For remote hosts you of course have to open the database server’s firewall on port 3306 as described in Firewall between slurmctld and slurmdbd.
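For example, a remote host could then make read-only queries like this (replace dbserver.my.domain by your database server's hostname):

# Read-only query of the Slurm database from a remote host
mysql -h dbserver.my.domain -u readonly -p slurm_acct_db -e 'SHOW TABLES;'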

You can verify the database grants for the slurm user:

# mysql -p -u slurm
show grants;
quit;

Regarding InnoDB: by default, MariaDB uses the XtraDB storage engine, a performance-enhanced fork of the InnoDB storage engine.

This will grant user ‘slurm’ access to do what it needs to do on the local host or the storage host system. This must be done before slurmdbd will work properly. After you have granted permissions to the user ‘slurm’ in MySQL, you can start slurmdbd and the other Slurm daemons. You can start slurmdbd by invoking its pathname /usr/sbin/slurmdbd, or use the systemctl commands shown below. You can verify that slurmdbd is running by typing ps aux | grep slurmdbd.

If the slurmdbd is not running you can use the -v option when you start slurmdbd to get more detailed information. Starting the slurmdbd in daemon mode with the -D -vvv option can also help in debugging so you don’t have to go to the log to find the problem.

MySQL configuration

The accounting page section Slurm Accounting Configuration Before Build gives some advice about MySQL configuration:

  • NOTE: Before running the slurmdbd for the first time, review the current setting for MySQL’s innodb_buffer_pool_size. Consider setting this value large enough to handle the size of the database. This helps when converting large tables over to the new database schema and when purging old records. Setting innodb_lock_wait_timeout and innodb_log_file_size to larger values than the default is also recommended.

The following is recommended for /etc/my.cnf, but on CentOS 7 you should create a new file /etc/my.cnf.d/innodb.cnf containing:

[mysqld]
innodb_buffer_pool_size=1024M
innodb_log_file_size=64M
innodb_lock_wait_timeout=900

The innodb_buffer_pool_size might be even larger, like 50%-80% of the server’s RAM size.
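A hedged sketch for computing a value corresponding to 50% of the server's RAM, which could then be pasted into the innodb.cnf file above:

# Print an innodb_buffer_pool_size line equal to 50% of the server's RAM
awk '/MemTotal/ {printf "innodb_buffer_pool_size=%dM\n", $2/1024/2}' /proc/meminfo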

To implement this change you have to shut down the database and move/remove logfiles:

systemctl stop mariadb
mv /var/lib/mysql/ib_logfile? /tmp/
systemctl start mariadb

You can check the current setting in MySQL like so:

# mysql -p
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_log_file_size';
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';
quit;

See also Bug_2457:

  • The innodb_buffer_pool_size can have a huge impact - we’d recommend setting this as high as half the RAM available on the slurmdbd server.

Slurm database tables

To view the status of the tables in the slurm_acct_db database:

# mysqlshow -p --status slurm_acct_db

It is possible to display the contents of the slurm_acct_db database:

# mysql -p -u slurm slurm_acct_db
Enter password:

To show tables in the database:

MariaDB [slurm_acct_db]> show tables;
+----------------------------------+
| Tables_in_slurm_acct_db          |
+----------------------------------+
| acct_coord_table                 |
| acct_table                       |
| clus_res_table                   |
| cluster_table                    |
| convert_version_table            |
| federation_table                 |
| niflheim_assoc_table             |
| niflheim_assoc_usage_day_table   |
| niflheim_assoc_usage_hour_table  |
| niflheim_assoc_usage_month_table |
| niflheim_event_table             |
| niflheim_job_table               |
| niflheim_last_ran_table          |
| niflheim_resv_table              |
| niflheim_step_table              |
| niflheim_suspend_table           |
| niflheim_usage_day_table         |
| niflheim_usage_hour_table        |
| niflheim_usage_month_table       |
| niflheim_wckey_table             |
| niflheim_wckey_usage_day_table   |
| niflheim_wckey_usage_hour_table  |
| niflheim_wckey_usage_month_table |
| qos_table                        |
| res_table                        |
| table_defs_table                 |
| tres_table                       |
| txn_table                        |
| user_table                       |
+----------------------------------+

where niflheim refers to the name of our cluster.

The structure of a table can be displayed as in this example:

MariaDB [slurm_acct_db]> describe user_table;
+---------------+---------------------+------+-----+---------+-------+
| Field         | Type                | Null | Key | Default | Extra |
+---------------+---------------------+------+-----+---------+-------+
| creation_time | bigint(20) unsigned | NO   |     | NULL    |       |
| mod_time      | bigint(20) unsigned | NO   |     | 0       |       |
| deleted       | tinyint(4)          | YES  |     | 0       |       |
| name          | tinytext            | NO   | PRI | NULL    |       |
| admin_level   | smallint(6)         | NO   |     | 1       |       |
+---------------+---------------------+------+-----+---------+-------+

A row in the user_table can be printed:

MariaDB [slurm_acct_db]> select * from user_table where name="xxx";
+---------------+------------+---------+------+-------------+
| creation_time | mod_time   | deleted | name | admin_level |
+---------------+------------+---------+------+-------------+
|    1477321893 | 1477321893 |       0 | xxx  |           1 |
+---------------+------------+---------+------+-------------+

SlurmDBD Configuration

While the slurmdbd will work with a flat text file for recording job completions and such, this configuration will not allow “associations” between a user and an account. A database allows such a configuration.

slurmdbd requires its own configuration file called slurmdbd.conf. Start by copying the example file from the slurmdbd.conf man-page.

The file /etc/slurm/slurmdbd.conf should exist only on the computer where slurmdbd executes, and should only be readable by the user which executes slurmdbd (e.g. “slurm”). It must be protected from unauthorized access since it contains a database login name and password. See the slurmdbd.conf man-page for a more complete description of the configuration parameters.

NOTICE: The /etc/slurm/slurm.conf file is not needed or used in the slurmdbd server. The only file required is /etc/slurm/slurmdbd.conf. However, user commands such as sinfo and sacctmgr will need access to slurm.conf, and if /etc/slurm/slurm.conf does not exist, then they will use the configless mode (if configured).

Set up files and permissions:

chown slurm: /etc/slurm/slurmdbd.conf
chmod 600 /etc/slurm/slurmdbd.conf
mkdir /var/log/slurm
touch /var/log/slurm/slurmdbd.log
chown slurm: /var/log/slurm/slurmdbd.log

Configure some of the slurmdbd.conf variables:

LogFile=/var/log/slurm/slurmdbd.log
DbdHost=XXXX    # Replace by the slurmdbd server hostname (for example, slurmdbd.my.domain)
DbdPort=6819    # The default value
SlurmUser=slurm
StorageHost=localhost
StoragePass=some_pass    # The above defined database password, change it for your site!
StorageLoc=slurm_acct_db

Setting database purge parameters

A database with very many job records (on the order of a million or more) can cause widespread problems when upgrading Slurm and the database. See the mailing list thread [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3.

In order to solve this problem, it seems necessary to purge job records from the Slurm database. In slurmdbd.conf you may define a number of purge parameters such as:

PurgeEventAfter
PurgeJobAfter
PurgeResvAfter
PurgeStepAfter
PurgeUsageAfter

The values of these parameters depend on the number of jobs in the database, which differs a lot between sites. There does not seem to be any heuristics for determining good values, so some testing will be required.

From the high_throughput page: You might also consider setting the Purge options in your slurmdbd.conf to clear out old data. A typical configuration might look like this:

PurgeEventAfter=12months
PurgeJobAfter=12months
PurgeResvAfter=2months
PurgeStepAfter=2months
PurgeSuspendAfter=1month
PurgeTXNAfter=12months
PurgeUsageAfter=12months

The purge operation is done at the start of each time interval (see bug_4295), which means on the 1st day of the month in this example. Monthly, daily or even hourly purge operations would occur when using different time units for the same interval:

PurgeStepAfter=2months
PurgeStepAfter=60days
PurgeStepAfter=1440hours

A monthly purge operation can be a huge amount of work for a database depending on its size, and you certainly want to cut down the amount of work required during the purges. If you did not use purges before, it is probably a good idea to try out a series of daily purges starting with:

PurgeEventAfter=2000days
PurgeJobAfter=2000days
PurgeResvAfter=2000days
PurgeStepAfter=2000days
PurgeSuspendAfter=2000days

If this works well over a few days, decrease the 2000days purge interval little by little (1800, 1500, etc.) and try again, until after a number of iterations you come down to the desired final purge intervals.
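A hedged sketch of one such iteration, assuming that purge lines with 2000days already exist in slurmdbd.conf:

# Lower all 2000days purge intervals to 1800days and restart slurmdbd
sed -i 's/=2000days/=1800days/' /etc/slurm/slurmdbd.conf
systemctl restart slurmdbd
# Watch the purge activity in the log before doing the next iteration
tail -f /var/log/slurm/slurmdbd.log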

Logging of purge events can be configured in slurmdbd.conf using:

DebugLevel=verbose
DebugFlags=DB_ARCHIVE

slurmdbd hostname configuration

The slurmdbd hostname must be configured correctly. The default value may be localhost, meaning that no other hosts can query the slurmdbd service (you may or may not want this limitation).

We recommend explicitly setting the slurmdbd hostname (for example, slurmdbd.my.domain) in these files:

  • DbdHost in slurmdbd.conf.

  • AccountingStorageHost in slurm.conf.

After restarting the slurmctld and slurmdbd services, verify the setup by:

scontrol show config | grep AccountingStorageHost

If nodes other than the slurmdbd node must be able to connect to the slurmdbd service, you must open the firewall to specific hosts as described in Firewall between slurmctld and slurmdbd.

Setting MaxQueryTimeRange

It may be a good idea to prevent normal users from querying the database over too long periods of time. The slurmdbd.conf parameter MaxQueryTimeRange is used for this, for example for a maximum of 60 days:

MaxQueryTimeRange=60-0
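After restarting slurmdbd, the value should be verifiable with sacctmgr:

# Display the slurmdbd configuration, including MaxQueryTimeRange
sacctmgr show configuration | grep MaxQueryTimeRange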

Start the slurmdbd service

First try to run slurmdbd manually to see the log:

slurmdbd -D -vvv

Terminate the process by Control-C when the testing is OK.

Start the slurmdbd service:

systemctl enable slurmdbd
systemctl start slurmdbd
systemctl status slurmdbd

If you get this error in /var/log/slurm/slurmdbd.log:

error: Couldn't find the specified plugin name for accounting_storage/mysql looking at all files

then the file /usr/lib64/slurm/accounting_storage_mysql.so is missing because you forgot to install the mariadb-devel RPM before building Slurm RPMs. You must install the mariadb-devel RPM and rebuild and reinstall Slurm RPMs as shown above.

Backup and restore of database

To be able to restore the entire database at a different location (for disaster recovery or migration), the Slurm database must be backed up regularly.

You may want to read the page Mysqldump with Modern MySQL.

See also the discussion in Bug_10295 about dumping only the slurm_acct_db database, and using --single-transaction for InnoDB tables. You can show the tables by:

# mysqlshow -p --status slurm_acct_db

For compressing the (large) database dumps, install the gzip and bzip2 packages, and perhaps also the lbzip2 package:

yum install gzip bzip2
yum install lbzip2    # From EPEL

Backup script with crontab

Make a dump of the slurm_acct_db database using mysqldump, for example with this example script /root/mysqlbackup (Note: insert the correct root DATABASE-PASSWORD in DBPASS):

#!/bin/sh
# MySQL backup script for the slurm_acct_db database
HOST=localhost
BACKUPFILE=/root/mysql_dump.gz
USER=root
DBPASS='DATABASE-PASSWORD'   # Avoid the name PWD, which is a special shell variable
DUMP_ARGS="--single-transaction"
DATABASES="-B slurm_acct_db"
/usr/bin/mysqldump --host=$HOST --user=$USER --password="$DBPASS" $DUMP_ARGS $DATABASES | gzip > $BACKUPFILE

Write permission to $BACKUPFILE is required. The script is also available in https://github.com/OleHolmNielsen/Slurm_tools/blob/master/database/.

Make regular database dumps, for example by a crontab job:

# MySQL database backup
30 7 * * * /root/mysqlbackup

Backup script with logrotate

It is preferable to make daily database backup dumps and keep a number of backup versions. The logrotate tool is ideal for this purpose.

This logrotate file /etc/logrotate.d/slurm_acct_db_backup will make 8 backup versions in /var/log/mariadb/ (insert the correct root DATABASE-PASSWORD):

/var/log/mariadb/slurm_acct_db_backup.bz2 {
    daily
    dateext
    dateyesterday
    rotate 8
    nocompress
    create 640 root adm
    postrotate
        # Dump ONLY the Slurm database slurm_acct_db
        # Strongly recommended: --single-transaction
        # Use bzip2 for compression.
        # Alternatively use lbzip2 from the EPEL repo: lbzip2 --force -n 5
        /usr/bin/mysqldump --user=root --password=<DATABASE-PASSWORD> --single-transaction -B slurm_acct_db | bzip2 > /var/log/mariadb/slurm_acct_db_backup.bz2
    endscript
}

You must first create an empty backup file:

touch /var/log/mariadb/slurm_acct_db_backup.bz2

The script is also available in https://github.com/OleHolmNielsen/Slurm_tools/tree/master/database.

Using mysqldump --single-transaction is recommended by SchedMD to avoid race conditions when slurmdbd is being run while taking the MySQL dump, see https://bugs.schedmd.com/show_bug.cgi?id=10295#c18

Testing the script:

logrotate -dv /etc/logrotate.d/slurm_acct_db_backup

Note: SELinux enforces that logrotate may only create files in the /var/log/ folder and below. If logrotate tries to create files in other locations, it will get permission denied errors, which will also appear in /var/log/audit/audit.log. See the logrotate_selinux manual page and this Red Hat solution: https://access.redhat.com/solutions/39006
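If you suspect such SELinux denials, the audit log can be searched like this (assuming the audit tools are installed):

# Search the audit log for SELinux denials caused by logrotate
ausearch -m avc -c logrotate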

Backup of MariaDB 10.1 and later

In MariaDB 10.1 and later, Mariabackup is the recommended backup method to use instead of Percona XtraBackup, see this page.

The mysqldump_for_MariaDB utility still exists for MariaDB.
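A minimal Mariabackup invocation might look like this sketch; the target directory is only an example, and restoring additionally requires the --prepare step described in the MariaDB documentation:

# Sketch: full physical backup with Mariabackup
# (the mariabackup tool comes in the MariaDB-backup or mariadb-backup package)
mariabackup --backup --target-dir=/var/mariadb/backup \
   --user=root --password=DATABASE-PASSWORD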

Restore of a database backup

The database contents must be loaded from the backup. To restore a MySQL database see for example How do I restore a MySQL .dump file?.

Make sure slurmdbd is stopped:

systemctl stop slurmdbd

As user root, read in the backup file created above:

mysql -u root -p < /root/mysql_dump

or if the dump file has been compressed:

zcat  /root/mysql_dump.gz  | mysql -u root -p
bzcat /root/mysql_dump.bz2 | mysql -u root -p     # For bzip2 compressed files

The MariaDB/MySQL password will be asked for. Reading in the database dump may take many minutes depending on the size of the dump file, the storage system speed, and the CPU performance.

Upgrade of MySQL/MariaDB

If you restore a database dump onto a different server running a newer MySQL or MariaDB version there are some extra steps:

You should run the mysql_upgrade command whenever major (or even minor) version upgrades are made:

mysql_upgrade -p

It may be necessary to force an upgrade if you have restored a database dump made on an earlier version of MariaDB, say, when migrating from CentOS7/RHEL7 to CentOS8/RHEL8:

mysql_upgrade -p --force

It may be necessary to restart the mysqld service or reboot the server after this upgrade.

When migrating a database from CentOS/RHEL 7 (EL7) to RHEL 8 (and EL8 clones) you should read Upgrading from MariaDB 5.5 to MariaDB 10.0 since there are some incompatible changes between 5.5 and 10.

Slurm database modifications required for MariaDB 10.2.1 and above

In MariaDB 10.2.1 and above there are some important changes which have been discussed in bug_13562. Several Slurm database tables must be altered while the slurmdbd is stopped. Please note that EL7 contains MariaDB version 5.5 and EL8 contains MariaDB version 10.3, so this point is important, for example, when upgrading from EL7 to EL8! This has been resolved as of Slurm 22.05.7.

We have discussed the procedure for MariaDB 10.2.1 and above in detail in bug_15168. A future version of Slurm may perform these changes automatically.

The procedure for Slurm database modifications must be followed when MariaDB has been upgraded from a version older than 10.2.1 to 10.2.1 or newer, or when a Slurm database has been restored from a dump made with an older MariaDB version.

The following steps should be made:

  1. Make sure slurmdbd is stopped:

    systemctl stop slurmdbd
    
  2. Configure MariaDB 10.3 for Slurm in the usual way (see above on this page).

  3. Login to the slurm_acct_db database:

    # mysql -p -u slurm slurm_acct_db
    Enter password:
    
  4. Drop (delete) the following database table and show tables in the database:

    drop table table_defs_table;
    show tables;
    quit;
    
  5. Set a high debug level in slurmdbd.conf:

    DebugLevel=debug4
    
  6. Start the slurmdbd service and look for a number of table creation lines in /var/log/slurm/slurmdbd.log such as:

    debug4: Table "niflheim_assoc_table" doesn't exist, adding
    

    You can also repeat the show tables; command from above to verify that table_defs_table exists once again.

  7. At the end, set a normal debug level in slurmdbd.conf:

    DebugLevel=verbose
    

    and restart slurmdbd:

    systemctl start slurmdbd
    

Configure database accounting in slurm.conf

Finally, when you have made sure that the slurmdbd service is working correctly, you must configure slurm.conf to use slurmdbd.

In slurm.conf you must configure accounting so that the database will be used through the slurmdbd database daemon:

AccountingStorageType=accounting_storage/slurmdbd
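Related accounting parameters are typically configured in slurm.conf at the same time, for example (replace db2 by your slurmdbd server's hostname):

AccountingStorageHost=db2      # The slurmdbd server's hostname
AccountingStoragePort=6819     # Must match DbdPort in slurmdbd.conf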

Migrate the slurmdbd service to another server

It is recommended to run the slurmdbd database server on a separate host from the slurmctld server; see these documents in Slurm_publications:

  • Technical: Field Notes From the Frontlines of Slurm Support, Tim Wickberg, SchedMD (2017) slides on High-Availability.

  • Technical: Field Notes Mark 2: Random Musings From Under A New Hat, Tim Wickberg, SchedMD (2018) slides on My Preferred Deployment Pattern.

However, many sites run both services successfully on the same server. If you decide to migrate the slurmdbd service to another server, here is a tested procedure which works on a running production cluster.

It is important to understand that the slurmctld service can run without problems even when the slurmdbd database is not responding, since slurmctld just caches all state information in the StateSaveLocation directory:

$ scontrol show config | grep StateSaveLocation
StateSaveLocation       = /var/spool/slurmctld

Therefore we can take down the slurmdbd service and server for a number of minutes or hours without any problems. The outstanding messages in the StateSaveLocation are currently capped at 3xNodes + MaxJobCount.
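The current number of queued messages can be inspected with sdiag, which reports the DBD Agent queue size:

# Show how many messages slurmctld has queued for slurmdbd
sdiag | grep -i 'DBD Agent queue size'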

Configure a slurmdbd server

Install a new Slurm server as described in Slurm installation and upgrading. You must make sure that these prerequisites are satisfied:

  • The Munge service is running correctly.

  • The user passwd database contains all Slurm users.

Install the same Slurm version on the new server as on the old server! This ensures that the database migration will be as fast as possible. Any upgrading should be done at a later date according to the instructions in Upgrading Slurm.
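A quick way to verify that the two servers have identical Slurm versions:

# Run on both the old and the new server and compare the output
rpm -q slurm-slurmdbd
slurmdbd -V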

Make sure to open the firewall completely as described in Firewall between slurmctld and slurmdbd.

Configure the MariaDB/MySQL and the slurmdbd services as described above.

Testing the database restore

Take a database dump file and restore it into the MariaDB/MySQL database (see Backup and restore of database above). Use the time_command to get an estimate of the time this will take.

Configure the server’s hostname (for example, db2) in slurmdbd.conf:

DbdHost=<hostname>

Start the slurmdbd service manually to see if any errors occur:

slurmdbd -D -vvvv

and wait for the output:

slurmdbd: debug2: Everything rolled up

and do a Control-C.

Database migration procedure

Let us denote the slurmdbd servers as:

  • db1 is the current slurmdbd and MariaDB database server. This could be the same as the slurmctld server, or it could be a dedicated server.

  • db2 is the designated new slurmdbd and MariaDB database server.

db1: stop slurmdbd

On the db1 server:

  1. Stop and disable slurmdbd and make sure the status is down:

    systemctl disable slurmdbd
    systemctl stop slurmdbd
    systemctl status slurmdbd
    
  2. Run the MySQL database dump described above in Backup and restore of database.

    Copy the database dump to the db2 server (for example with scp, as sketched after this list). Make a long-term copy of the database dump.

  3. Stop any crontab jobs that run MySQL database dumps.
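For example, the dump created by the backup script above could be copied like this (host and file names as used in the examples on this page):

# Copy the compressed database dump from db1 to the db2 server
scp /root/mysql_dump.gz db2:/root/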

db2: restore database and start slurmdbd

On the db2 server:

  1. Make sure the slurmdbd service is stopped and that no crontab jobs will run database dumps.

  2. Load the database dump from db1 into MariaDB as shown above in Backup and restore of database.

    If the MariaDB version on db2 is newer than on db1, then you must remember to perform the MariaDB upgrade actions shown above.

  3. Start the slurmdbd service manually to see if any errors occur:

    slurmdbd -D -vvvv
    

    and wait for the output:

    slurmdbd: debug2: Everything rolled up
    

    and do a Control-C.

  4. Start and enable slurmdbd and make sure the status is up:

    systemctl enable slurmdbd
    systemctl start slurmdbd
    systemctl status slurmdbd
    

Now the new slurmdbd service should be up and running on the db2 server in a stable state.

slurmctld server: reconfigure AccountingStorageHost

On the slurmctld server:

Now it’s time to reconfigure slurmctld for the new db2 slurmdbd server.

  1. Stop the slurmctld:

    systemctl stop slurmctld
    
  2. Edit slurm.conf to configure the new slurmdbd server (db2):

    AccountingStorageHost=db2
    
  3. Just for safety, but not required: Make a backup copy of the StateSaveLocation directory /var/spool/slurmctld:

    tar czf $HOME/var.spool.slurmctld.tar.gz /var/spool/slurmctld/*
    

    Make sure the contents of the tar-ball file looks correct!

  4. Start the slurmctld:

    systemctl start slurmctld
    

    and make a reconfig to notify all the slurmd processes:

    scontrol reconfig
    
  5. Check the slurmctld log file, for example:

    grep slurmdbd: /var/log/slurm/slurmctld.log
    
  6. Test that your Slurm cluster’s functionality has now been completely restored (use squeue, sinfo etc.).

db2: Enable database backups

On the db2 server:

  1. Make a crontab job for doing database dumps as in Backup and restore of database.

  2. Make sure the db2 server and the database dumps are backed up daily/regularly to your site’s backup service.