Slurm database

This page describes Slurm database configuration on CentOS/RHEL 7 servers. Jump to our Slurm top-level page.

Database documentation

See the accounting page and the Slurm_tutorials with Slurm Database Usage.

The following configuration is relevant only for the Database node (which may be the Head/Master node), but not the compute nodes.

Hardware optimization

SchedMD recommends having a separate database server, if possible. The database may run on the same server as slurmctld, but this may impact performance.

You should consider optimizing the database performance by mounting the MariaDB (MySQL) database directory on a dedicated high-speed file system:

/var/lib/mysql

Whether this is required depends on the number and frequency of jobs expected. A high-speed file system could be placed on a separate SSD SAS/SATA disk drive, or even better on a PCIe SSD disk drive.

Such disks must be qualified for high-volume random small read/write operations relevant for databases, and should be built with the Non-Volatile Memory Express (NVMe) storage interface standard for reliability and performance. It seems that NVMe support was added to Linux kernel 3.3 (and later), so it should work with RHEL7/CentOS7. A new scalable block layer for high-performance SSD storage was added in kernel 3.13.

A disk size of 200 GB or 400 GB should be sufficient. Consider installing 2 disk drives and running them in a RAID-1 mirrored configuration. Example hardware could be the Intel SSD P3700 series or the Kingston E1000 series.

Set up MariaDB database

The accounting page has a section named MySQL Configuration which should be studied first. Note that CentOS7/RHEL7 has replaced MySQL with the MariaDB database.

Make sure the MariaDB packages were installed before you built the Slurm RPMs:

rpm -q mariadb-server mariadb-devel
rpm -ql slurm-slurmdbd | grep accounting_storage_mysql.so     # Must show location of this file

Start the MariaDB service:

systemctl start mariadb
systemctl enable mariadb
systemctl status mariadb

Make sure to configure the MariaDB database's root password as instructed at first invocation of the mariadb service, or run this command:

/usr/bin/mysql_secure_installation

Choose a suitable database password for the slurm user. Now follow the accounting page instructions (using -p to enter the database root password):

# mysql -p
mysql> grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by 'some_pass' with grant option;
mysql> SHOW VARIABLES LIKE 'have_innodb';
mysql> create database slurm_acct_db;
mysql> quit;

WARNING: Use the slurm database user's password instead of some_pass.
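
The statements above could also be generated by a small helper so that they can be reviewed before being sent to the database. This is only a sketch: the function name slurm_acct_sql is hypothetical, and it assumes the password contains no single quote.

```shell
#!/bin/sh
# Sketch: print the grant and create statements from the example above
# for a given database user and password.
# slurm_acct_sql is a hypothetical helper name, not part of Slurm or MariaDB.
slurm_acct_sql() {
    db_user=$1
    db_pass=$2
    # NOTE: the password is inserted verbatim and must not contain a single quote.
    cat <<EOF
grant all on slurm_acct_db.* TO '${db_user}'@'localhost' identified by '${db_pass}' with grant option;
create database slurm_acct_db;
EOF
}

# Example (mysql prompts for the MariaDB root password):
#   slurm_acct_sql slurm 'some_pass' | mysql -p
```

Printing the SQL first makes it easy to check the user name and password before anything is changed in the database.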

You can verify the database grants for the slurm user:

# mysql -p -u slurm
mysql> show grants;
mysql> quit;

Regarding InnoDB, by default, MariaDB uses the XtraDB storage engine, a performance enhanced fork of the InnoDB storage engine.

This will grant user 'slurm' access to do what it needs to do on the local host or the storage host system. This must be done before the slurmdbd will work properly. After you grant permission to the user 'slurm' in mysql then you can start slurmdbd and the other Slurm daemons. You start slurmdbd by typing its pathname '/usr/sbin/slurmdbd' or '/etc/init.d/slurmdbd start'. You can verify that slurmdbd is running by typing ps aux | grep slurmdbd.

If the slurmdbd is not running you can use the -v option when you start slurmdbd to get more detailed information. Starting slurmdbd in the foreground with the -D -vvv options can also help in debugging, since the log messages are printed to the terminal and you don't have to go to the log file to find the problem.

MySQL configuration

In the accounting page section Slurm Accounting Configuration Before Build some advice about MySQL configuration is given:

  • NOTE: Before running the slurmdbd for the first time, review the current setting for MySQL's innodb_buffer_pool_size. Consider setting this value large enough to handle the size of the database. This helps when converting large tables over to the new database schema and when purging old records. Setting innodb_lock_wait_timeout and innodb_log_file_size to larger values than the default is also recommended.

The following is recommended for /etc/my.cnf, but on CentOS 7 you should create a new file /etc/my.cnf.d/innodb.cnf containing:

[mysqld]
innodb_buffer_pool_size=1024M
innodb_log_file_size=64M
innodb_lock_wait_timeout=900

The innodb_buffer_pool_size might be even larger, like 50%-80% of the server's RAM size.
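
As a rough illustration of the 50%-80% rule of thumb, the value could be derived from the server's RAM size like this. This is a sketch only: the helper names pool_size_mb and mem_total_kb are hypothetical, and the RAM size is assumed to come from MemTotal in /proc/meminfo on Linux.

```shell
#!/bin/sh
# Sketch: suggest an innodb_buffer_pool_size (in MB) as a percentage of RAM.
# pool_size_mb is a hypothetical helper. Arguments: total RAM in kB and
# the percentage of RAM to use. Integer arithmetic, rounded down.
pool_size_mb() {
    total_kb=$1
    percent=$2
    echo $(( total_kb * percent / 100 / 1024 ))
}

# Read MemTotal (in kB) on Linux; prints nothing if /proc/meminfo is unavailable.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo 2>/dev/null
}

# Example: print a config line using 50% of this server's RAM
#   echo "innodb_buffer_pool_size=$(pool_size_mb "$(mem_total_kb)" 50)M"
```

For a server with 16 GB of RAM (16777216 kB), 50% gives 8192 MB.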

To implement this change you have to shut down the database and move/remove logfiles:

systemctl stop mariadb
mv /var/lib/mysql/ib_logfile? /tmp/
systemctl start mariadb

You can check the current setting in MySQL like so:

mysql> SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

See also Bug_2457:

  • The innodb_buffer_pool_size can have a huge impact - we'd recommend setting this as high as half the RAM available on the slurmdbd server.

SlurmDBD Configuration

While the slurmdbd will work with a flat text file for recording job completions and such, this configuration will not allow "associations" between a user and an account. A database allows such a configuration.

MySQL or MariaDB is the preferred database. To enable this database support one only needs to have the development package for the database they wish to use on the system. Slurm uses the InnoDB storage engine in MySQL to make rollback possible. This must be available on your MySQL installation or rollback will not work.

slurmdbd requires its own configuration file called slurmdbd.conf. Start by copying the example file from the slurmdbd.conf man-page.

The file slurmdbd.conf should be only on the computer where slurmdbd executes and should only be readable by the user which executes slurmdbd (e.g. "slurm"). It must be protected from unauthorized access since it contains a database login name and password. See the slurmdbd.conf man-page for a more complete description of the configuration parameters.

Set up files and permissions:

chown slurm: /etc/slurm/slurmdbd.conf
chmod 600 /etc/slurm/slurmdbd.conf
touch /var/log/slurm/slurmdbd.log
chown slurm: /var/log/slurm/slurmdbd.log
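
The ownership and mode of slurmdbd.conf could be verified afterwards with a small check like the following. This is a sketch assuming GNU stat from coreutils; check_conf_perms is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify that a file is owned by the expected user and has mode 600.
# check_conf_perms is a hypothetical helper; it relies on GNU stat (coreutils).
check_conf_perms() {
    file=$1
    owner=$2
    # %U = owner user name, %a = access rights in octal
    actual=$(stat -c '%U %a' "$file") || return 2
    if [ "$actual" = "$owner 600" ]; then
        echo "OK: $file"
    else
        echo "WARNING: $file is '$actual', expected '$owner 600'"
        return 1
    fi
}

# Example:
#   check_conf_perms /etc/slurm/slurmdbd.conf slurm
```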

Configure some of the /etc/slurm/slurmdbd.conf variables:

LogFile=/var/log/slurm/slurmdbd.log
DbdHost=XXXX    # Replace by the slurmdbd server hostname (for example, slurmdbd.my.domain)
DbdPort=6819    # The default value
SlurmUser=slurm
StorageHost=localhost
StoragePass=some_pass    # The above defined database password
StorageLoc=slurm_acct_db

Setting database purge parameters

Databases with very many job records (maybe of the order of a million) have caused problems at many sites when upgrading Slurm and the database. See the mailing list thread [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3.

In order to solve this problem, it seems necessary to purge job records from the Slurm database. In slurmdbd.conf you may define a number of purge parameters such as:

PurgeEventAfter
PurgeJobAfter
PurgeResvAfter
PurgeStepAfter
PurgeUsageAfter

The values of these parameters depend on the number of jobs in the database, which differs a lot between sites. There do not seem to be any heuristics for determining good values, so some testing will be required.

From the high_throughput page: You might also consider setting the Purge options in your slurmdbd.conf to clear out old data. A typical configuration might look like this:

PurgeEventAfter=12months
PurgeJobAfter=12months
PurgeResvAfter=2months
PurgeStepAfter=2months
PurgeSuspendAfter=1month
PurgeTXNAfter=12months
PurgeUsageAfter=12months

The purge operation is done at the start of each time interval (see bug_4295), which means on the 1st day of the month in this example. Monthly, daily or even hourly purge operations would occur when using different time units for the same interval:

PurgeStepAfter=2months
PurgeStepAfter=60days
PurgeStepAfter=1440hours

Logging of purge events can be configured in slurmdbd.conf using:

DebugLevel=verbose
DebugFlags=DB_ARCHIVE
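
To review which purge and debug settings are actually active in a given slurmdbd.conf, something like the following could be used. This is a sketch: show_purge_settings is a hypothetical helper, and it assumes the parameters are written as Key=value at the start of a line with exactly this capitalization.

```shell
#!/bin/sh
# Sketch: list the purge and debug settings in a slurmdbd.conf-style file.
# Commented-out lines are skipped because they do not start with the key.
# show_purge_settings is a hypothetical helper name.
show_purge_settings() {
    grep -E '^(Purge[A-Za-z]+After|DebugLevel|DebugFlags)=' "$1"
}

# Example:
#   show_purge_settings /etc/slurm/slurmdbd.conf
```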

slurmdbd hostname configuration

The slurmdbd hostname must be configured correctly. The default value may be localhost, meaning that no other hosts can query the slurmdbd service (you may or may not want this limitation).

We recommend explicitly setting the slurmdbd hostname (for example, slurmdbd.my.domain) in these files:

  • DbdHost in slurmdbd.conf as documented above:

    DbdHost=slurmdbd.my.domain
  • AccountingStorageHost in slurm.conf:

    AccountingStorageHost=slurmdbd.my.domain

After restarting the slurmctld and slurmdbd services, verify the setup by:

scontrol show config | grep AccountingStorageHost
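
For scripted checks, a single value could be picked out of the scontrol show config output as sketched below. The helper name config_value is hypothetical, and it assumes the "Key = value" line layout with a value that contains no spaces.

```shell
#!/bin/sh
# Sketch: extract one value from "scontrol show config" style output
# ("Key   = value" lines) read on stdin.
# config_value is a hypothetical helper; only the first word of the value is printed.
config_value() {
    awk -v key="$1" '$1 == key && $2 == "=" { print $3 }'
}

# Example: compare the running value with the expected hostname
#   [ "$(scontrol show config | config_value AccountingStorageHost)" = "slurmdbd.my.domain" ] \
#       && echo "AccountingStorageHost is set as expected"
```

Matching the key exactly avoids picking up other parameters whose names merely contain the same string.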

If nodes other than the slurmdbd node must be able to connect to the slurmdbd service, you must open the firewall to specific hosts. Please see the Slurm_configuration page under the firewall section.

Setting MaxQueryTimeRange

It may be a good idea to prevent normal users from querying the database over too long a period of time. The slurmdbd.conf parameter MaxQueryTimeRange is used for this, for example for a maximum of 60 days:

MaxQueryTimeRange=60-0

Start the slurmdbd service

First try to run slurmdbd manually to see the log:

slurmdbd -D -vvv

Terminate the process by Control-C when the testing is OK.

Start the slurmdbd service:

systemctl enable slurmdbd
systemctl start slurmdbd
systemctl status slurmdbd

If you get this error in /var/log/slurm/slurmdbd.log:

error: Couldn't find the specified plugin name for accounting_storage/mysql looking at all files

then the file /usr/lib64/slurm/accounting_storage_mysql.so is missing because you forgot to install the mariadb-devel RPM before building Slurm RPMs. You must install the mariadb-devel RPM and rebuild and reinstall Slurm RPMs as shown above.
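
A quick way to test for this situation could look as follows. This is a sketch only: plugin_error_in_log is a hypothetical helper that searches a log file for the error text quoted above.

```shell
#!/bin/sh
# Sketch: return success if the log contains the missing-plugin error.
# plugin_error_in_log is a hypothetical helper name.
plugin_error_in_log() {
    grep -q "Couldn't find the specified plugin name for accounting_storage/mysql" "$1"
}

# Example: if the error is present, check the package and the plugin file
#   if plugin_error_in_log /var/log/slurm/slurmdbd.log; then
#       rpm -q mariadb-devel
#       ls -l /usr/lib64/slurm/accounting_storage_mysql.so
#   fi
```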

Backup and restore of database

In order to back up the entire database to a different location (for disaster recovery or migration), follow these steps:

  1. Make a database mysqldump using this script /root/mysqlbackup (insert the correct root database password for DBPASS):

    #!/bin/sh
    # MySQL Backup Script for All Databases
    HOST=localhost
    BACKUPFILE=/root/mysql_dump
    USER=root
    # Do not name this variable PWD: the shell uses PWD for the current working directory
    DBPASS='**********'
    DUMP_ARGS="--opt --flush-logs --quote-names"
    DATABASES="--all-databases"
    # DUMP_ARGS and DATABASES are left unquoted on purpose so that they split into separate options
    /usr/bin/mysqldump --host="$HOST" --user="$USER" --password="$DBPASS" $DUMP_ARGS --result-file="$BACKUPFILE" $DATABASES

    Write permission to $BACKUPFILE is required.

Make regular database dumps, for example by a crontab job:

# MySQL database backup
30 7 * * * /root/mysqlbackup
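
Since an interrupted dump is of little use for disaster recovery, it may be worth checking each dump file after it is written. The sketch below assumes that mysqldump ends its output with its usual "-- Dump completed" comment line (this is not the case if comments are disabled, for example with --skip-comments); dump_is_complete is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: return success if the last line of a dump file is the
# "-- Dump completed" comment that mysqldump normally writes at the end.
# dump_is_complete is a hypothetical helper name.
dump_is_complete() {
    tail -n 1 "$1" | grep -q '^-- Dump completed'
}

# Example:
#   dump_is_complete /root/mysql_dump || echo "WARNING: database dump looks incomplete"
```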

Restore of a database backup: The database contents must be loaded from the backup. To restore a MySQL database, see for example How do I restore a MySQL .dump file? As user root, read in the backup file created above:

mysql -u root -p < /root/mysql_dump

You will be prompted for the MariaDB/MySQL root password.

Upgrade of MySQL/MariaDB

If you restore a database dump onto a different server running a newer MySQL/MariaDB version, for example upgrading MySQL 5.1 on CentOS 6 to MariaDB 5.5 on CentOS 7, there are some extra steps.

See Upgrading from MySQL to MariaDB about running the mysql_upgrade command:

mysql_upgrade

whenever major (or even minor) version upgrades are made, or when migrating from MySQL to MariaDB.

It may be necessary to restart the database service or reboot the server after this upgrade.

Configure database accounting in slurm.conf

Finally, when you have made sure that the slurmdbd service is working correctly, you must configure slurm.conf to use slurmdbd.

In slurm.conf (see the slurm.conf man-page) you must configure accounting so that the database will be used through the slurmdbd database daemon:

AccountingStorageType=accounting_storage/slurmdbd

Migrate the slurmdbd service to another server

It is recommended to run the slurmdbd database server on a separate host from the slurmctld's server, see documents in Slurm_publications:

  • Technical: Field Notes From the Frontlines of Slurm Support, Tim Wickberg, SchedMD (2017) slides on High-Availability.
  • Technical: Field Notes Mark 2: Random Musings From Under A New Hat, Tim Wickberg, SchedMD (2018) slides on My Preferred Deployment Pattern.

However, many sites run both services successfully on the same server. If you decide to migrate the slurmdbd service to another server, here is a tested procedure which works on a running production cluster.

It is important to understand that the slurmctld service can run without problems even when the slurmdbd database is not responding, since slurmctld just caches all state information in the StateSaveLocation directory:

$ scontrol show config | grep StateSaveLocation
StateSaveLocation       = /var/spool/slurmctld

Therefore we can take down slurmdbd for a number of minutes or hours without problems. The outstanding messages in the StateSaveLocation are currently capped at 3xNodes + MaxJobCount.
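
To get a feeling for this cap, the formula quoted above can be evaluated for your own cluster. This is a sketch: max_cached_messages is a hypothetical helper, the node count of 100 in the example is an arbitrary illustration, and MaxJobCount is the value configured in slurm.conf.

```shell
#!/bin/sh
# Sketch: evaluate the cap quoted above, 3 x Nodes + MaxJobCount.
# max_cached_messages is a hypothetical helper name.
max_cached_messages() {
    nodes=$1
    max_job_count=$2
    echo $(( 3 * nodes + max_job_count ))
}

# Example: 100 nodes and MaxJobCount=10000 (illustrative numbers only)
#   max_cached_messages 100 10000
# The configured value can be looked up with:
#   scontrol show config | grep MaxJobCount
```

With these illustrative numbers the cap would be 10300 messages, so on a busy cluster a long slurmdbd outage could reach the limit.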

Configure a slurmdbd server

Install a new Slurm server as described in Slurm_installation.

Install the same Slurm version on the new server as on the old server! This ensures that the database migration will be as fast as possible. Any upgrading should be done at a later date according to the instructions in Slurm_installation#upgrading-slurm.

Make sure to open the firewall completely as described in Slurm_configuration#firewall-between-slurmctld-and-slurmdbd.

Configure the MariaDB/MySQL and the slurmdbd services as described above.

Testing the database restore

Take a database dump file and restore it into the MariaDB/MySQL database (see the section Backup and restore of database above). Use the time command to get an estimate of the time this will take.

Configure the server's hostname (for example, db2) in slurmdbd.conf:

DbdHost=<hostname>

Start the slurmdbd service manually to see if any errors occur:

slurmdbd -D -vvvv

and wait for the output:

slurmdbd: debug2: Everything rolled up

and do a Control-C.

Database migration procedure

Let us denote the slurmdbd servers as:

  • db1 is the current slurmdbd and MariaDB database server. This could be the same as the slurmctld server, or it could be a dedicated server.
  • db2 is the designated new slurmdbd and MariaDB database server.

On the slurmctld server, increase the timeout values in slurm.conf sufficiently, for example:

SlurmctldTimeout=3600
SlurmdTimeout=3600

and copy slurm.conf to all nodes. Then reconfigure the running daemons and test the timeout values:

scontrol reconfigure
scontrol show config | grep Timeout
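
The timeout edits in this procedure could be scripted as sketched below. This is only an illustration: set_conf_value is a hypothetical helper, it relies on GNU sed (-i), and it assumes that each parameter is written as Key=value at the start of a line.

```shell
#!/bin/sh
# Sketch: set Key=value in a slurm.conf-style file, keeping a .bak copy.
# set_conf_value is a hypothetical helper. It replaces an existing
# uncommented "Key=..." line, or appends the setting if none exists.
# The value must not contain the '|' character (used as sed delimiter).
set_conf_value() {
    file=$1
    key=$2
    value=$3
    cp -p "$file" "$file.bak" || return 1
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        echo "${key}=${value}" >> "$file"
    fi
}

# Example:
#   set_conf_value /etc/slurm/slurm.conf SlurmctldTimeout 3600
#   set_conf_value /etc/slurm/slurm.conf SlurmdTimeout 3600
```

The .bak copy makes it easy to see the original timeout values when they have to be restored at the end of the migration.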

db1: stop slurmdbd

On the db1 server:

  1. Stop and disable slurmdbd and make sure the status is down:

    systemctl disable slurmdbd
    systemctl stop slurmdbd
    systemctl status slurmdbd
  2. Run the MySQL database dump described above in the section Backup and restore of database.

    Copy the database dump to the db2 server. Make a long-term copy of the database dump.

  3. Stop any crontab jobs that run MySQL database dumps.

db2: restore database and start slurmdbd

On the db2 server:

  1. Make sure the slurmdbd service is stopped and that no crontab jobs will run database dumps.
  2. Load the database dump from db1 into MariaDB as shown above in the section Backup and restore of database.
  3. Start the slurmdbd service manually to see if any errors occur:

    slurmdbd -D -vvvv

    and wait for the output:

    slurmdbd: debug2: Everything rolled up

    and do a Control-C.

  4. Start and enable slurmdbd and make sure the status is up:

    systemctl enable slurmdbd
    systemctl start slurmdbd
    systemctl status slurmdbd

Now the new slurmdbd service should be up and running on the db2 server in a stable state.

slurmctld server: reconfigure AccountingStorageHost

On the slurmctld server:

Now it's time to reconfigure slurmctld for the new db2 slurmdbd server.

  1. Stop the slurmctld:

    systemctl stop slurmctld
  2. Edit slurm.conf to configure the new slurmdbd server (db2):

    AccountingStorageHost=db2
  3. Make a backup copy of the StateSaveLocation /var/spool/slurmctld directory:

    tar czf $HOME/var.spool.slurmctld.tar.gz /var/spool/slurmctld
  4. Start the slurmctld:

    systemctl start slurmctld
  5. Check the slurmctld log file, for example:

    grep slurmdbd: /var/log/slurm/slurmctld.log
  6. Test that your Slurm cluster's functionality has now been completely restored (use squeue, sinfo etc.).
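
The backup of the StateSaveLocation directory in step 3 could also be written as a small function that gives each archive a timestamp and verifies it right away. This is a sketch: backup_dir is a hypothetical helper name, and the date format in the file name is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: archive a directory to a time-stamped tarball and verify that
# the archive can be read back. Prints the archive path on success.
# backup_dir is a hypothetical helper name.
backup_dir() {
    src=$1
    dest_dir=$2
    archive="$dest_dir/$(basename "$src").$(date +%Y%m%d-%H%M%S).tar.gz"
    # Store paths relative to the parent directory of $src
    tar czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Listing the contents fails if the archive is damaged
    tar tzf "$archive" >/dev/null || return 1
    echo "$archive"
}

# Example:
#   backup_dir /var/spool/slurmctld "$HOME"
```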

Update all nodes

On the slurmctld server:

Change the timeout values in slurm.conf back to the original values, for example:

SlurmctldTimeout=300
SlurmdTimeout=300

and copy slurm.conf to all nodes. Then reconfigure the running daemons and test the timeout values:

scontrol reconfigure
scontrol show config | grep Timeout

Restart the daemons:

systemctl restart slurmctld

and on all nodes (using SLURM#clustershell):

clush -ba systemctl restart slurmd

db2: Enable database backups

On the db2 server:

  1. Make a crontab job for doing database dumps as described in the section Backup and restore of database.
  2. Make sure the db2 server and the database dumps are backed up daily/regularly to your site's backup service.

Niflheim: Slurm_database (last edited 2019-03-05 08:41:07 by OleHolmNielsen)