- Database documentation
- Hardware optimization
- Set up MariaDB database
- SlurmDBD Configuration
- Start the slurmdbd service
- Backup and restore of database
- Configure database accounting in slurm.conf
The following configuration is relevant only for the Database node (which may be the Head/Master node), but not the compute nodes.
You should consider optimizing the database performance by mounting the MariaDB (MySQL) database directory on a dedicated high-speed file system.
Whether this is required depends on the number and frequency of jobs expected. A high-speed file system could be placed on a separate SSD SAS/SATA disk drive, or even better on a PCIe SSD disk drive.
Such disks must be qualified for high-volume random small read/write operations relevant for databases, and should be built with the Non-Volatile Memory Express (NVMe) storage interface standard for reliability and performance. It seems that NVMe support was added to Linux kernel 3.3 (and later), so it should work with RHEL7/CentOS7. A new scalable block layer for high-performance SSD storage was added in kernel 3.13.
A disk size of 200 GB or 400 GB should be sufficient. Consider installing 2 disk drives and running them in a RAID-1 mirrored configuration. Example hardware could be the Intel SSD P3700 series or the Kingston E1000 series.
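To confirm that a candidate disk is actually solid-state, the kernel's sysfs flag can be read. The following is a minimal sketch; the function name is_nonrotational and its optional second argument (used only to point at a different sysfs tree) are assumptions made for illustration.

```shell
# Report whether a block device is non-rotational (SSD/NVMe) by reading sysfs.
# $1: device name such as sda or nvme0n1
# $2: sysfs block directory (defaults to /sys/block)
# Returns 0 for non-rotational, 1 for rotational, 2 if the device is unknown.
is_nonrotational() {
    nr_dev=$1
    nr_sys=${2:-/sys/block}
    nr_flag="$nr_sys/$nr_dev/queue/rotational"
    [ -r "$nr_flag" ] || return 2
    # The kernel writes 0 here for non-rotational devices
    [ "$(cat "$nr_flag")" = "0" ]
}

# Example: is_nonrotational nvme0n1 && echo "nvme0n1 is solid-state"
```

This only tells you the device type; whether the drive is qualified for database workloads still has to be checked against the vendor's specifications.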
rpm -q mariadb-server mariadb-devel
rpm -ql slurm-sql | grep accounting_storage_mysql.so # Must show location of this file
Start the MariaDB service:
systemctl start mariadb
systemctl enable mariadb
systemctl status mariadb
Make sure to configure the MariaDB database's root password as instructed at first invocation of the mariadb service, or run this command:
Select a suitable slurm user's database password. Now follow the accounting page instructions (using -p to enter the database password):
# mysql -p
mysql> grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by 'some_pass' with grant option;
mysql> SHOW VARIABLES LIKE 'have_innodb';
mysql> create database slurm_acct_db;
mysql> quit;
WARNING: Use the slurm database user's password instead of some_pass.
This will grant user 'slurm' access to do what it needs to do on the local host or the storage host system. This must be done before the slurmdbd will work properly. After you grant permission to the user 'slurm' in mysql then you can start slurmdbd and the other Slurm daemons. You start slurmdbd by typing its pathname '/usr/sbin/slurmdbd' or '/etc/init.d/slurmdbd start'. You can verify that slurmdbd is running by typing ps aux | grep slurmdbd.
If slurmdbd is not running, you can use the -v option when you start slurmdbd to get more detailed information. Starting slurmdbd in the foreground with the -D -vvv options can also help in debugging, since the messages are then printed to the terminal and you don't have to go to the log to find the problem.
In the accounting page section Slurm Accounting Configuration Before Build some advice about MySQL configuration is given:
- NOTE: Before running the slurmdbd for the first time, review the current setting for MySQL's innodb_buffer_pool_size. Consider setting this value large enough to handle the size of the database. This helps when converting large tables over to the new database schema and when purging old records. Setting innodb_lock_wait_timeout and innodb_log_file_size to larger values than the default is also recommended.
The following is recommended for /etc/my.cnf, but on CentOS 7 you should create a new file /etc/my.cnf.d/innodb.cnf containing:
[mysqld]
innodb_buffer_pool_size=1024M
innodb_log_file_size=64M
innodb_lock_wait_timeout=900
The innodb_buffer_pool_size might be even larger, like 50%-80% of the server's RAM size.
To implement this change you have to shut down the database and move/remove logfiles:
systemctl stop mariadb
mv /var/lib/mysql/ib_logfile? /tmp/
systemctl start mariadb
You can check the current setting in MySQL like so:
mysql> SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
See also Bug_2457:
- The innodb_buffer_pool_size can have a huge impact - we'd recommend setting this as high as half the RAM available on the slurmdbd server.
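The sizing advice above (half of the RAM, or 50%-80%) can be turned into a concrete number with a small calculation. This is a sketch only; the function names suggest_buffer_pool_mb and mem_total_kb are hypothetical, and the 50% default is taken from the Bug_2457 recommendation.

```shell
# Suggest an innodb_buffer_pool_size in megabytes.
# $1: total RAM in kB (the unit used by MemTotal in /proc/meminfo)
# $2: percentage of RAM to give to the buffer pool (default: 50)
suggest_buffer_pool_mb() {
    bp_mem_kb=$1
    bp_percent=${2:-50}
    # Integer arithmetic: kB * percent / 100 gives kB, / 1024 gives MB
    echo $(( bp_mem_kb * bp_percent / 100 / 1024 ))
}

# Read the total RAM in kB on the database node.
mem_total_kb() {
    awk '/^MemTotal:/ {print $2}' /proc/meminfo
}

# Example:
#   echo "innodb_buffer_pool_size=$(suggest_buffer_pool_mb "$(mem_total_kb)" 50)M"
```

If slurmdbd shares the node with slurmctld or other services, a lower percentage is probably the safer starting point.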
While slurmdbd will work with a flat text file for recording job completions and such, this configuration will not allow "associations" between a user and account. A database allows such a configuration.
MySQL or MariaDB is the preferred database. To enable this database support one only needs to have the development package for the database they wish to use on the system. Slurm uses the InnoDB storage engine in MySQL to make rollback possible. This must be available on your MySQL installation or rollback will not work.
cp /etc/slurm/slurmdbd.conf.example /etc/slurm/slurmdbd.conf
The file slurmdbd.conf should be only on the computer where slurmdbd executes and should only be readable by the user which executes slurmdbd (e.g. "slurm"). It must be protected from unauthorized access since it contains a database login name and password. See the slurmdbd.conf man-page for a more complete description of the configuration parameters.
Set up files and permissions:
chown slurm: /etc/slurm/slurmdbd.conf
chmod 600 /etc/slurm/slurmdbd.conf
touch /var/log/slurm/slurmdbd.log
chown slurm: /var/log/slurm/slurmdbd.log
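Because slurmdbd.conf holds the database password, it can be worth verifying the result of the commands above. The following sketch assumes GNU stat (as on RHEL/CentOS); the function name check_conf_perms is hypothetical.

```shell
# Return 0 if the file exists, has mode 600 and is owned by the expected user.
# $1: path to slurmdbd.conf
# $2: expected owner (default: slurm)
check_conf_perms() {
    cp_file=$1
    cp_owner=${2:-slurm}
    [ -f "$cp_file" ] || { echo "missing: $cp_file" >&2; return 1; }
    # stat -c is the GNU coreutils syntax
    cp_mode=$(stat -c '%a' "$cp_file")
    cp_user=$(stat -c '%U' "$cp_file")
    [ "$cp_mode" = "600" ] || { echo "mode is $cp_mode, expected 600" >&2; return 1; }
    [ "$cp_user" = "$cp_owner" ] || { echo "owner is $cp_user, expected $cp_owner" >&2; return 1; }
}

# Example: check_conf_perms /etc/slurm/slurmdbd.conf slurm && echo OK
```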
Configure some of the /etc/slurm/slurmdbd.conf variables:
LogFile=/var/log/slurm/slurmdbd.log
DbdHost=XXXX # Replace by the slurmdbd server hostname (for example, slurmdbd.my.domain)
DbdPort=6819 # The default value
SlurmUser=slurm
StorageHost=localhost
StoragePass=some_pass # The above defined database password
StorageLoc=slurm_acct_db
From the high_throughput page: You might also consider setting the Purge options in your slurmdbd.conf to clear out old data. A typical configuration would look like this:
PurgeEventAfter=12months
PurgeJobAfter=12months
PurgeResvAfter=2months
PurgeStepAfter=2months
PurgeSuspendAfter=1month
PurgeTXNAfter=12months
PurgeUsageAfter=12months
Notice: The changes to service files are included from Slurm 17.02 and onwards, see Bug 3192.
Copy the delivered service files:
cp /usr/lib/systemd/system/slurmctld.service /usr/lib/systemd/system/slurmd.service /usr/lib/systemd/system/slurmdbd.service /etc/systemd/system/
Add the prerequisite After= services to the file /etc/systemd/system/slurmdbd.service:
[Unit]
Description=Slurm DBD accounting daemon
After=network.target mariadb.service
ConditionPathExists=/etc/slurm/slurmdbd.conf
...
On compute nodes /etc/systemd/system/slurmd.service should be modified:
[Unit]
Description=Slurm node daemon
After=network.target munge.service
ConditionPathExists=/etc/slurm/slurm.conf
...
On the head node /etc/systemd/system/slurmctld.service should be modified:
[Unit]
Description=Slurm controller daemon
After=network.target slurmdbd.service munge.service
ConditionPathExists=/etc/slurm/slurm.conf
...
See man systemd.unit and what is difference between /usr/lib and /etc/systemd?. Alternatively, a drop-in containing only the changes can be implemented as described in the EXAMPLES section of man systemd.unit.
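The drop-in alternative could look like the following sketch for slurmdbd. The function name write_slurmdbd_dropin and the file name mariadb.conf are hypothetical; systemd reads any *.conf file in the slurmdbd.service.d directory.

```shell
# Write a drop-in that orders slurmdbd after MariaDB without copying the unit file.
# $1: systemd configuration root (default: /etc/systemd/system)
write_slurmdbd_dropin() {
    di_root=${1:-/etc/systemd/system}
    di_dir="$di_root/slurmdbd.service.d"
    mkdir -p "$di_dir" || return 1
    # After= is a list setting, so this entry is added to the unit's own After= list
    printf '[Unit]\nAfter=mariadb.service\n' > "$di_dir/mariadb.conf"
}

# Example (as root): write_slurmdbd_dropin && systemctl daemon-reload
```

With a drop-in, the packaged unit file in /usr/lib/systemd/system stays untouched, so later Slurm RPM updates to that file are still picked up.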
We recommend explicitly setting the slurmdbd hostname (for example, slurmdbd.my.domain) in these files:
DbdHost in slurmdbd.conf as documented above:
AccountingStorageHost in slurm.conf:
scontrol show config | grep AccountingStorageHost
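The two settings can also be compared directly from the configuration files. This is a rough sketch: the function names conf_value and hosts_match are hypothetical, and the parser is deliberately simple (it treats everything after a # as a comment and does not follow Include directives).

```shell
# Print the value of a Key=Value setting from a Slurm configuration file.
# Key names are matched case-insensitively; text after "#" is ignored.
# $1: key name, $2: configuration file
conf_value() {
    awk -v key="$1" '
        {
            sub(/#.*/, "")                      # strip comments
            eq = index($0, "=")
            if (eq == 0) next                   # not a Key=Value line
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (tolower(k) == tolower(key)) { print v; exit }
        }' "$2"
}

# Return 0 if DbdHost in slurmdbd.conf equals AccountingStorageHost in slurm.conf.
# $1: path to slurmdbd.conf, $2: path to slurm.conf
hosts_match() {
    hm_dbd=$(conf_value DbdHost "$1")
    hm_acct=$(conf_value AccountingStorageHost "$2")
    [ -n "$hm_dbd" ] && [ "$hm_dbd" = "$hm_acct" ]
}

# Example: hosts_match /etc/slurm/slurmdbd.conf /etc/slurm/slurm.conf || echo "hostnames differ"
```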
If nodes other than the slurmdbd node must be able to connect to the slurmdbd service, you must open the firewall to specific hosts. Please see the Slurm_configuration page under the firewall section.
A database with very many job records (perhaps on the order of a million) can cause widespread problems when upgrading Slurm and the database. See the mailing list thread [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3.
The relevant purge parameters in slurmdbd.conf are:
PurgeEventAfter
PurgeJobAfter
PurgeResvAfter
PurgeStepAfter
PurgeUsageAfter
The values of these parameters depend on the number of jobs in the database, which differs a lot between sites. There does not seem to be any heuristics for determining good values, so some testing will be required.
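One way to start such testing is to look at how many rows the accounting tables currently hold. The sketch below only generates a query against information_schema; the function name table_rows_sql is hypothetical, and for InnoDB the table_rows column is an estimate rather than an exact count.

```shell
# Print an SQL query that lists the tables of a database by estimated row count.
# $1: database name (default: slurm_acct_db, the StorageLoc used above)
table_rows_sql() {
    tr_db=${1:-slurm_acct_db}
    printf "SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema = '%s' ORDER BY table_rows DESC;\n" "$tr_db"
}

# Example: table_rows_sql | mysql -u root -p
```

Running this before and after changing the purge settings gives a rough picture of their effect.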
Start the slurmdbd service:
systemctl enable slurmdbd
systemctl start slurmdbd
systemctl status slurmdbd
If you get this error in /var/log/slurm/slurmdbd.log:
error: Couldn't find the specified plugin name for accounting_storage/mysql looking at all files
then the file /usr/lib64/slurm/accounting_storage_mysql.so is missing because you forgot to install the mariadb-devel RPM before building Slurm RPMs. You must install the mariadb-devel RPM and rebuild and reinstall Slurm RPMs as shown above.
In order to back up the entire database to a different location (for disaster recovery or migration), the following files must be backed up:
Make a database mysqldump using this script /root/mysqlbackup (insert the correct root database password for DBPASS; the names DBUSER and DBPASS are used because USER and PWD are already set by the shell):
#!/bin/sh
# MySQL Backup Script for All Databases
HOST=localhost
BACKUPFILE=/root/mysql_dump
DBUSER=root
DBPASS='**********'
DUMP_ARGS="--opt --flush-logs --quote-names"
DATABASES="--all-databases"
# DUMP_ARGS is left unquoted on purpose so that it splits into separate options
/usr/bin/mysqldump --host="$HOST" --user="$DBUSER" --password="$DBPASS" $DUMP_ARGS --result-file="$BACKUPFILE" $DATABASES
Write permission to $BACKUPFILE is required.
Make regular database dumps, for example by a crontab job:
# MySQL database backup
30 7 * * * /root/mysqlbackup
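A password given on the mysqldump command line can be visible to other users in the process list. As a possible alternative, the credentials can be kept in a client option file and passed with --defaults-extra-file, which must be the first option on the command line. The function name write_client_cnf and the path /root/.mysqlbackup.cnf are hypothetical, and the sketch assumes the password contains no double quotes.

```shell
# Create a MySQL/MariaDB client option file readable only by its owner.
# $1: path of the option file
# $2: database user
# $3: database password
write_client_cnf() {
    (
        umask 077
        printf '[client]\nuser=%s\npassword="%s"\n' "$2" "$3" > "$1" || exit 1
        # Also tighten the mode if the file already existed
        chmod 600 "$1"
    )
}

# Example:
#   write_client_cnf /root/.mysqlbackup.cnf root 'the_root_db_password'
#   /usr/bin/mysqldump --defaults-extra-file=/root/.mysqlbackup.cnf --opt --flush-logs \
#       --quote-names --result-file=/root/mysql_dump --all-databases
```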
Restore of a database backup: The database contents must be loaded from the backup. To restore a MySQL database see for example How do I restore a MySQL .dump file?. As user root input the above created backup file:
mysql -u root -p < /root/mysql_dump
The MySQL password will be asked for.
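Before loading a dump it may be worth checking that it was written completely, since a truncated file restores only part of the data. With comments enabled (the mysqldump default), the last line of a dump starts with "-- Dump completed". The function name dump_is_complete in this sketch is hypothetical.

```shell
# Return 0 if the dump file is non-empty and ends with mysqldump's completion marker.
# The marker is missing if the dump was made with comments disabled.
# $1: path to the dump file
dump_is_complete() {
    [ -s "$1" ] || return 1
    tail -n 1 "$1" | grep -q '^-- Dump completed'
}

# Example: dump_is_complete /root/mysql_dump && mysql -u root -p < /root/mysql_dump
```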
If you restore a database dump onto a different server running a newer MySQL/MariaDB version, for example upgrading MySQL 5.1 on CentOS 6 to MariaDB 5.5 on CentOS 7, there are some extra steps.
It may be necessary to restart the mysqld service or reboot the server after this upgrade, although this has not been confirmed.