- Accounting setup in Slurm
- Database Configuration
- Create accounts and users
- Enforce accounting
- Accounting information
- Accounting reports
The following configuration is relevant for the Head/Master node only.
Before setting up accounting, you need to set up the Slurm database.
See the section Database Configuration in the accounting page. The command to manage accounts is:
- sacctmgr - View and modify Slurm account information
Accounting records are maintained based upon what we refer to as an Association, which consists of four elements:
- cluster names,
- account names,
- user names,
- an optional partition name.
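As a small illustration of the four elements, the sketch below splits one pipe-delimited association record into its parts. The helper name describe_association and the sample record are made up for this example; on a real system such records could come from sacctmgr -n -P show associations format=cluster,account,user,partition.

```shell
#!/bin/sh
# Sketch: split one "cluster|account|user|partition" record into the
# four association elements. describe_association is a hypothetical
# helper and the sample record below is made up.
describe_association() {
    echo "$1" | awk -F'|' '{
        # The partition is optional, so an empty field means "any partition"
        part = ($4 == "") ? "(any)" : $4
        printf "cluster=%s account=%s user=%s partition=%s\n", $1, $2, $3, part
    }'
}

describe_association "niflheim|fysik|xxx|"
```

An empty fourth field is reported as "(any)", reflecting that the partition element may be left out of an association.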
NOTE: There is an order to set up accounting associations. You must define clusters before you add accounts, and you must add accounts before you can add users.
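The required order can be sketched as a dry run that only prints the commands, so nothing is changed in the database. The helper print_setup_commands is hypothetical and not part of Slurm; the -i option asks sacctmgr to commit without prompting.

```shell
#!/bin/sh
# Dry-run sketch: print the sacctmgr commands in the required order
# (cluster first, then account, then user) without executing them.
# print_setup_commands is a hypothetical helper, not part of Slurm.
print_setup_commands() {
    cluster=$1
    account=$2
    user=$3
    echo "sacctmgr -i add cluster $cluster"
    echo "sacctmgr -i add account $account Cluster=$cluster"
    echo "sacctmgr -i add user $user DefaultAccount=$account"
}

print_setup_commands niflheim fysik xxx
```

Printing the commands first makes it easy to review them before running them by hand or piping them to a shell.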
Add the cluster name to slurm.conf:
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreJobComment=YES
ClusterName=niflheim
Then add the cluster to the accounting database (see the Database Configuration section of the accounting page):
sacctmgr add cluster niflheim
sacctmgr create cluster niflheim # The add and create commands are identical
Definition of accounts:
- Not formally defined in the Slurm documentation, but see the accounting page for examples.
- An account is similar to a UNIX group.
- An account may contain multiple users, or just a single user.
- Accounts may be organized as a hierarchical tree.
- A user may belong to multiple accounts, but must have a DefaultAccount (see sacctmgr).
Create a hierarchical organization list using sacctmgr, for example, with departments and external users:
sacctmgr add account dtu Description="DTU departments" Organization=dtu
sacctmgr add account fysik Description="Physics department" Organization=fysik parent=dtu
sacctmgr add account deptx Description="X department" Organization=deptx parent=dtu
sacctmgr add account external Description="External groups" Organization=external
sacctmgr add account companyx Description="Company X" Organization=companyx parent=external
You may also create subgroups within the departments:
sacctmgr add account camd Description="CAMD section" Organization=camd parent=fysik
If you wish to assign different resources within the departmental subgroups, you could use the UNIX GID group name to differentiate between faculty, staff and students, for example. Use the GID names from the /etc/group file to create new accounts within the same Organization name, for example, for the CAMD section students with GID group name camdstud:
sacctmgr add account camdstud Description="CAMD students" Organization=camd parent=camd
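For larger hierarchies, the commands above could be generated from a small table instead of being typed one by one. The following is a minimal sketch under stated assumptions: account_commands and the "name:parent:description" input format are made up for this example, and the Organization is simply set to the account name, which does not cover cases such as camdstud.

```shell
#!/bin/sh
# Dry-run sketch: turn "name:parent:description" lines into
# "sacctmgr add account" commands. Parents must be listed before their
# children, because a parent account has to exist when a child is added.
# account_commands is a hypothetical helper, not part of Slurm.
account_commands() {
    while IFS=: read -r name parent desc; do
        [ -z "$name" ] && continue
        cmd="sacctmgr add account $name Description=\"$desc\" Organization=$name"
        # Top-level accounts have an empty parent field
        [ -n "$parent" ] && cmd="$cmd parent=$parent"
        echo "$cmd"
    done
}

account_commands <<'EOF'
dtu::DTU departments
fysik:dtu:Physics department
camd:fysik:CAMD section
EOF
```

The function only prints the commands; review the output before executing any of it.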
Display the accounts created:
sacctmgr show account
sacctmgr show account -s # Show also associations in the accounts
When either adding or modifying an account, the following sacctmgr options are available:
- Cluster= Only add this account to these clusters. The account is added to all defined clusters by default.
- Description= Description of the account. (Default is account name)
- Name= Name of account. Note the name must be unique and can not represent different bank accounts at different points in the account hierarchy
- Organization= Organization of the account. (Default is parent account unless parent account is root then organization is set to the account name.)
- Parent= Make this account a child of this other account (already added).
Create a Slurm user named xxx with the default account yyy (a default account is required):
sacctmgr create user name=xxx DefaultAccount=yyy
If desired, users may also be added to additional accounts (see the accounting page), for example:
sacctmgr add user xxx Account=zzzz
The fairshare and other settings for the non-default account may be configured:
sacctmgr modify user where name=xxx account=zzzz set fairshare=0
A non-default account name may be specified in the user's batch jobs, for example with sbatch:
sbatch -A <account> or --account=<account>
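The account can also be requested inside the job script itself. The sketch below assumes the account zzzz from the example above; the job name and the report_account helper are made up for illustration.

```shell
#!/bin/bash
# Charge this job to the non-default account zzzz
#SBATCH --account=zzzz
# Hypothetical job name
#SBATCH --job-name=acct-demo

# SLURM_JOB_ACCOUNT is set by Slurm inside the job; fall back to a
# placeholder so the script also runs outside of Slurm.
report_account() {
    echo "Job charged to account: ${SLURM_JOB_ACCOUNT:-unknown}"
}

report_account
```

Options given on the sbatch command line take precedence over #SBATCH lines in the script.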
List users by:
sacctmgr show user
sacctmgr show user -s
sacctmgr show account -s xxx
When either adding or modifying a user, the following sacctmgr options are available:
- Account= Account(s) to add user to (see also DefaultAccount).
- AdminLevel= Admin level of the user, which grants accounting privileges. Valid options are:
  - None: no administrative privileges.
  - Operator: can add, modify, and remove any database object (user, account, etc.), and add other operators.
    On a SlurmDBD served slurmctld these users can:
    - View information that is blocked to regular users by a PrivateData flag (see slurm.conf).
    - Create/Alter/Delete Reservations
  - Admin: These users have the same level of privileges as an operator in the database. They can also alter anything on a served slurmctld as if they were the slurm user or root.
- Cluster= Only add to accounts on these clusters (default is all clusters)
- DefaultAccount= Default account for the user, used when no account is specified when a job is submitted. (Required on creation)
- DefaultWCKey= Default WCkey for the user, used when no WCkey is specified when a job is submitted. (Only used when tracking WCkey.)
- Name= User name
- Partition= Name of Slurm partition this association applies to.
For example, to permit user xxx to execute jobs on all clusters with a default account of fysik, execute:
sacctmgr add user xxx DefaultAccount=fysik
You can modify the database items using SQL-like where and set, for example:
sacctmgr modify account where cluster=niflheim name=fysik set Description="DTU Physics"
The following has been copied from the accounting page:
When modifying entities, you can specify many different options in SQL-like fashion, using key words like where and set. A typical execute line has the following form:
sacctmgr modify <entity> set <options> where <options>
For example:
sacctmgr modify user set default=none where default=test
will change all users with a default account of "test" to account "none". Once an entity has been added, modified or removed, the change is sent to the appropriate Slurm daemons and will be available for use instantly.
Remove entities using an execute line similar to the modify example above, but without the set options. For example, remove all users with a default account "test" using the following execute line:
sacctmgr remove user where default=test
To remove a user from an account:
sacctmgr remove user brian where account=physics
Note: In most cases, removed entities are preserved, but flagged as deleted. If an entity has existed for less than 1 day, the entity will be removed completely. This is meant to clean up after typographic errors.
To enable any limit enforcement you must at least have:
AccountingStorageEnforce=limits
in your slurm.conf; otherwise, even if you have limits set, they will not be enforced. Other options for AccountingStorageEnforce and the explanation for each are found in the Resource Limits document.
Now you can impose user limits, for example:
sacctmgr modify user xxx set GrpTRES=cpu=1000 GrpTRESRunMin=cpu=2000000
A TRES is a resource that can be tracked for usage or used to enforce limits against.
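To get a feeling for the two limits in the example (GrpTRES=cpu=1000 and GrpTRESRunMin=cpu=2000000), the arithmetic below estimates how many identical jobs could run at once. It is a rough sketch that assumes each job counts as its CPUs multiplied by its time limit when it starts; the job shape of 100 CPUs for 24 hours is made up.

```shell
#!/bin/sh
# Worked arithmetic for GrpTRES=cpu=1000 and GrpTRESRunMin=cpu=2000000.
# The job shape (100 CPUs, 24 hour time limit) is a made-up example.
grp_cpus=1000
grp_cpu_run_mins=2000000
job_cpus=100
job_minutes=$((24 * 60))

# CPU-minutes one such job reserves when it starts
job_cpu_mins=$((job_cpus * job_minutes))

# Concurrent jobs allowed by each limit (integer division rounds down)
max_by_cpus=$((grp_cpus / job_cpus))
max_by_run_mins=$((grp_cpu_run_mins / job_cpu_mins))

echo "CPU-minutes per job: $job_cpu_mins"
echo "Jobs allowed by GrpTRES: $max_by_cpus"
echo "Jobs allowed by GrpTRESRunMin: $max_by_run_mins"
```

With these numbers the CPU count is the tighter limit (10 jobs versus 13), so the run-minutes limit would only start to matter for jobs with longer time limits.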
Limits may also be attached to a Quality of Service (QOS). For example, add a QOS named high:
sacctmgr add qos high priority=10 MaxTRESPerUser=cpu=256
The Quality of Service (QOS) Factor is defined in the Multifactor Priority Plugin page as:
Each QOS can be assigned an integer priority. The larger the number, the greater the job priority will be for jobs that request this QOS. This priority value is then normalized to the highest priority of all the QOS's to become the QOS factor.
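The normalization in the quoted definition can be illustrated with a few lines of awk. This is a simplified reading of the definition, not Slurm's actual code; the qos_factors helper and the "low" QOS with priority 5 are made up for the example.

```shell
#!/bin/sh
# Sketch of the normalization described above: each QOS priority is
# divided by the highest priority among all QOSes.
# qos_factors is a hypothetical helper; the "low" QOS is made up.
qos_factors() {
    # Input lines: "<qos name> <priority>"
    awk '{ name[NR] = $1; prio[NR] = $2 + 0; if (prio[NR] > max) max = prio[NR] }
         END { for (i = 1; i <= NR; i++)
                   printf "%s %.2f\n", name[i], (max > 0 ? prio[i] / max : 0) }'
}

printf 'high 10\nlow 5\nnormal 0\n' | qos_factors
```

The QOS with the largest priority always gets a factor of 1.00, and the others scale proportionally.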
View the defined QOSes with various degrees of detail:
sacctmgr show qos
sacctmgr show qos format=name
sacctmgr --noheader show qos format=name
A user may be allowed to use a certain QOS, as in these examples:
sacctmgr -i modify user where name=XXXX set QOS=normal,high
sacctmgr -i modify user where name=XXXX set QOS+=high
The user DefaultQOS (see sacctmgr) may be set by:
sacctmgr -i modify user where name=XXXX set DefaultQOS=normal
Users can then request a QOS when submitting jobs, for example:
sbatch --qos=high ...
The account and user associations created above only take effect after you enable association enforcement in slurm.conf, for example:
AccountingStorageEnforce=associations
When AccountingStorageEnforce is changed, a restart of the slurmctld daemon is required (not just a scontrol reconfig).
Inquire accounting information using these commands:
- sacct - Displays accounting data for all jobs and job steps in the Slurm job accounting log or Slurm database.
- sstat - Display various status information of a running job/step.
- sreport - Generate reports from the Slurm accounting data.
- scontrol show assoc_mgr - Displays the current contents of the slurmctld's internal cache for users, associations and/or qos.
Show user fairshare etc. information:
sacctmgr show associations format=account,user,fairshare,GrpTRES,GrpTRESRunMin
The /usr/bin/seff command takes a jobid and reports on the efficiency of that job's CPU and memory utilization (requires Slurm 15.08 or later). The slurm-contribs RPM (Slurm 17.02 and later, previously slurm-seff) also comes with a /usr/bin/smail utility that allows Slurm end-of-job emails to include a seff report, see bug 1611. This makes users aware if they are wasting resources.
Note: You may like to copy the updated smail from https://github.com/OleHolmNielsen/Slurm_tools/tree/master/smail to add the cluster name to mail headers. This is included in Slurm 17.11, but make sure to get the bugfix in 17.11.6.
The smail utility is invoked automatically to process end-of-job notifications if you add the following to slurm.conf:
MailProg=/usr/bin/smail
User job scripts may also call seff on their own job ID as the last line of the script.
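A job script along these lines might look as follows. This is only a sketch: it assumes that the intended last line calls seff with the job's own ID from the SLURM_JOB_ID variable, and the job name and the report_efficiency wrapper are made up. Figures reported while the job is still running may be incomplete.

```shell
#!/bin/bash
#SBATCH --job-name=seff-demo

# ... the job's real workload would go here ...

# Report efficiency for this job if seff is installed and the script
# runs inside a Slurm job (SLURM_JOB_ID is set by Slurm).
# report_efficiency is a hypothetical wrapper, not part of Slurm.
report_efficiency() {
    if command -v seff >/dev/null 2>&1 && [ -n "${SLURM_JOB_ID:-}" ]; then
        seff "$SLURM_JOB_ID"
    else
        echo "seff not available or not running inside a Slurm job"
    fi
}

# Last line of the job script
report_efficiency
```

The guard lets the same script run unchanged on machines without the slurm-contribs package.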
Use sreport to generate reports from the Slurm accounting data, for example:
sreport cluster UserUtilizationByAccount
sreport cluster AccountUtilizationByUser
The accounting timings will by default be displayed in units of TRES Minutes.
Selection of date ranges:
sreport ... Start=02/01 End=02/25
sreport ... Start=`date -d "last month" +%D` End=`date -d "this month" +%D`
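When a report should cover exactly the previous calendar month, the Start and End values can be computed from the first day of the current month. The sketch below assumes GNU date; prev_month_range is a hypothetical helper, and the optional argument is a reference date used here to make the example reproducible.

```shell
#!/bin/sh
# Sketch (assumes GNU date): build Start/End arguments covering the
# whole calendar month before a reference date (default: today).
# prev_month_range is a hypothetical helper, not part of Slurm.
prev_month_range() {
    ref=${1:-$(date +%Y-%m-%d)}
    # First day of the reference month, then step back one month
    first_this=$(date -d "$ref" +%Y-%m-01)
    first_prev=$(date -d "$first_this -1 month" +%Y-%m-%d)
    echo "Start=$first_prev End=$first_this"
}

prev_month_range 2024-03-15
# Possible use: sreport cluster AccountUtilizationByUser $(prev_month_range)
```

For the reference date 2024-03-15 this prints Start=2024-02-01 End=2024-03-01.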
Change the date/time format in report header for readability (formats in "man strftime"):
env SLURM_TIME_FORMAT="%d-%b-%Y_%R" sreport ...
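A strftime format can be previewed with the date command before it is passed to sreport. The sketch below assumes GNU date; preview_time_format is a hypothetical helper and the timestamp is arbitrary.

```shell
#!/bin/sh
# Sketch (assumes GNU date): preview how a strftime format renders
# before using it in SLURM_TIME_FORMAT.
# LC_ALL=C keeps the month abbreviation in English.
# preview_time_format is a hypothetical helper: $1 = format, $2 = timestamp.
preview_time_format() {
    LC_ALL=C date -d "$2" +"$1"
}

preview_time_format "%d-%b-%Y_%R" "2024-02-25 14:30"
```

For the format used above this prints 25-Feb-2024_14:30.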
Show accounting indented as a tree:
sreport cluster AccountUtilizationByUser tree
Show top user accounting:
sreport user top start=0101 end=0201 TopCount=50 -t hourper --tres=cpu,gpu
Specify the accounting time format (default is Minutes) from sreport:
sreport -t hourper ...
Report specified TRES accounting (default is cpu):
sreport --tres cpu,gpu ...
Print parseable output from sreport for further processing with scripts:
sreport -p ...
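As an example of such processing, the sketch below sums a "Used" column per account from pipe-delimited text. The sample rows are made up, and the header line is an assumption about the column layout, so check it against your own output. Real reports may also start with a title block that has to be skipped first. The helper name sum_used_by_account is hypothetical.

```shell
#!/bin/sh
# Sketch: sum the "Used" column per account from pipe-delimited report
# text. Columns are located by header name rather than by position.
# Assumes the first input line is the column header; the sample data
# below is made up.
sum_used_by_account() {
    awk -F'|' '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Only count per-user rows; rows without a login are skipped
        $(col["Login"]) != "" { used[$(col["Account"])] += $(col["Used"]) }
        END { for (a in used) printf "%s %d\n", a, used[a] }
    ' | sort
}

sum_used_by_account <<'EOF'
Cluster|Account|Login|Proper Name|Used|Energy|
niflheim|fysik|alice|Alice A|1200|0|
niflheim|fysik|bob|Bob B|300|0|
niflheim|camd|carol|Carol C|500|0|
EOF
```

Looking up columns by header name keeps the script working if the column order differs between report types.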
Cluster utilization report:
sreport -t hourper cluster Utilization