Slurm job scheduler
The SchedulerType configuration parameter controls how queued jobs are executed. See the Scheduling_Configuration_Guide.
We use the backfill scheduler, SchedulerType=sched/backfill in slurm.conf, but there are some backfill parameters in SchedulerParameters that should be considered (see slurm.conf), for example bf_window and bf_resolution.
The importance of bf_window is explained as:
- The default value is 1440 minutes (one day). A value at least as long as the highest allowed time limit is generally advisable to prevent job starvation. In order to limit the amount of data managed by the backfill scheduler, if the value of bf_window is increased, then it is generally advisable to also increase bf_resolution.
So bf_window should be at least as long as the longest MaxTime of any partition in slurm.conf:
PartitionName= ... MaxTime=XXX
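Since bf_window is given in minutes while MaxTime uses the Slurm time format, a small conversion helper may be useful. The following is only a sketch: the function name slurm_time_to_minutes and the 7-day MaxTime are assumptions for illustration, and the special values UNLIMITED and INFINITE are not handled.

```shell
# Convert a Slurm time limit to whole minutes (seconds are rounded up).
# Accepted formats: MM, MM:SS, HH:MM:SS, D-HH, D-HH:MM, D-HH:MM:SS
slurm_time_to_minutes() {
    echo "$1" | awk '{
        days = 0; rest = $0
        n = split($0, d, "-")
        if (n == 2) { days = d[1]; rest = d[2] }
        k = split(rest, f, ":")
        h = 0; m = 0; s = 0
        if (n == 2) {            # after the day field the order is HH[:MM[:SS]]
            h = f[1]
            if (k >= 2) m = f[2]
            if (k >= 3) s = f[3]
        } else if (k == 3) {     # HH:MM:SS
            h = f[1]; m = f[2]; s = f[3]
        } else if (k == 2) {     # MM:SS
            m = f[1]; s = f[2]
        } else {                 # MM
            m = f[1]
        }
        print days * 1440 + h * 60 + m + int((s + 59) / 60)
    }'
}

# Hypothetical value: replace with the largest MaxTime of your partitions
longest_maxtime="7-0"
echo "SchedulerParameters=bf_window=$(slurm_time_to_minutes "$longest_maxtime")"
```

For a 7-day MaxTime this prints SchedulerParameters=bf_window=10080. If SchedulerParameters already contains other options, bf_window is appended to the same comma-separated list.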
An example slurm.conf fairshare configuration may be:
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityFavorSmall=NO
PriorityMaxAge=10-0
PriorityWeightAge=10000
PriorityWeightFairshare=100000
PriorityWeightJobSize=10000
PriorityWeightPartition=10000
PriorityWeightQOS=10000
PropagateResourceLimitsExcept=MEMLOCK
PriorityFlags=ACCRUE_ALWAYS,FAIR_TREE
AccountingStorageEnforce=associations,limits,qos,safe
List the defined QOS names and their priorities with:
sacctmgr show qos format=name,priority
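To require that user jobs have a QOS, AccountingStorageEnforce in slurm.conf must (at least) include the qos option. The relevant AccountingStorageEnforce options are: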
- associations - This will prevent users from running jobs if their association is not in the database. This option will prevent users from accessing invalid accounts.
- limits - This will enforce limits set to associations. By setting this option, the 'associations' option is also set.
- qos - This will require all jobs to specify (either overtly or by default) a valid qos (Quality of Service). QOS values are defined for each association in the database. By setting this option, the 'associations' option is also set.
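- safe - This will ensure that a job is only launched against an association or QOS with a GrpTRESMins limit if the job will be able to run to completion. By setting this option, both the 'limits' and 'associations' options are also set.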
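The implication rules in the list above can be sketched as a small shell function that expands an AccountingStorageEnforce value into the options that are effectively set. The function name expand_enforce is hypothetical, and only the four options listed here are considered; other values are silently ignored.

```shell
# Expand an AccountingStorageEnforce value into the effective option set:
#   safe   implies limits and associations
#   limits implies associations
#   qos    implies associations
expand_enforce() {
    echo "$1" | tr ',' '\n' | awk '
        { opt[tolower($1)] = 1 }
        END {
            if (opt["safe"])   { opt["limits"] = 1; opt["associations"] = 1 }
            if (opt["limits"]) opt["associations"] = 1
            if (opt["qos"])    opt["associations"] = 1
            n = split("associations limits qos safe", order, " ")
            out = ""
            for (i = 1; i <= n; i++)
                if (opt[order[i]]) out = out (out == "" ? "" : ",") order[i]
            print out
        }'
}

expand_enforce qos     # prints associations,qos
expand_enforce safe    # prints associations,limits,safe
```

Note that qos alone does not turn on limit enforcement; for that, limits (or safe) must be present as well.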
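For the QOS priority to influence the job priority, a non-zero PriorityWeightQOS must be defined in slurm.conf, for example PriorityWeightQOS=10000 as in the fairshare configuration above.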
To enable any limit enforcement you must at least have AccountingStorageEnforce=limits in your slurm.conf; otherwise, even if you have limits set, they will not be enforced. Other options for AccountingStorageEnforce and the explanation for each are found in the Resource_Limits document.
It is desirable to prevent an individual user from filling all idle nodes with a flood of jobs, because this may block jobs submitted later by other users.
With Slurm it appears that the only way to achieve user job throttling is the GrpTRESRunMins limit described in the Resource_Limits document. The GrpTRESRunMins limits can be applied to associations (accounts or users) as well as QOS. Set the limit by:
sacctmgr modify association where name=XXX set GrpTRESRunMins=cpu=1000000   # For an account/user association
sacctmgr modify qos where name=some_QOS set GrpTRESRunMins=cpu=1000000      # For a QOS
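To get a feeling for what a limit of cpu=1000000 means, recall that GrpTRESRunMins counts the TRES-minutes of running jobs, i.e. allocated CPUs multiplied by the remaining time limit. A rough sketch of the arithmetic follows; the function name cpu_run_minutes and the job size (64 CPUs, 7-day time limit) are assumptions for illustration only.

```shell
# CPU-minutes that a running job counts against GrpTRESRunMins=cpu=...:
# allocated CPUs times the remaining minutes of its time limit.
cpu_run_minutes() {
    echo $(( $1 * $2 ))
}

limit=1000000                       # the cpu limit used in the commands above
job=$(cpu_run_minutes 64 10080)     # hypothetical job: 64 CPUs, 7 days = 10080 minutes
echo "one job: $job CPU-minutes"
echo "jobs of this size that fit under the limit: $(( limit / job ))"
```

With these numbers a single job accounts for 645120 CPU-minutes, so only one such job can run at a time under the limit. Further jobs stay pending until running jobs have used up enough of their time limits.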
If some partition XXX (for example, big-memory nodes) should have a higher priority, use the partition term of the job priority formula, which is explained in Multifactor_Priority_Plugin as:
(PriorityWeightPartition) * (partition_factor) +
The Partition factor is controlled in slurm.conf, for example:
PartitionName=XXX ... PriorityJobFactor=10
PriorityWeightPartition=1000
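As we understand the Multifactor_Priority_Plugin documentation, the partition factor is the partition's PriorityJobFactor normalized to the highest PriorityJobFactor of all partitions. The contribution to the job priority can then be estimated as in the sketch below. The function name partition_priority_term and the second partition with PriorityJobFactor=1 are hypothetical, and truncating to an integer is an assumption.

```shell
# Partition term of the multifactor priority: weight * (factor / max_factor).
# Arguments: PriorityWeightPartition, this partition's PriorityJobFactor,
# and the largest PriorityJobFactor of any partition.
partition_priority_term() {
    awk -v w="$1" -v f="$2" -v max="$3" 'BEGIN { printf "%d\n", w * f / max }'
}

partition_priority_term 1000 10 10   # partition XXX from the example above: prints 1000
partition_priority_term 1000 1 10    # hypothetical partition with PriorityJobFactor=1: prints 100
```

Jobs in partition XXX thus gain 900 priority points over jobs in the other partition, which only matters if PriorityWeightPartition is comparable in size to the other PriorityWeight values.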
View scheduling information for the Multifactor_Priority_Plugin by the commands:
sprio - view the factors that comprise a job's scheduling priority:
sprio       # List job priorities
sprio -l    # List job priorities including username etc.
sprio -w    # List weight factors used by the multifactor scheduler
sshare - Tool for listing the shares of associations to a cluster:
sshare
sshare -l   # Long listing with additional information
sshare -a   # Listing with also user information
sdiag - Scheduling diagnostic tool for Slurm