
Niflheim7 Getting started

The Niflheim7 setup is based upon:

For support please see the Niflheim_support page.

Login to Niflheim7

Login to Niflheim7 is available with SSH only from the DTU network. If you are outside of DTU, please log in to the DTU VPN service first.

Please log in to the login node whose CPU type is identical to the compute nodes onto which you will submit batch jobs. The login nodes are:

  • sylg.fysik.dtu.dk and slid.fysik.dtu.dk: Login nodes for CPU type xeon24:
    • The newest Intel CPU type xeon24.
    • 24 CPU cores (dual-processor, 12 cores each; Hyperthreading gives 48 "virtual" cores) and 256 GB of RAM.
    • CPUs: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz (Broadwell).
    • Refer to this as CPU_ARCH=broadwell.
  • thul.fysik.dtu.dk and surt.fysik.dtu.dk: Login nodes for CPU type xeon16:
    • 16 CPU cores (dual-processor, 8 cores each; Hyperthreading gives 32 "virtual" cores) and 64 GB of RAM.
    • CPUs: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (Sandy Bridge).
    • Refer to this as CPU_ARCH=sandybridge. Use this login node also for the ivybridge architecture.
  • fjorm.fysik.dtu.dk: Login node for CPU type xeon8:
    • Compatible with the older Intel CPU type xeon8.
    • 8 CPU cores (dual-processor, 4 cores each; Hyperthreading gives 16 "virtual" cores) and 48 GB of RAM.
    • CPUs: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz (Nehalem).
    • Refer to this as CPU_ARCH=nehalem.
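
For example, to log in to one of the xeon24 login nodes with SSH (replace <username> with your own DTU user name):

ssh <username>@sylg.fysik.dtu.dk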

The login nodes sylg, thul and fjorm must not be overloaded with heavy tasks, since this will disturb other users.

The login nodes slid and surt may be used for heavier testing of codes, but please bear in mind that login nodes may be shared by many users, and no single user should monopolize any login node. Long tasks should always be submitted as batch jobs.

For support please see the Niflheim_support page.

Usage of binary compiled code

Users of Niflheim should be aware of some important facts about different CPU types:

Newer CPUs use new machine instructions (especially AVX or AVX2 vector instructions) which do not exist on older CPUs, so:

  • Code compiled on newer CPUs may crash when executed on older nodes.
  • Code compiled on older CPUs is likely to run much slower on newer nodes because the available vector instructions are not used.
  • Do not run old binaries compiled on other, older systems (such as the old Niflheim). Such binaries will run slowly or may even crash.
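
As an illustration of compiling one binary per CPU generation (a hedged sketch: the source file myprog.c and the output names are hypothetical; the -march values match the CPU_ARCH names listed above), with GCC you could build:

module load GCC
gcc -O2 -march=nehalem     -o myprog.xeon8   myprog.c   # for the xeon8 (Nehalem) nodes
gcc -O2 -march=sandybridge -o myprog.xeon16  myprog.c   # for the xeon16 (Sandy Bridge) nodes
gcc -O2 -march=broadwell   -o myprog.xeon24  myprog.c   # for the xeon24 (Broadwell) nodes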

Slurm batch queueing system

Here is a brief introduction to the usage of Slurm:

Compute node partitions

Slurm node partitions are the compute resource in Slurm which group nodes into logical (possibly overlapping) sets.

Niflheim7 contains a number of node partitions with different types of hardware and the corresponding recommended login nodes:

Partition    CPU cores  RAM     Login nodes
xeon8        8          24 GB   fjorm
xeon8_48     8          48 GB   fjorm
xeon16       16         64 GB   thul, surt
xeon16_128   16         128 GB  thul, surt
xeon16_256   16         256 GB  thul, surt
xeon24       24         256 GB  sylg
xeon24_512   24         512 GB  sylg

Notice: Partitions overlap, so nodes with more memory are also members of the partitions with less memory.

The default partition is the xeon8 partition.

View available nodes and jobs

Use sinfo to view available nodes:

sinfo

and to view the queue use squeue:

squeue
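
Both commands accept filtering options; for example, to show only the xeon16 partition with sinfo, or only your own jobs with squeue (-p/--partition and -u/--user are standard options of these commands):

sinfo -p xeon16
squeue -u $USER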

To list pending jobs in the same order as they are considered for scheduling by Slurm:

squeue --priority --sort=-p,i --states=PD

Hint: Set an environment variable so that the default output format contains more information:

export SQUEUE_FORMAT="%.18i %.9P %.8j %.8u %.10T %.9Q %.10M %.9l %.6D %.6C %R"

To change the time display format see man squeue, for example:

export SLURM_TIME_FORMAT="%a %T"

Submitting batch jobs to Niflheim7

The command sbatch is used to submit jobs to the batch queue. You can submit your Slurm script file to the default partition by:

sbatch scriptfile

See the above-mentioned pages for information about writing Slurm script files, which may contain a number of batch job parameters. See the sbatch page and this example:

#!/bin/bash
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<Your E-mail>  # The default value is the submitting user.
#SBATCH --partition=xeon24
#SBATCH -N 2      # Minimum of 2 nodes
#SBATCH -n 48     # 24 MPI processes per node, 48 tasks in total, appropriate for xeon24 nodes
#SBATCH --time=1-02:00:00
#SBATCH --mem=4G     # 4 GB RAM per node
#SBATCH --output=mpi_job_slurm.log

It is strongly recommended to specify both the number of nodes and the number of tasks, so that jobs will occupy entire nodes. For selecting the correct numbers of nodes and tasks (cores), see these items in the sbatch man-page:

  • -N, --nodes=<minnodes[-maxnodes]>

    Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, this is used as both the minimum and maximum node count...

  • -n, --ntasks=<number>

    sbatch does not launch tasks, it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.
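
For example, to fill two entire xeon16 nodes (16 cores per node, as listed in the partition table above), a job script would request:

#SBATCH --partition=xeon16
#SBATCH -N 2
#SBATCH -n 32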

To view the queue use squeue as shown above.

To delete a job use scancel:

scancel <jobid>

Using local node scratch disks

It is important that every user refrain from overloading the central file servers by writing job temporary/scratch files to the central servers.

Each compute node has a local scratch disk where each user has a private scratch directory:

/scratch/$USER/

where the user's job temporary files must be stored while the job is running. The user's batch job script should delete these files just before the job completes, for example with a command like:

rm -rf /scratch/$USER/*

There is no backup of these scratch files.
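
A minimal sketch of a batch script using the local scratch disk is shown here (the job-ID sub-directory and the results.dat file name are illustrative assumptions, not a Niflheim convention):

#!/bin/bash
#SBATCH --partition=xeon16
#SBATCH -N 1
#SBATCH -n 16
#SBATCH --time=02:00:00
# Create a private scratch directory for this job on the node's local disk.
SCRATCH=/scratch/$USER/$SLURM_JOB_ID
mkdir -p $SCRATCH
cd $SCRATCH
# ... run the actual computation here, writing temporary files to $SCRATCH ...
# Copy any results back to the directory the job was submitted from.
cp -p results.dat $SLURM_SUBMIT_DIR/
# Delete the scratch files just before the job completes.
rm -rf $SCRATCH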

Node partitions for jobs

To view information about Slurm nodes and partitions use this command:

sinfo

The limits for jobs in specific partitions are:

Partition                        Wall-clock limit
xeon8, xeon8_48                  1 week (168 hours)
xeon16, xeon16_128, xeon16_256   1 week (168 hours)
xeon24, xeon24_512               50 hours

You can select a specific node partition with lines in the script (or on the command line):

  • Select the 8-core nodes in the xeon8 partition (default):

    #SBATCH --partition=xeon8
  • Select the 16-core nodes in the xeon16 partition:

    #SBATCH --partition=xeon16
  • Select the 24-core nodes in the xeon24 partition:

    #SBATCH --partition=xeon24
  • Select the 24-core nodes in the xeon24 partition which also have 512 GB RAM memory:

    #SBATCH --partition=xeon24_512

User limits on batch jobs

The following running job limits are in effect for all user accounts:

Parameter                                     Limit
Number of CPU cores (GrpTRES)                 1000
CPU cores * runtime minutes (GrpTRESRunMins)  2600000

Slurm FairShare parameters:

User type       FairShare
VIP/PhD         3%
Student         2%
Faculty         5%
Guest/external  1%

To display job FairShare priority values use:

sprio -l -u $USER
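
Your own FairShare usage and shares can also be displayed with the standard Slurm sshare command:

sshare -u $USER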

Viewing completed job information

After your job has completed (or terminated), you can view job accounting data by querying the Slurm database. For example, to inquire about a specific job ID 1234:

sacct -j 1234 -o jobid,jobname,user,Timelimit,Elapsed,NNodes,Partition,ExitCode,nodelist

You may query many job parameters; to see the complete list, run:

sacct -e
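
For example, to list all of your own jobs started since a given date (the date is only an example; -u/--user and -S/--starttime are standard sacct options):

sacct -u $USER -S 2018-05-01 -o jobid,jobname,partition,elapsed,state,exitcode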

Software environment modules

The classical problem of maintaining multiple versions of software packages and compilers is solved using Software_Modules.

Niflheim uses the Lmod implementation of software environment modules (we do not use the modules command in CentOS). For creating modules we support the EasyBuild_modules build and installation framework.

The Lmod command module (and its brief equivalent form ml) is installed on all nodes.

Read the Lmod_User_Guide to learn about usage of modules. For example, to list available modules:

module avail
ml av

You can load any available module like in this example:

module load GCC
ml GCC
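
Other frequently used Lmod sub-commands include:

module list       # show the currently loaded modules
module show GCC   # display what the GCC module does to your environment
module purge      # unload all loaded modules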

Loading complete toolchains

The modules framework at Niflheim7 includes a number of convenient toolchains built as EasyBuild_modules. We currently provide these toolchains:

  • The foss toolchain provides GCC, OpenMPI, OpenBLAS/LAPACK, ScaLAPACK(/BLACS), FFTW.

    Usage and list of contents:

    module load foss
    module list
  • The iomkl toolchain provides Intel compilers, Intel MKL, OpenMPI.

    Usage and list of contents:

    module load iomkl
    module list

In the future there may be several versions of each toolchain; list them like this:

module whatis foss
module whatis iomkl

Need additional modules?

Please send your requests for additional modules to the Niflheim_support E-mail. We will see if EasyBuild_modules are already available.

Building your own modules

It is possible for you to use your personal modules in addition to those provided by the Niflheim7 system. If you use EasyBuild_modules you can define your private module directory in your home directory and prepend it to the already defined modules:

mkdir $HOME/modules
export EASYBUILD_PREFIX=$HOME/modules
module use $EASYBUILD_PREFIX/modules/all
module load EasyBuild

and then build and install EasyBuild_modules into $HOME/modules. If you need help with this, please write to the Niflheim_support E-mail.
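
As a hedged example (the easyconfig file name zlib-1.2.11.eb is illustrative only), a package can then be built into your private module tree with the eb command, using the --robot option to resolve dependencies automatically:

eb zlib-1.2.11.eb --robot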

Please note that Niflheim7 is a heterogeneous cluster comprising several generations of CPUs, where the newer ones have CPU instructions that don't exist on older CPUs. Therefore code compiled on a new CPU may crash if executed on an older CPU. The Intel compilers, however, can generate multiple versions of the machine code and automatically select the correct version at run-time.

If you compile code for the "native" CPU architecture, we recommend that you compile a separate version for each CPU architecture. For your convenience we provide a system environment variable which you may use to select the correct CPU architecture:

[ohni@fjorm ~]$ echo $CPU_ARCH
nehalem

The Nehalem architecture corresponds to the xeon8 compute nodes, and the GCC compiler (version 4.9 and above) will recognize this architecture name:

module load GCC
gcc -march=native -Q --help=target | grep march | awk '{print $2}'
nehalem
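
A minimal sketch of using $CPU_ARCH in a job script to pick a matching binary (the bin/$CPU_ARCH directory layout and the program name myprog are illustrative assumptions):

module load foss
# Select the binary built for this node's CPU architecture.
EXE=$HOME/myprog/bin/$CPU_ARCH/myprog
mpirun $EXE input.dat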

GPU compute nodes

As described on the Hardware page, Niflheim has 2 HPE SL270s compute nodes h[001-002], each of which has 4 Nvidia Tesla K20X GPUs (a total of 8 GPUs).

The thul login node must be used to build software for GPUs, since it has the same CPU architecture as the GPU-nodes, and since GPU-specific software modules will only be provided on compatible nodes.

CUDA software is only available as a module on the xeon16 (Sandy Bridge) login node thul and compute nodes:

# module avail CUDA
CUDA/8.0.44-GCC-5.4.0-2.26

Additional CUDA software modules can be installed by user request.

To submit a batch job to a single K20Xm GPU use a syntax like:

sbatch --gres=gpu:K20Xm:1 -n1 -N1-1 my_gpu_script

For further Slurm information see the GRES page.
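
The same GPU request can also be placed inside the batch script itself; a hedged sketch is shown here (my_gpu_program is a placeholder for your own executable, and the xeon16 partition is an assumption based on the GPU nodes having xeon16-type CPUs):

#!/bin/bash
#SBATCH --partition=xeon16    # assumption: the GPU nodes have xeon16-type CPUs
#SBATCH --gres=gpu:K20Xm:1    # request one K20Xm GPU
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --time=01:00:00
module load CUDA
./my_gpu_program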

Containers on Niflheim

Containers for virtual operating system and software environments have become immensely popular. The most well-known Containers system is Docker, and huge numbers of Containers have been created for this environment. Containers are well suited to running one or two applications non-interactively in their own custom environments. Containers share the underlying Linux kernel of the host system, so only Linux Containers can exist on a Linux host.

However, Docker is not well suited for a shared multi-user system, let alone an HPC supercomputer system, primarily due to security issues and performance issues with parallel HPC applications. Please see the Singularity_security page.

A relatively new Containers technology created for HPC is Singularity, developed at Lawrence Berkeley Lab (LBL). Singularity assumes (more or less) that each application will have its own container. Singularity assumes that you will have a build system where you are the root user, but that you will also have a production system where you may not be the root user.

To learn more about Singularity, please see some resources:

Singularity on Niflheim

We have installed Singularity (current version: 2.4.2) as RPM packages on all login nodes and compute nodes (runtime only). To get started with Singularity it is recommended to follow the Singularity_tutorial page, where you may skip to Hour 2 (Building and Running Containers).

You can make a test run of a Docker container to be executed by Singularity:

singularity run docker://godlovedc/lolcow

Examples of Singularity containers are in this directory:

/usr/share/doc/singularity-2.*/examples/

If you want to build and test Singularity containers on Niflheim, we must grant you some sudo privileges - please write to the Niflheim_support E-mail.

Alternatively, if you have root privileges on your personal Linux PC, you may want to make a Singularity_installation locally. Make sure to install Singularity 2.4.x as on Niflheim, since several Linux distributions offer very old versions! Finished containers can be copied to Niflheim, and executing Singularity containers is done as a normal user without any root privileges at all!

You can run, for example, the OpenFOAM container from https://hub.docker.com/r/openfoam/ under Singularity in this simple way:

[ohni@surt ohni]$ singularity run docker://openfoam/openfoam4-paraview50

Please note that you must build containers within a local file system (not a shared file system like NFS, where root access is prohibited), so please go to a local scratch directory such as /scratch/$USER.

Questions: Please write to the Niflheim_support E-mail.

Running Docker containers

Docker containers can be executed under Singularity. For example, make a test run of a simple Docker container from DockerHub:

singularity run docker://godlovedc/lolcow

You can run many recent versions of CentOS Docker containers from the CentOS library, for example a 6.9 container:

singularity run docker://centos:centos6.9

Ubuntu Linux may be run from the Ubuntu library:

singularity run docker://ubuntu:17.10

Application codes may also be on DockerHub, for example an OpenFOAM container can be run with:

singularity run docker://openfoam/openfoam4-paraview50
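
The lolcow.simg image file used in the batch-job example below can be created from the same DockerHub container with the singularity build command introduced in Singularity 2.4 (building may require the sudo privileges or the local installation mentioned above):

singularity build lolcow.simg docker://godlovedc/lolcow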

Singularity batch jobs

You can submit normal Slurm batch jobs to the queue running Singularity containers just like any other executable. An example job script running a container image lolcow.simg:

#!/bin/sh
#SBATCH --mail-type=ALL
#SBATCH --partition=xeon16
#SBATCH --mem=4G
#SBATCH --time=05:00
#SBATCH --output=lolcow.%J.log
singularity exec lolcow.simg cowsay 'How did you get out of the container?'

To run a Singularity container in parallel on 2 nodes and 10 CPU cores with MPI use the following lines:

#SBATCH -N 2-2
#SBATCH -n 10
module load OpenMPI
mpirun -n $SLURM_NTASKS singularity exec lolcow.simg cowsay 'How did you get out of the container?'
