juropa.fz-juelich.de (Intel Xeon, Infiniband, MKL)

Here you find information about the system http://www.fz-juelich.de/jsc/juropa.

NumPy is installed system-wide, so a separate installation is not needed.

Building GPAW with gcc

Build GPAW using gcc with the configuration file customize_juropa_gcc.py.

scalapack = True

library_dirs += ['/opt/intel/Compiler/11.0/074/mkl/lib/em64t']
libraries = ['mkl_intel_lp64', 'mkl_sequential', 'mkl_core',
             'mkl_lapack',
             'mkl_scalapack_lp64', 'mkl_blacs_intelmpi_lp64',
             'pthread'
             ]

define_macros += [('GPAW_NO_UNDERSCORE_CBLACS', '1')]
define_macros += [('GPAW_NO_UNDERSCORE_CSCALAPACK', '1')]
define_macros += [("GPAW_ASYNC",1)]
define_macros += [("GPAW_MPI2",1)]

and then executing:

module unload parastation/intel
module load parastation/gcc

python setup.py install --prefix='' --home=MY_INSTALLATION_DIR
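As a rough illustration of how the library settings above reach the linker, distutils turns the `library_dirs` and `libraries` lists into `-L`/`-l` flags (a sketch only; the real build is driven by GPAW's setup.py):

```python
# Sketch: how the customize.py lists above become linker flags.
library_dirs = ['/opt/intel/Compiler/11.0/074/mkl/lib/em64t']
libraries = ['mkl_intel_lp64', 'mkl_sequential', 'mkl_core',
             'mkl_lapack',
             'mkl_scalapack_lp64', 'mkl_blacs_intelmpi_lp64',
             'pthread']

# distutils prefixes each directory with -L and each library with -l
link_flags = ['-L' + d for d in library_dirs] + ['-l' + l for l in libraries]
print(' '.join(link_flags))
```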

Building GPAW with Intel compiler

Use the compiler wrapper script icc.py:

#!/usr/bin/python
"""icc.py is a wrapper for the Intel compiler,
   converting/removing incompatible gcc args.   """

import sys
from subprocess import call

args2change = {"-fno-strict-aliasing":"",
               "-fmessage-length=0":"",
               "-Wall":"",
               "-std=c99":"-qlanglvl=extc99",
               "-fPIC":"",
               "-g":"",
               "-D_FORTIFY_SOURCE=2":"",
               "-DNDEBUG":"",
               "-UNDEBUG":"",
               "-pthread":"",
               "-shared":"-qmkshrobj",
               "-Xlinker":"",
               "-export-dynamic":"",
               "-Wstrict-prototypes":"",
               "-dynamic":"",
               "-O3":"",
               "-O3":"",
               "-O2":"",
               "-O1":""}

fragile_files = ["test.c"]

cmd = ""
fragile = False
for arg in sys.argv[1:]:
    cmd += " "
    t = arg.strip()
    if t in fragile_files:
        fragile = True
    if t in args2change:
        cmd += args2change[t]
    else:
        cmd += arg

flags = "-w -O3 -std=c99"
cmd = "mpicc %s %s"%(flags, cmd)

call(cmd, shell=True)
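To see what the wrapper does, here is a self-contained sketch of its argument translation applied to a made-up compile line (a subset of the mapping above; `hello.c` is a hypothetical file name):

```python
# Sketch of icc.py's argument translation on a sample distutils command line.
args2change = {"-fno-strict-aliasing": "",
               "-fPIC": "",
               "-O2": ""}

argv = ["icc.py", "-fno-strict-aliasing", "-O2", "-fPIC", "-c", "hello.c"]
cmd = ""
for arg in argv[1:]:
    cmd += " "
    # incompatible gcc flags are dropped or replaced, others pass through
    cmd += args2change.get(arg.strip(), arg)

flags = "-w -O3 -std=c99"
cmd = "mpicc %s %s" % (flags, cmd)
print(cmd)
```

The result is the original compile line with the gcc-specific flags stripped and the Intel-friendly flags prepended.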

Internal libxc

Before svn revision 10429, libxc was internal; the corresponding configuration file is customize_juropa_icc.py.

compiler = './icc.py'
mpicompiler = './icc.py'
mpilinker = 'MPICH_CC=gcc mpicc'

scalapack = True

library_dirs += ['/opt/intel/Compiler/11.0/074/mkl/lib/em64t']
libraries = ['mkl_intel_lp64', 'mkl_sequential', 'mkl_core',
             'mkl_lapack',
             'mkl_scalapack_lp64', 'mkl_blacs_intelmpi_lp64',
             'pthread'
             ]

define_macros += [('GPAW_NO_UNDERSCORE_CBLACS', '1')]
define_macros += [('GPAW_NO_UNDERSCORE_CSCALAPACK', '1')]
define_macros += [("GPAW_ASYNC",1)]
define_macros += [("GPAW_MPI2",1)]

External libxc

After svn revision 10429, libxc has to be included as an external library (see also the libxc web site). To install libxc, we assume that MYLIBXCDIR is set to the directory where you want to install it:

$ cd $MYLIBXCDIR
$ wget http://www.tddft.org/programs/octopus/down.php?file=libxc/libxc-2.0.2.tar.gz -O libxc-2.0.2.tar.gz
$ tar -xzvf libxc-2.0.2.tar.gz
$ cd libxc-2.0.2/
$ mkdir install
$ ./configure CFLAGS="-fPIC" --prefix=$PWD/install --enable-shared
$ make | tee make.log
$ make install

This installs the libraries to $MYLIBXCDIR/libxc-2.0.2/install/lib and the C header files to $MYLIBXCDIR/libxc-2.0.2/install/include.

Modify customize.py as in customize_juropa_icc_libxc.py:

compiler = './icc.py'
mpicompiler = './icc.py'
mpilinker = 'MPICH_CC=gcc mpicc'

scalapack = True

library_dirs += ['/opt/intel/Compiler/11.0/074/mkl/lib/em64t']
libraries = ['mkl_intel_lp64', 'mkl_sequential', 'mkl_core',
             'mkl_lapack',
             'mkl_scalapack_lp64', 'mkl_blacs_intelmpi_lp64',
             'pthread'
             ]

libraries += ['xc']
# change this to your installation directory
LIBXCDIR='/lustre/jhome5/hfr04/hfr047/gridpaw/libxc-2.0.2/install/'
library_dirs += [LIBXCDIR + 'lib']
include_dirs += [LIBXCDIR + 'include']

define_macros += [('GPAW_NO_UNDERSCORE_CBLACS', '1')]
define_macros += [('GPAW_NO_UNDERSCORE_CSCALAPACK', '1')]
define_macros += [("GPAW_ASYNC",1)]
define_macros += [("GPAW_MPI2",1)]

Note that at runtime the loader must be able to find the external libxc, which is enabled by setting:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MYLIBXCDIR/libxc-2.0.2/install/lib
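To check that the loader can now resolve libxc, a quick illustrative test is to dlopen it from Python (dlopen honours LD_LIBRARY_PATH; the SONAME "libxc.so" is an assumption — adjust it if your build produced a versioned name):

```python
import ctypes

# CDLL() calls dlopen(), which searches LD_LIBRARY_PATH at runtime,
# so this exercises the export above.
try:
    ctypes.CDLL("libxc.so")
    message = "libxc loaded successfully"
except OSError as err:
    message = "libxc not loadable: %s" % err
print(message)
```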

Compiling

Now the default parastation/intel module is used, so execute only:

python setup.py install --prefix='' --home=MY_INSTALLATION_DIR

Execution

General execution instructions can be found at http://www.fz-juelich.de/jsc/juropa/usage/quick-intro.

Example batch job script for GPAW (512 cores, 30 minutes):

#!/bin/bash -x
#MSUB -l nodes=64:ppn=8
#MSUB -l walltime=0:30:00

cd $PBS_O_WORKDIR
export PYTHONPATH="MY_INSTALLATION_DIR/ase/lib64/python"
export PYTHONPATH="$PYTHONPATH":"MY_INSTALLATION_DIR/gpaw/svn/lib64/python"
export GPAW_SETUP_PATH=SETUP_DIR/gpaw-setups-0.5.3574
export GPAW_PYTHON=MY_INSTALLATION_DIR/bin/gpaw-python

export PSP_ONDEMAND=1

mpiexec -np 512 -x $GPAW_PYTHON my_input.py --sl_default=4,4,64

Note that the -x flag of mpiexec is needed to export the environment variables to the MPI tasks. The environment variable PSP_ONDEMAND can decrease the running time by almost a factor of two at large process counts!
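The --sl_default=4,4,64 option in the job script requests a 4x4 BLACS process grid with ScaLAPACK block size 64. As an illustrative sketch (not GPAW code) of the block-cyclic distribution this implies, the grid row owning a given global matrix row can be computed as:

```python
def owner_row(global_row, nprow=4, blocksize=64):
    """Process-grid row owning a global matrix row in a
    block-cyclic (ScaLAPACK-style) distribution."""
    # consecutive blocks of `blocksize` rows are dealt out
    # cyclically over the `nprow` grid rows
    return (global_row // blocksize) % nprow

# rows 0-63 land on grid row 0, rows 64-127 on grid row 1, and so on
print(owner_row(0), owner_row(64), owner_row(255), owner_row(256))
```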

Job scripts can also be generated using:

gpaw-runscript -h

Simultaneous Multi-Threading

SMT can be used to double the number of processes per node by running on the hardware threads. A test case did not show any performance improvement, though:

#cores   t[s]   SMT   date
64       2484   no    9.5.2011
64       2438   no    16.5.2011
128      1081   no    16.5.2011
64       4812   yes   16.5.2011
128      2077   yes   16.5.2011

SMT can be switched on in gpaw-runscript via:

gpaw-runscript -s