Blue Gene/Q - Mira

The build instructions here are representative of the Blue Gene/Q system at the Argonne Leadership Computing Facility. Users will need to adapt these instructions for specific installations at their respective sites.

In addition to the standard libraries needed on other platforms, Scalable Python https://gitorious.org/scalable-python is required for running effectively on Blue Gene/Q. Scalable Python reduces the I/O load at import time by letting a single process read Python modules and broadcast them to the other MPI tasks. A build script for Scalable Python is provided below:

#!/bin/bash
# Build and install Scalable Python (both the serial and the
# MPI-enabled interpreter) with the BG/Q GNU cross-compilers.

export CC=/bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-gcc
export CXX=/bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-g++
export MPICC=mpicc
export CCSHARED=-fPIC
export LINKFORSHARED='-Xlinker -export-dynamic -dynamic'
export MPI_LDFLAGS_SHARED='-Xlinker -export-dynamic -dynamic'

./configure --prefix=/soft/apps/python/scalable-python-2.6.7-cnk-gcc --enable-mpi --disable-ipv6   2>&1 | tee mira-conf

make 2>&1 | tee mira-make
make mpi 2>&1 | tee mira-make-mpi

make install 2>&1 | tee mira-inst
make install-mpi 2>&1 | tee mira-inst-mpi

NumPy 1.3.0 or later is recommended. Distutils does not work well on PPC architectures, so a compiler must be explicitly specified. A build script for NumPy 1.3.0 is provided below:

#!/bin/sh
# Build NumPy against Scalable Python using the BG/Q GNU cross-compiler.
export CC=/bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-gcc
export BASECFLAGS="-fno-strict-aliasing"
export LD_LIBRARY_PATH=/bgsys/drivers/ppcfloor/gnu-linux/lib64
export PYTHONHOME=/soft/apps/python/scalable-python-2.6.7-cnk-gcc
export PYTHON=${PYTHONHOME}/bin/python
# root=/soft/apps/python/scalable-python-2.6.7-cnk-gcc
builddir=build

# Remove any stale build directory before installing.
rm -rf ${builddir}
# ${PYTHON} setup.py install --root="$root" 2>&1 | tee numpy-1.3.0.log.mira
${PYTHON} setup.py install 2>&1 | tee numpy-1.3.0.log.mira
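
Once the build finishes, it is worth confirming that NumPy imports correctly under the Scalable Python interpreter before moving on. A quick check, reusing the ${PYTHON} variable defined above:

# Sanity check: NumPy should import and report the expected version.
${PYTHON} -c "import numpy; print numpy.__version__"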

GPAW will build with the XL legacy MPI wrapper script. It is recommended that you statically link as many libraries as possible into GPAW to avoid potential performance bottlenecks in loading shared libraries at scale. This can be done with some modification of the stock GPAW config.py file; the modified version is provided as config_mira.py.

Lastly, we recommend that GPAW is compiled with both ScaLAPACK (v. 2.0.2 or later) and HDF5 support. Here is an example customization file:

define_macros += [('GPAW_NO_UNDERSCORE_BLAS', '1')]
define_macros += [('GPAW_NO_UNDERSCORE_LAPACK', '1')]
define_macros += [('GPAW_NO_UNDERSCORE_CBLACS', '1')]
define_macros += [('GPAW_NO_UNDERSCORE_CSCALAPACK', '1')]
define_macros += [('GPAW_NO_UNDERSCORE_BLACS', '1')]
define_macros += [('GPAW_NO_UNDERSCORE_SCALAPACK', '1')]
define_macros += [('GPAW_ASYNC', 1)]
# define_macros += [('GPAW_MR3',1)] # requires developmental ScaLAPACK
# uncomment two lines below for FLOP rate measurement
# define_macros += [('GPAW_HPM',1)]
define_macros += [('GPAW_PERFORMANCE_REPORT',1)]
# define_macros += [('GPAW_MPI_DEBUG', 1)] # debugging
# define_macros += [('GPAW_OMP',1)] # not really working

scalapack = True
hdf5 = True

# If you are using threading, you probably
# need to change the following library:
# xlomp_ser -> xlsmp
#
# DO NOT INTERCHANGE THE ORDER OF LAPACK
# & ESSL, LAPACK SHOULD BE LINKED FIRST.
#
# Goto appears to be much faster for general
# DGEMM operations, particularly those with:
# alpha != 1.0 and beta != 0.0
#
# Goto is hand-tuned assembly, it will most
# likely always be faster than ESSL-4.x.
# NAR: Goto appears to cause core dumps for
# some problems, use at your own risk.
# Disabling the stack guard seems to make
# the problem go away, but this is not
# recommended.
# --env BG_STACKGUARDENABLE=0
#multi threaded
libraries = [
           'scalapack',
           'lapack',
           'esslsmpbg',
           'xlf90_r',
           'xlopt',
           'xl',
           'xlfmath',
           'xlsmp',
            ]

#single threaded
# libraries = [
#            'scalapack',
#            'lapack',
#            'esslbg',
#            'xlf90_r',
#            'xlopt',
#            'xl',
#            'xlfmath',
#            'xlomp_ser',
#             ]

import os
ibmcmp_base = os.environ['ibmcmp_base']
python_base = '/soft/apps/python/scalable-python-2.6.7-cnk-gcc'

library_dirs = [
           '/soft/libraries/alcf/current/xl/LAPACK/lib',
           '/soft/libraries/alcf/current/xl/SCALAPACK/lib',
           '/soft/libraries/essl/5.1.1-0/lib64',
           ibmcmp_base + '/xlf/bg/14.1/bglib64',
           ibmcmp_base + '/xlsmp/bg/3.1/bglib64',
# plain vanilla Python
#           '/bgsys/tools/Python-2.6/lib64',
# scalable Python 2.6.7
           python_base,
           '/soft/libraries/unsupported/hdf5-1.8.8/lib/',
           ]

# plain vanilla Python
# include_dirs += [
#    '/soft/apps/python/python-2.6.6-cnk-gcc/bgsys/tools/Python-2.6/lib64/python2.6/site-packages/numpy/core/include'
#    ]

# scalable Python 2.6.7
include_dirs += [
    python_base + '/lib/python2.6/site-packages/numpy/core/include',
    '/soft/libraries/unsupported/hdf5-1.8.8/include/'
    ]

mpi_libraries = [
#    'mpihpm_smp',
    'hdf5',
    'mpich',
    'opa',
    'mpl',
    'pami',
    'SPI',
    'SPI_cnk',
    'stdc++',
#    'bgpm',
    ]

mpi_library_dirs = [
    '/bgsys/drivers/ppcfloor/comm/xl.legacy/lib',
    '/bgsys/drivers/ppcfloor/comm/sys/lib',
    '/bgsys/drivers/ppcfloor/spi/lib',
    '/soft/perftools/hpctw',
    '/soft/perftools/bgpm/lib',
    ]

extra_link_args = ['-Wl,-export-dynamic']
compiler = "./bgq_xlc.py"
mpicompiler = "./bgq_xlc.py"
mpilinker = "./bgq_xlc_linker.py"

This configuration requires wrapper scripts for the XL compilers (bgq_xlc.py and bgq_xlc_linker.py). A build script for GPAW, build_gpaw.sh, is provided for convenience.
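
The wrappers simply rewrite the compiler command line that distutils generates for the GNU toolchain into one the XL toolchain accepts. The sketch below is illustrative only: the flag substitutions and the bgxlc_r invocation are assumptions for demonstration, not the exact ALCF settings, so refer to the bgq_xlc.py shipped with the build script for the real flags.

#!/usr/bin/env python
# Illustrative sketch of a bgq_xlc.py-style wrapper: forward the
# arguments distutils passes, translating GCC-only flags into assumed
# XL equivalents, then invoke the BG/Q XL compiler.
import os
import sys

# Assumed flag translations; the real wrapper may differ.
flag_map = {'-fPIC': '-qpic',
            '-fno-strict-aliasing': '-qalias=ansi'}
args = [flag_map.get(a, a) for a in sys.argv[1:]]

cmd = 'bgxlc_r -O3 -qarch=qp -qtune=qp ' + ' '.join(args)
sys.exit(os.system(cmd) != 0)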

After all Python modules are installed, they should be byte-compiled before running GPAW. This can be accomplished by going to the top-level directory of each Python library (Scalable Python, NumPy, ASE, and GPAW) and executing the command:

${python} -m compileall .

where ${python} is the explicit path to the Scalable Python interpreter.
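
For example, a small driver script along the following lines will byte-compile all four libraries in one pass (the source-tree paths are placeholders; substitute the locations at your site):

#!/bin/bash
# Byte-compile each installed Python library with the Scalable Python
# interpreter. All four directory paths below are placeholders.
python=/soft/apps/python/scalable-python-2.6.7-cnk-gcc/bin/python

for dir in /path/to/scalable-python /path/to/numpy /path/to/ase /path/to/gpaw
do
    (cd "$dir" && ${python} -m compileall .)
done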

Some users have noticed that the Python interpreter may unnecessarily re-compile Python modules at run time. This is problematic at large numbers (>10,000) of MPI tasks, so we recommend that users set the environment variable:

PYTHONDONTWRITEBYTECODE=1

in the job submission script.
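
On Mira, jobs are submitted through Cobalt, and a script-mode job can forward the variable to the compute nodes via runjob's --envs flag. A sketch of such a script (rank counts and paths are placeholders, not recommendations):

#!/bin/bash
# Example Cobalt script-mode job: --envs forwards environment
# variables to the compute nodes. Numeric values and paths below
# are placeholders.
runjob --np 8192 --ranks-per-node 16 \
       --envs PYTHONDONTWRITEBYTECODE=1 \
       : /path/to/gpaw-python input.py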