# GPAW

GPAW is a density-functional theory (DFT) program for ab initio electronic structure calculations using the projector augmented wave method. It is written mostly in Python and uses MPI for parallelisation.

## Build instructions for PRACE Accelerator Benchmark for GPAW

GPAW is licensed under GPL and is freely available at:
* https://wiki.fysik.dtu.dk/gpaw/
* https://gitlab.com/gpaw/gpaw

Generic installation instructions can be found at:
* https://wiki.fysik.dtu.dk/gpaw/install.html

For platform-specific examples, please refer to:
* https://wiki.fysik.dtu.dk/gpaw/platforms/platforms.html

Accelerator-specific instructions and requirements are given in more detail below for each architecture.

### GPGPUs

GPAW has a separate CUDA version available for Nvidia GPGPUs. Nvidia has released multiple versions of the CUDA toolkit; in this work only CUDA 7.5 was tested, with Tesla K20, Tesla K40 and Tesla K80 cards.

Source code is available in GPAW's repository as a separate branch called `cuda`. To obtain the code, use e.g. the following commands:

```shell
git clone https://gitlab.com/gpaw/gpaw.git
cd gpaw
git checkout cuda
```

or download it from:
* https://gitlab.com/gpaw/gpaw/tree/cuda

Alternatively, the source code is also available with the minor modifications required to work with newer versions of CUDA and Libxc (incl. example installation settings) at:
* https://gitlab.com/atekin/gpaw-cuda.git

Patches needed to work with newer versions of CUDA, as well as example setup scripts for GPAW using dynamic links to Libxc, are also available separately at:
* https://github.com/mlouhivu/gpaw-cuda-patches.git

#### Software requirements

The CUDA branch of GPAW is based on version 0.9.1.13528 and has similar software requirements to the main branch. To compile the code, one needs to use the Intel compile environment.
For example, the following versions are known to work:
* Intel compile environment with Intel MKL and Intel MPI (2015 update 3)
* Python (2.7.11)
* ASE (3.9.1)
* Libxc (3.0.0)
* CUDA (7.5)
* HDF5 (1.8.16)
* PyCUDA (2016.1.2)

#### Install instructions

Before installing the CUDA version of GPAW, the required packages should be compiled using Intel compilers. In addition to using Intel compilers, there are two additional steps compared to a standard installation:
1. Compile the CUDA files after preparing a suitable make.inc (in c/cuda/) by modifying the default options and paths to match your system. The following compiler options may offer a good starting point.
```shell
CC = icc
CCFLAGS = $(CUGPAW_DEFS) -fPIC -std=c99 -m64 -O3
NVCC = nvcc -ccbin=icpc
NVCCFLAGS = $(CUGPAW_DEFS) -O3 -arch=sm_20 -m64 --compiler-options '-fPIC -O3'
```
To use a dynamic link to Libxc, please add a corresponding include flag to CUGPAW_INCLUDES (e.g. `-I/path/to/libxc/include`). You may also need to add additional include flags for MKL (e.g. `-I/path/to/mkl/include`).
After making the necessary changes, simply run make (in the `c/cuda` directory).
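For instance, assuming the repository root as the working directory and an already edited make.inc:

```shell
# build the CUDA kernel library; make.inc must first be adapted to your system
cd c/cuda
make
```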
2. Edit your GPAW setup script (customize.py) to add correct link and compile options for CUDA. The relevant lines are e.g.:
```python
define_macros += [('GPAW_CUDA', '1')]
libraries += ['gpaw-cuda', 'cublas', 'cudart', 'stdc++']
library_dirs += ['./c/cuda', '/path/to/cuda/lib64']
include_dirs += ['/path/to/cuda/include']
```
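With customize.py prepared, GPAW itself is then built and installed in the usual way for this GPAW generation; a sketch (the install prefix is only an example):

```shell
# build the C extensions and install GPAW (distutils-based build)
python setup.py build_ext
python setup.py install --home=/path/to/install
```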
## Xeon Phi MICs

Intel's MIC architecture currently has two distinct generations of processors: the 1st generation Knights Corner (KNC) and the 2nd generation Knights Landing (KNL). KNCs require a specific offload version of GPAW, whereas KNLs use standard GPAW.

### KNC (Knights Corner)

For KNCs, GPAW has adopted an offload-to-the-MIC-co-processor approach similar to GPGPUs. The offload version of GPAW uses the stream-based offload module pyMIC (https://github.com/01org/pyMIC) to offload computationally intensive matrix calculations to the MIC co-processors.

Source code is available in GPAW's repository as a separate branch called `mic`. To obtain the code, use e.g. the following commands:

```shell
git clone https://gitlab.com/gpaw/gpaw.git
cd gpaw
git checkout mic
```

or download it from:
* https://gitlab.com/gpaw/gpaw/tree/mic

A ready-to-use install package with examples and instructions is also available at:
* https://github.com/mlouhivu/gpaw-mic-install-pack.git

#### Software requirements

The offload version of GPAW is roughly equivalent to version 0.11.0 of GPAW and thus has similar requirements (for software and versions). For example, the following versions are known to work:
* Python (2.7.x)
* ASE (3.9.1)
* NumPy (1.9.2)
* Libxc (2.1.x)

In addition, pyMIC requires:
* Intel compile environment with Intel MKL and Intel MPI
* Intel MPSS (Manycore Platform Software Stack)

#### Install instructions

In addition to using Intel compilers, there are three additional steps compared to a standard installation:
1. Compile and install NumPy with a suitable site.cfg to use MKL, e.g.
```python
[mkl]
library_dirs = /path/to/mkl/lib/intel64
include_dirs = /path/to/mkl/include
lapack_libs =
mkl_libs = mkl_rt
```
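With this site.cfg in place, NumPy can be built with the Intel compilers; a sketch using NumPy's distutils compiler selection:

```shell
# build NumPy against MKL using the 64-bit Intel compiler
python setup.py build --compiler=intelem
python setup.py install
```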
2. Compile and install pyMIC (following its own install instructions).

3. Edit your GPAW setup script (customize.py) to add the correct link and compile options for offloading. The relevant lines are e.g.:
```python
# offload to KNC
extra_compile_args += ['-qoffload-option,mic,compiler,"-qopenmp"']
extra_compile_args += ['-qopt-report-phase=offload']

# linker settings for MKL on KNC
mic_mkl_lib = '/path/to/mkl/lib/mic/'
extra_link_args += ['-offload-option,mic,link,"-L' + mic_mkl_lib \
    + ' -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread"']
```
### KNL (Knights Landing)

For KNLs, one can use the standard version of GPAW instead of the offload version used for KNCs. Please refer to the generic installation instructions for GPAW.

#### Software requirements

* https://wiki.fysik.dtu.dk/gpaw/install.html

#### Install instructions

* https://wiki.fysik.dtu.dk/gpaw/install.html
* https://wiki.fysik.dtu.dk/gpaw/platforms/platforms.html

It is advisable to use the Intel compile environment with Intel MKL and Intel MPI to take advantage of their KNL optimisations. To enable the AVX-512 vector instructions supported by KNLs, one needs to use the compiler option `-xMIC-AVX512` when installing GPAW.

To improve performance, one may also link to Intel TBB to benefit from an optimised memory allocator (tbbmalloc). This can be done during installation or at run-time by setting the environment variable LD_PRELOAD to point to the correct libraries, for example:
```shell
export LD_PRELOAD=$TBBROOT/lib/intel64/gcc4.7/libtbbmalloc_proxy.so.2
export LD_PRELOAD=$LD_PRELOAD:$TBBROOT/lib/intel64/gcc4.7/libtbbmalloc.so.2
```
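The `-xMIC-AVX512` compiler option mentioned above can be passed via customize.py; a minimal sketch, assuming the same customize.py variables used in the other examples in this document (extend your existing settings rather than replacing them):

```python
# enable the AVX-512 instruction set for KNL when compiling GPAW's C extensions
extra_compile_args += ['-xMIC-AVX512']
```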
It may also be beneficial to use huge pages together with tbbmalloc.
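Huge pages can be requested from tbbmalloc at run-time through an environment variable; this assumes huge pages have been configured on the system:

```shell
# ask tbbmalloc to back large allocations with huge pages
export TBB_MALLOC_USE_HUGE_PAGES=1
```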
## Run instructions for PRACE Accelerator Benchmark for GPAW

### Download benchmark

The benchmark set is available at:
* https://github.com/mlouhivu/gpaw-benchmarks/tree/prace

or at the PRACE RI website (http://www.prace-ri.eu/ueabs/).

### Small case: Carbon nanotube

A ground state calculation for a carbon nanotube in vacuum. By default it uses a 6-6-10 nanotube with 240 atoms (freely adjustable) and serial LAPACK, with an option to use ScaLAPACK. This benchmark is aimed at smaller systems, with an intended scaling range of up to 10 nodes.

Input file: carbon-nanotube/input.py

### Large case: Copper filament

A ground state calculation for a copper filament in vacuum. By default it uses a 2x2x3 FCC lattice with 71 atoms (freely adjustable) and ScaLAPACK for parallelisation. This benchmark is aimed at larger systems, with an intended scaling range of up to 100 nodes.

Input file: copper-filament/input.py

### Running benchmarks

No special command line options or environment variables are needed to run the benchmarks on GPGPUs or KNL (Xeon Phi Knights Landing) MICs. One can simply launch the input script with GPAW's parallel Python interpreter.
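For example (the number of MPI tasks here is arbitrary; adjust it to your system):

```shell
# run a benchmark input with GPAW's parallel Python interpreter
mpirun -np 256 gpaw-python input.py
```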
For KNCs (Xeon Phi Knights Corner), one needs to use a wrapper script to set correct affinities for pyMIC (see setup/affinity-wrapper.sh for an example) and to set two environment variables for GPAW:
```shell
GPAW_OFFLOAD=1                        # turn on offloading
GPAW_PPN=<no. of MPI tasks per node>
```
For example, in a SLURM system, this could be:
```shell
GPAW_PPN=12 GPAW_OFFLOAD=1 mpirun -np 256 -bootstrap slurm \
    ./affinity-wrapper.sh gpaw-python input.py
```
Example job scripts (setup/job-*.sh) for different accelerator architectures are provided together with related machine specifications (setup/specs.*) that may offer a helpful starting point (especially for KNCs).