Instructions for obtaining GPAW and its test set for PRACE benchmarking.
GPAW is licensed under the GPL, so there are no licensing issues.
NOTE: This benchmark uses version 0.11 of GPAW. For instructions on installing the
latest version, please visit:
https://wiki.fysik.dtu.dk/gpaw/install.html
Software requirements
=====================
* Python
  * version 2.6-3.5 required
  * this benchmark uses version 2.7.9
* NumPy
  * this benchmark uses version 1.11.0
* ASE (Atomic Simulation Environment)
  * this benchmark uses 3.9.0
* LibXC
  * this benchmark uses version 2.0.1
* BLAS and LAPACK libraries
  * this benchmark uses Intel MKL from Intel Composer Studio 2015
* MPI library (optional, for increased performance using parallel processes)
  * this benchmark uses Intel MPI from Intel Composer Studio 2015
* FFTW (optional, for increased performance)
  * this benchmark uses Intel MKL from Intel Composer Studio 2015
* BLACS and ScaLAPACK (optional, for increased performance)
  * this benchmark uses Intel MKL from Intel Composer Studio 2015
* HDF5 (optional, library for parallel I/O and for saving files in HDF5 format)
  * this benchmark uses 1.8.14
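As a quick sanity check, the versions listed above can be compared against what
the Python interpreter actually picks up; a minimal sketch (assumes NumPy and,
optionally, ASE are already installed):

    # report the versions of the Python stack used for the benchmark
    from __future__ import print_function
    import sys
    import numpy
    print("Python:", sys.version.split()[0])
    print("NumPy :", numpy.__version__)
    try:
        import ase
        print("ASE   :", ase.__version__)
    except ImportError:
        print("ASE   : not installed")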
Obtaining the source code
=========================
* The specific version of GPAW used in this benchmark can be obtained from:
https://gitlab.com/gpaw/gpaw/tags/0.11.0
* Installation instructions can be found at:
https://wiki.fysik.dtu.dk/gpaw/install.html
* For platform specific instructions, please refer to:
https://wiki.fysik.dtu.dk/gpaw/platforms/platforms.html
Support
=======
* Help regarding the benchmark can be requested from adem.tekin@be.itu.edu.tr
This benchmark set contains scaling tests for the electronic structure simulation software GPAW.
More information on GPAW can be found at https://wiki.fysik.dtu.dk/gpaw
Small Scaling Test: carbone_nanotube.py
=======================================
A ground state calculation for a (6-6-10) carbon nanotube, requiring 30 SCF iterations.
The ScaLAPACK calculations are parallelized using a 4/4/64 partitioning scheme.
This system scales reasonably well up to 512 cores, running to completion in under two minutes on a 2015-era x86 cluster.
For scalability testing, the relevant timer in the text output 'out_nanotube_hXXX_kYYY_pZZZ' (where XXX denotes the grid spacing, YYY the Brillouin-zone sampling and ZZZ the number of cores used) is 'Total Time'.
Medium Scaling Test: C60_Pb100.py and C60_Pb100_POSCAR
======================================================
A ground state calculation for a fullerene on a Pb(100) surface, requiring ~100 SCF iterations.
In this example, the parameters of the ScaLAPACK parallelization scheme are chosen automatically (using the keyword 'sl_auto: True'); both ScaLAPACK settings are illustrated in the sketch below.
This system scales reasonably well up to 1024 cores, running to completion in under thirteen minutes on a 2015-era x86 cluster.
For scalability testing, the relevant timer in the text output 'out_C60_Pb100_hXXX_kYYY_pZZZ' (where XXX denotes the grid spacing, YYY the Brillouin-zone sampling and ZZZ the number of cores used) is 'Total Time'.
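For reference, both ScaLAPACK settings mentioned above are passed to GPAW via the
'parallel' keyword of the input scripts. A minimal sketch, assuming the 4/4/64
scheme corresponds to the 'sl_default' keyword (as in the benchmark scripts
included below); the values in the actual input files take precedence:

    from gpaw import GPAW

    # explicit ScaLAPACK partitioning, e.g. the 4/4/64 scheme of the small test
    parallel = {'sl_default': (4, 4, 64)}
    # alternatively, let GPAW choose the layout automatically (medium test):
    # parallel = {'sl_auto': True}
    calc = GPAW(parallel=parallel, txt='output.txt')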
How to run
==========
* Download and build the source code following the instructions in GPAW_Build_README.txt
* The benchmarks do not need any special command line options and can be run
  simply as, e.g.:
mpirun -np 256 gpaw-python carbone_nanotube.py
mpirun -np 512 gpaw-python C60_Pb100.py
PRACE Benchmarks for GPAW
=========================
GPAW
----
### Code description
[GPAW](https://wiki.fysik.dtu.dk/gpaw/) is a density-functional theory (DFT)
program for ab initio electronic structure calculations using the projector
augmented wave method. It uses a uniform real-space grid representation of the
electronic wavefunctions that allows for excellent computational scalability
and systematic convergence properties.
GPAW is written mostly in Python, but also includes computational kernels
written in C and leverages external libraries such as NumPy, BLAS and
ScaLAPACK. Parallelisation is based on message passing using MPI, with no
support for multithreading. Development branches for GPGPUs and MICs include
support for offloading to accelerators using either CUDA or pyMIC/libxstream,
respectively.
### Download
GPAW is freely available under the GPL license. The source code can be
downloaded from the [Git repository](https://gitlab.com/gpaw/gpaw) or as
a tar package for each release from [PyPI](https://pypi.org/simple/gpaw/).
For example, to get version 1.4.0 using git:
```bash
git clone -b 1.4.0 https://gitlab.com/gpaw/gpaw.git
```
### Install
Generic [installation instructions](https://wiki.fysik.dtu.dk/gpaw/install.html)
and
[platform specific examples](https://wiki.fysik.dtu.dk/gpaw/platforms/platforms.html)
are provided in the [GPAW wiki](https://wiki.fysik.dtu.dk/gpaw/). For
accelerators, architecture specific instructions and requirements are also
provided for [Xeon Phis](build/build-xeon-phi.md) and for
[GPGPUs](build/build-cuda.md).
Example [build scripts](build/examples/) are also available for some PRACE
systems.
Benchmarks
----------
### Download
The benchmark set is available in the [benchmark/](benchmark/) directory or
alternatively, for download, either directly from the development
[Git repository](https://github.com/mlouhivu/gpaw-benchmarks/tree/prace)
or from the PRACE RI website (http://www.prace-ri.eu/ueabs/).
To download the benchmarks, use e.g. the following command:
```
git clone -b prace https://github.com/mlouhivu/gpaw-benchmarks
```
### Benchmark cases
#### Case S: Carbon nanotube
A ground state calculation for a carbon nanotube in vacuum. By default uses a
6-6-10 nanotube with 240 atoms (freely adjustable) and serial LAPACK with an
option to use ScaLAPACK. Expected to scale up to 10 nodes and/or 100 MPI
tasks.
Input file: [benchmark/carbon-nanotube/input.py](benchmark/carbon-nanotube/input.py)
#### Case M: Copper filament
A ground state calculation for a copper filament in vacuum. By default uses a
2x2x3 FCC lattice with 71 atoms (freely adjustable) and ScaLAPACK for
parallelisation. Expected to scale up to 100 nodes and/or 1000 MPI tasks.
Input file: [benchmark/copper-filament/input.py](benchmark/copper-filament/input.py)
#### Case L: Silicon cluster
A ground state calculation for a silicon cluster in vacuum. By default the
cluster has a radius of 15Å (freely adjustable) and consists of 702 atoms,
and ScaLAPACK is used for parallelisation. Expected to scale up to 1000 nodes
and/or 10000 MPI tasks.
Input file: [benchmark/silicon-cluster/input.py](benchmark/silicon-cluster/input.py)
### Running the benchmarks
No special command line options or environment variables are needed to run the
benchmarks on most systems. One can simply say e.g.
```
srun gpaw-python input.py
```
#### Special case: KNC
For KNCs (Xeon Phi Knights Corner), one needs to use a wrapper script to set
correct affinities for pyMIC (see
[scripts/affinity-wrapper.sh](scripts/affinity-wrapper.sh) for an example)
and to set two environment variables for GPAW:
```shell
GPAW_OFFLOAD=1 # (to turn on offloading)
GPAW_PPN=<no. of MPI tasks per node>
```
For example, in a SLURM system, this could be:
```shell
GPAW_PPN=12 GPAW_OFFLOAD=1 mpirun -np 256 -bootstrap slurm \
./affinity-wrapper.sh 12 gpaw-python input.py
```
#### Examples
Example [job scripts](scripts/) (`scripts/job-*.sh`) are provided for
different PRACE systems and may offer a helpful starting point.
# GPAW
GPAW is a density-functional theory (DFT) program for ab initio electronic
structure calculations using the projector augmented wave method. It is
written mostly in Python and uses MPI for parallelisation.
## Build instructions for PRACE Accelerator Benchmark for GPAW
GPAW is licensed under GPL and is freely available at:
https://wiki.fysik.dtu.dk/gpaw/
https://gitlab.com/gpaw/gpaw
Generic installation instructions can be found at:
https://wiki.fysik.dtu.dk/gpaw/install.html
For platform specific examples, please refer to:
https://wiki.fysik.dtu.dk/gpaw/platforms/platforms.html
Accelerator specific instructions and requirements are given in more detail
below for each architecture.
### GPGPUs
GPAW has a separate CUDA version available for Nvidia GPGPUs. Nvidia has
released multiple versions of the CUDA toolkit; in this work, only CUDA 7.5 was
tested, using Tesla K20, Tesla K40 and Tesla K80 cards.
Source code is available in GPAW's repository as a separate branch called
'cuda'. To obtain the code, use e.g. the following commands:
git clone https://gitlab.com/gpaw/gpaw.git
cd gpaw
git checkout cuda
or download it from: https://gitlab.com/gpaw/gpaw/tree/cuda
Alternatively, the source code is also available with minor modifications
required to work with newer versions of CUDA and Libxc (incl. example
installation settings) at:
https://gitlab.com/atekin/gpaw-cuda.git
Patches needed to work with newer versions of CUDA and example setup scripts
for GPAW using dynamic links to Libxc are also available separately at:
https://github.com/mlouhivu/gpaw-cuda-patches.git
#### Software requirements
The CUDA branch of GPAW is based on version 0.9.1.13528 and has software
requirements similar to the main branch. To compile the code, one needs to use
the Intel compile environment.
For example, the following versions are known to work:
* Intel compile environment with Intel MKL and Intel MPI (2015 update 3)
* Python (2.7.11)
* ASE (3.9.1)
* Libxc (3.0.0)
* CUDA (7.5)
* HDF5 (1.8.16)
* Pycuda (2016.1.2)
#### Install instructions
Before installing the CUDA version of GPAW, the required packages should be
compiled using the Intel compilers. Apart from using the Intel compilers, there
are two additional steps compared to a standard installation:
1. Compile the CUDA files after preparing a suitable make.inc (in c/cuda/) by
modifying the default options and paths to match your system. The following
compiler options may offer a good starting point.
```shell
CC = icc
CCFLAGS = $(CUGPAW_DEFS) -fPIC -std=c99 -m64 -O3
NVCC = nvcc -ccbin=icpc
NVCCFLAGS = $(CUGPAW_DEFS) -O3 -arch=sm_20 -m64 --compiler-options '-fPIC -O3'
```
To use a dynamic link to Libxc, add a corresponding include flag to
CUGPAW_INCLUDES (e.g. `-I/path/to/libxc/include`). You may also need to add
additional include flags for MKL (e.g. `-I/path/to/mkl/include`) in
CUGPAW_INCLUDES.
After making the necessary changes, simply run make (in the c/cuda directory).
2. Edit your GPAW setup script (customize.py) to add correct link and compile
options for CUDA. The relevant lines are e.g.:
```python
define_macros += [('GPAW_CUDA', '1')]
libraries += [
'gpaw-cuda',
'cublas',
'cudart',
'stdc++'
]
library_dirs += [
'./c/cuda',
'/path/to/cuda/lib64'
]
include_dirs += [
'/path/to/cuda/include'
]
```
## Xeon Phi MICs
Intel's MIC architecture currently has two distinct generations of processors:
the 1st generation Knights Corner (KNC) and the 2nd generation Knights Landing (KNL).
KNCs require a specific offload version of GPAW, whereas KNLs use standard
GPAW.
### KNC (Knights Corner)
For KNCs, GPAW has adopted an offload-to-the-MIC-co-processor approach similar
to GPGPUs. The offload version of GPAW uses the stream-based offload module
pyMIC (https://github.com/01org/pyMIC) to offload computationally intensive
matrix calculations to the MIC co-processors.
Source code is available in GPAW's repository as a separate branch called
'mic'. To obtain the code, use e.g. the following commands:
git clone https://gitlab.com/gpaw/gpaw.git
cd gpaw
git checkout mic
or download it from: https://gitlab.com/gpaw/gpaw/tree/mic
A ready-to-use install package with examples and instructions is also
available at:
https://github.com/mlouhivu/gpaw-mic-install-pack.git
#### Software requirements
The offload version of GPAW is roughly equivalent to the 0.11.0 version of
GPAW and thus has similar requirements (for software and versions).
For example, the following versions are known to work:
* Python (2.7.x)
* ASE (3.9.1)
* NumPy (1.9.2)
* Libxc (2.1.x)
In addition, pyMIC requires:
* Intel compile environment with Intel MKL and Intel MPI
* Intel MPSS (Manycore Platform Software Stack)
#### Install instructions
Apart from using the Intel compilers, there are three additional steps compared
to a standard installation:
1. Compile and install NumPy with a suitable site.cfg to use MKL, e.g.
```ini
[mkl]
library_dirs = /path/to/mkl/lib/intel64
include_dirs = /path/to/mkl/include
lapack_libs =
mkl_libs = mkl_rt
```
2. Compile and install pyMIC before GPAW.
3. Edit your GPAW setup script (customize.py) to add correct link and compile
options for offloading. The relevant lines are e.g.:
```python
# offload to KNC
extra_compile_args += ['-qoffload-option,mic,compiler,"-qopenmp"']
extra_compile_args += ['-qopt-report-phase=offload']
# linker settings for MKL on KNC
mic_mkl_lib = '/path/to/mkl/lib/mic/'
extra_link_args += ['-offload-option,mic,link,"-L' + mic_mkl_lib \
+ ' -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread"']
```
### KNL (Knights Landing)
For KNLs, one can use the standard version of GPAW, instead of the offload
version used for KNCs. Please refer to the generic installation instructions
for GPAW.
#### Software requirements
https://wiki.fysik.dtu.dk/gpaw/install.html
#### Install instructions
https://wiki.fysik.dtu.dk/gpaw/install.html
https://wiki.fysik.dtu.dk/gpaw/platforms/platforms.html
It is advisable to use the Intel compile environment with Intel MKL and Intel MPI
to take advantage of their KNL optimisations. To enable the AVX-512 vector
instruction sets supported by KNLs, one needs to use the compiler option
'-xMIC-AVX512' when installing GPAW.
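A minimal sketch of the corresponding customize.py change (illustrative only;
adapt to your own setup script):
```python
# customize.py snippet: enable the AVX-512 instruction set supported by KNL
# when building GPAW with the Intel compilers
extra_compile_args += ['-xMIC-AVX512']
```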
To improve performance, one may also link to Intel TBB to benefit from an
optimised memory allocator (tbbmalloc). This can be done during installation
or at run time by setting the environment variable LD_PRELOAD to point to the
correct libraries, for example:
```shell
export LD_PRELOAD=$TBBROOT/lib/intel64/gcc4.7/libtbbmalloc_proxy.so.2
export LD_PRELOAD=$LD_PRELOAD:$TBBROOT/lib/intel64/gcc4.7/libtbbmalloc.so.2
```
It may also be beneficial to use huge pages together with tbbmalloc
(`export TBB_MALLOC_USE_HUGE_PAGES=1`).
## Run instructions for PRACE Accelerator Benchmark for GPAW
### Download benchmark
The benchmark set is available at:
https://github.com/mlouhivu/gpaw-benchmarks/tree/prace
or at the PRACE RI website (http://www.prace-ri.eu/ueabs/).
### Small case: Carbon nanotube
A ground state calculation for a carbon nanotube in vacuum. By default uses a
6-6-10 nanotube with 240 atoms (freely adjustable) and serial LAPACK with an
option to use ScaLAPACK.
This benchmark is aimed at smaller systems, with an intended scaling range of
up to 10 nodes.
Input file: carbon-nanotube/input.py
### Large case: Copper filament
A ground state calculation for a copper filament in vacuum. By default uses a
2x2x3 FCC lattice with 71 atoms (freely adjustable) and ScaLAPACK for
parallelisation.
This benchmark is aimed at larger systems, with an intended scaling range of
up to 100 nodes.
Input file: copper-filament/input.py
### Running benchmarks
No special command line options or environment variables are needed to run the
benchmarks on GPGPUs or KNL (Xeon Phi Knights Landing) MICs. One can simply
say e.g.
`mpirun -np 256 gpaw-python input.py`
For KNCs (Xeon Phi Knights Corner), one needs to use a wrapper script to set
correct affinities for pyMIC (see setup/affinity-wrapper.sh for an example)
and to set two environment variables for GPAW:
```shell
GPAW_OFFLOAD=1   # (to turn on offloading)
GPAW_PPN=<no. of MPI tasks per node>
```
For example, in a SLURM system, this could be:
```shell
GPAW_PPN=12 GPAW_OFFLOAD=1 mpirun -np 256 -bootstrap slurm \
./affinity-wrapper.sh gpaw-python input.py
```
Example job scripts (setup/job-*.sh) for different accelerator architectures
are provided, together with related machine specifications (setup/specs.*), and
may offer a helpful starting point (especially for KNCs).
###
### GPAW benchmark: Carbon Nanotube
###
from __future__ import print_function
from gpaw.mpi import size, rank
from gpaw import GPAW, Mixer, PoissonSolver, ConvergenceError
from gpaw.occupations import FermiDirac
try:
    from ase.build import nanotube
except ImportError:
    from ase.structure import nanotube
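# optional accelerator support: these flags are exposed only by the
# accelerator branches of GPAW (e.g. the 'mic' and 'cuda' branches)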
try:
    from gpaw import use_mic
except ImportError:
    use_mic = False
try:
    from gpaw import use_cuda
    use_cuda = True
except ImportError:
    use_cuda = False
use_cpu = not (use_mic or use_cuda)
# dimensions of the nanotube
n = 6
m = 6
length = 10
# other parameters
txt = 'output.txt'
maxiter = 16
conv = {'eigenstates' : 1e-4, 'density' : 1e-2, 'energy' : 1e-3}
# uncomment to use ScaLAPACK
#parallel = {'sl_auto': True}
# output benchmark parameters
if rank == 0:
    print("#"*60)
    print("GPAW benchmark: Carbon Nanotube")
    print(" nanotube dimensions: n=%d, m=%d, length=%d" % (n, m, length))
    print(" MPI tasks: %d" % size)
    print(" using CUDA (GPGPU): " + str(use_cuda))
    print(" using pyMIC (KNC) : " + str(use_mic))
    print(" using CPU (or KNL): " + str(use_cpu))
    print("#"*60)
    print("")
# setup parameters
args = {'h': 0.2,
        'nbands': -60,
        'occupations': FermiDirac(0.1),
        'mixer': Mixer(0.1, 5, 50),
        'poissonsolver': PoissonSolver(eps=1e-12),
        'eigensolver': 'rmm-diis',
        'maxiter': maxiter,
        'convergence': conv,
        'txt': txt}
if use_cuda:
    args['cuda'] = True
try:
    args['parallel'] = parallel
except: pass
# setup the system
atoms = nanotube(n, m, length)
atoms.center(vacuum=4.068, axis=0)
atoms.center(vacuum=4.068, axis=1)
calc = GPAW(**args)
atoms.set_calculator(calc)
# execute the run
try:
    atoms.get_potential_energy()
except ConvergenceError:
    pass
###
### GPAW benchmark: Copper Filament
###
from __future__ import print_function
from gpaw.mpi import size, rank
from gpaw import GPAW, Mixer, ConvergenceError
from gpaw.occupations import FermiDirac
from ase.lattice.cubic import FaceCenteredCubic
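# the RMM-DIIS eigensolver import path differs between GPAW versions, so try both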
try:
    from gpaw.eigensolvers.rmm_diis import RMM_DIIS
except ImportError:
    from gpaw.eigensolvers.rmmdiis import RMMDIIS as RMM_DIIS
try:
    from gpaw import use_mic
except ImportError:
    use_mic = False
try:
    from gpaw import use_cuda
    use_cuda = True
except ImportError:
    use_cuda = False
use_cpu = not (use_mic or use_cuda)
# no. of replicates in each dimension (increase to scale up the system)
x = 3
y = 2
z = 4
# other parameters
h = 0.22
kpts = (1,1,8)
txt = 'output.txt'
maxiter = 24
parallel = {'sl_default': (2,2,64)}
# output benchmark parameters
if rank == 0:
    print("#"*60)
    print("GPAW benchmark: Copper Filament")
    print(" dimensions: x=%d, y=%d, z=%d" % (x, y, z))
    print(" grid spacing: h=%f" % h)
    print(" Brillouin-zone sampling: kpts=" + str(kpts))
    print(" MPI tasks: %d" % size)
    print(" using CUDA (GPGPU): " + str(use_cuda))
    print(" using pyMIC (KNC) : " + str(use_mic))
    print(" using CPU (or KNL): " + str(use_cpu))
    print("#"*60)
    print("")
# compatibility hack for the eigensolver
rmm = RMM_DIIS()
rmm.niter = 2
# setup parameters
args = {'h': h,
        'nbands': -20,
        'occupations': FermiDirac(0.2),
        'kpts': kpts,
        'xc': 'PBE',
        'mixer': Mixer(0.1, 5, 100),
        'eigensolver': rmm,
        'maxiter': maxiter,
        'parallel': parallel,
        'txt': txt}
if use_cuda:
    args['cuda'] = True
# setup the system
atoms = FaceCenteredCubic(directions=[[1,-1,0], [1,1,-2], [1,1,1]],
                          size=(x,y,z), symbol='Cu', pbc=(0,0,1))
atoms.center(vacuum=6.0, axis=0