# GPAW - A Projected Augmented Wave code

## Summary version

0.1

## Purpose of the benchmark

[GPAW](https://wiki.fysik.dtu.dk/gpaw/) is a density-functional theory (DFT) program for ab initio electronic structure calculations using the projector augmented wave method. It uses a uniform real-space grid representation of the electronic wavefunctions that allows for excellent computational scalability and systematic convergence properties.

The GPAW benchmark tests MPI parallelization and the quality of the provided mathematical libraries, including BLAS, LAPACK, ScaLAPACK, and an FFTW-compatible library. There is also a CUDA-based implementation for GPU systems.

## Characteristics of the benchmark

GPAW is written mostly in Python, but also includes computational kernels written in C and leverages external libraries such as NumPy, BLAS and ScaLAPACK. Parallelisation is based on message-passing using MPI with no support for multithreading.

There have been various developments for GPGPUs and MICs in the past using either CUDA or pyMIC/libxstream. Many of those branches are no longer developed. The relevant CUDA version for this benchmark is available in a [separate GitLab for CUDA development, cuda branch](https://gitlab.com/mlouhivu/gpaw/tree/cuda). This version corresponds to the Aalto version mentioned on the [GPU page of the GPAW Wiki](https://wiki.fysik.dtu.dk/gpaw/devel/projects/gpu.html). As of early 2020, that version seems to be derived from the 1.5.2 CPU version (at least, I could find a commit that claims to merge the 1.5.2 code).

There is currently no active support for non-CUDA accelerator platforms.

For the UEABS benchmark version 2.2, the following versions of GPAW were tested:

* CPU-based:
    * Version 1.5.2, as it is the last release of the 1.5 branch and the GPU version is derived from it.
    * Version 20.1.0, the most recent version during the development of the UEABS 2.2 benchmark suite.
* GPU-based: There is no official release or version number. The UEABS 2.2 benchmark suite was tested using commit TODO of [the cuda branch of the GitLab for CUDA development](https://gitlab.com/mlouhivu/gpaw/tree/cuda).

There are three benchmark cases, denoted S, M and L.

### Case S: Carbon nanotube

A ground state calculation for a carbon nanotube in vacuum. By default it uses a 6-6-10 nanotube with 240 atoms (freely adjustable) and serial LAPACK, with an option to use ScaLAPACK. It is expected to scale up to 10 nodes and/or 100 MPI tasks. This benchmark runs fast: expect execution times of around 1 minute on 100 cores of a modern x86 cluster.

Input file: [benchmark/1_S_carbon-nanotube/input.py](benchmark/1_S_carbon-nanotube/input.py)

### Case M: Copper filament

A ground state calculation for a copper filament in vacuum. By default it uses a 3x4x4 FCC lattice with 71 atoms (freely adjustable through the variables `x`, `y` and `z` in the input file) and ScaLAPACK for parallelisation. It is expected to scale up to 100 nodes and/or 1000 MPI tasks.

Input file: [benchmark/2_M_copper-filament/input.py](benchmark/2_M_copper-filament/input.py)

The benchmark was tested using 1000 and 1024 cores. For some core configurations, one may get error messages similar to ``gpaw.grid_descriptor.BadGridError: Grid ... too small for ... cores``. To run the benchmark on such a number of cores anyway, adapt the values of `x`, `y` and `z` in `input.py`. However, this changes the benchmark, so the results cannot easily be compared with benchmark runs that use different values of these variables.

### Case L: Silicon cluster

A ground state calculation for a silicon cluster in vacuum. By default the cluster has a radius of 15Å (freely adjustable) and consists of 702 atoms, and ScaLAPACK is used for parallelisation. It is expected to scale up to 1000 nodes and/or 10000 MPI tasks.

Input file: [benchmark/3_L_silicon-cluster/input.py](benchmark/3_L_silicon-cluster/input.py)

## Mechanics of building the benchmark

Note that GPAW version numbering changed in 2019. Version 1.5.2 is the last version with the old numbering. In 2019 the development team switched to a version numbering scheme based on year, month and patchlevel, e.g., 19.8.1 for the second version released in August 2019.

Another change is in the Python packages used to install GPAW. Versions up to and including 19.8.1 use the `distutils` package, while versions 20.1.0 and later are based on `setuptools`. This does affect the installation process.

For some time now, GPAW has supported two different ways to run in parallel distributed-memory mode:

* Using a wrapper executable `gpaw-python` that replaces the Python interpreter (it internally links to the libpython library) and that provides the MPI functionality.
* Using the standard Python interpreter, with the MPI functionality included in the `_gpaw.so` shared library.

In the `distutils`-based versions, the wrapper executable approach is the default behaviour, while in the `setuptools`-based versions, the approach using the standard Python interpreter is the one preferred in the manual. Even though the code in the `setuptools`-based versions still includes the option to use the wrapper executable approach, it does not work in the tested version 20.1.0.

### Available instructions

The [GPAW wiki](https://wiki.fysik.dtu.dk/gpaw/) only contains the [installation instructions](https://wiki.fysik.dtu.dk/gpaw/index.html) for the current version. For the installation instructions of older versions, including a list of dependencies, either download the code (see below) and look for the file `doc/install.rst`, or go to the [GPAW GitLab](https://gitlab.com/gpaw), select the tag for the desired version and view the file `doc/install.rst`.

The [GPAW wiki](https://wiki.fysik.dtu.dk/gpaw/) also provides some [platform specific examples](https://wiki.fysik.dtu.dk/gpaw/platforms/platforms.html).

### List of dependencies

GPAW is Python code, but it also contains C code for some performance-critical parts and to interface to a number of libraries on which it depends.

Hence GPAW has the following requirements:

* C compiler with MPI support
* BLAS, LAPACK, BLACS and ScaLAPACK. ScaLAPACK is optional for GPAW, but mandatory for the UEABS benchmarks. It is used by the medium and large cases and is optional for the small case.
* Python. GPAW 1.5.2 requires Python 2.7 or 3.4-3.7, GPAW 19.8.1 requires Python 3.4-3.7, GPAW 20.1.0 requires Python 3.5-3.8 and GPAW 20.10.0 requires Python 3.6-3.9.
* Mandatory Python packages:
    * [NumPy](https://pypi.org/project/numpy/) 1.9 or later (for GPAW 1.5.2/19.8.1/20.1.0/20.10.0)
    * [SciPy](https://pypi.org/project/scipy/) 0.14 or later (for GPAW 1.5.2/19.8.1/20.1.0/20.10.0)
* [FFTW](http://www.fftw.org) is highly recommended. As long as the optional libvdwxc component is not used, the MKL FFTW wrappers can also be used. Recent versions of GPAW also show good performance using just the NumPy-provided FFT routines, provided that NumPy has been built with a highly optimized FFT library.
* [LibXC](https://www.tddft.org/programs/libxc/) 2.X or newer for GPAW 1.5.2, 3.X or 4.X for GPAW 19.8.1, 20.1.0 and 20.10.0. LibXC is a library of exchange-correlation functionals for density-functional theory. None of these versions currently mentions LibXC 5.X as officially supported.
* [ASE, Atomic Simulation Environment](https://wiki.fysik.dtu.dk/ase/), a Python package from the same group that develops GPAW
    * Check the release notes of GPAW, as the releases of ASE and GPAW should match. E.g., during the development of the UEABS version 2.2 benchmark suite, version 20.1.0 was the most up-to-date release of GPAW, with 3.19.1 the matching ASE version (though 3.18.0 should also work).
    * ASE has some optional dependencies that are not needed for the benchmarking: Matplotlib (2.0.0 or newer), tkinter (Tk interface, part of the Python standard library) and Flask.
* Optional components of GPAW that are not used by the UEABS benchmarks:
    * [libvdwxc](https://gitlab.com/libvdwxc/libvdwxc), a portable C library of density functionals with van der Waals interactions for density functional theory. This library does not work with the MKL FFTW wrappers.
    * [ELPA](https://elpa.mpcdf.mpg.de/), which should improve performance for large systems when GPAW is used in [LCAO mode](https://wiki.fysik.dtu.dk/gpaw/documentation/lcao/lcao.html)

In addition, the GPU version needs:

* NVIDIA CUDA toolkit
* [PyCUDA](https://pypi.org/project/pycuda/)

Installing GPAW also requires a number of standard build tools on the system, including:

* [GNU autoconf](https://www.gnu.org/software/autoconf/), needed to generate the configure script for libxc
* [GNU Libtool](https://www.gnu.org/software/libtool/). If it is not found, the configure process of libxc produces very misleading error messages that do not immediately point to libtool being missing.
* [GNU make](https://www.gnu.org/software/make/)

### Download of GPAW

GPAW is freely available under the GPL license.

The source code of the CPU version can be downloaded from the [GitLab repository](https://gitlab.com/gpaw/gpaw) or as a tar package for each release from [PyPI](https://pypi.org/simple/gpaw/).

For example, to get version 20.1.0 using git:

```bash
git clone -b 20.1.0 https://gitlab.com/gpaw/gpaw.git
```

The CUDA development version is available in [the cuda branch of a separate GitLab](https://gitlab.com/mlouhivu/gpaw/tree/cuda). To get the current development version using git:

```bash
git clone -b cuda https://gitlab.com/mlouhivu/gpaw.git
```

### Install

Crucial for the configuration of GPAW is a proper `customize.py` (GPAW 19.8.1 and earlier) or `siteconfig.py` (GPAW 20.1.0 and later) file.
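
As a rough illustration of the version split described above, the following sketch maps a GPAW release string to the configuration file name and the Python packaging tool mentioned in this document. The helpers `parse_version` and `build_settings` and the dictionary keys are hypothetical names chosen for this example, not part of GPAW; the only rule encoded is the one stated in the text, namely that 20.1.0 is the first `setuptools`-based release.

```python
def parse_version(version):
    """Turn a release string such as '1.5.2' or '20.1.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def build_settings(version):
    """Return the config file name and packaging tool for a GPAW release.

    Hypothetical helper: it only encodes the rule stated in this README
    (19.8.1 and earlier: distutils + customize.py; 20.1.0 and later:
    setuptools + siteconfig.py).
    """
    if parse_version(version) >= (20, 1, 0):
        return {"config_file": "siteconfig.py", "packaging": "setuptools"}
    return {"config_file": "customize.py", "packaging": "distutils"}


if __name__ == "__main__":
    for release in ("1.5.2", "19.8.1", "20.1.0"):
        print(release, build_settings(release))
```

Comparing integer tuples rather than raw strings keeps the ordering correct across both numbering schemes, since the old 1.x releases and the year-based releases then sort numerically.
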
The defaults used by GPAW may not offer optimal performance, and the automatic detection of the libraries also fails on some systems. The UEABS repository contains additional instructions:

* [general instructions](build/build-cpu.md)
* [GPGPUs](build/build-cuda.md) - To check

Example [build scripts](build/examples/) are also available for some PRACE and non-PRACE systems.

## Mechanics of Running the Benchmark

### Download of the benchmark sets

As each benchmark has only a single input file, these can be downloaded right from this repository.

1. [Testcase S: Carbon nanotube input file](benchmark/1_S_carbon-nanotube/input.py)
2. [Testcase M: Copper filament input file](benchmark/2_M_copper-filament/input.py)
3. [Testcase L: Silicon cluster input file](benchmark/3_L_silicon-cluster/input.py)

### Running the benchmarks

#### Using the `gpaw-python` wrapper script

This is the default approach for versions of GPAW up to and including 19.8.1.

These versions of GPAW come with their own wrapper executable, `gpaw-python`, to start an MPI-based GPAW run. No special command line options or environment variables are needed to run the benchmarks if your MPI process starter (`mpirun`, Slurm `srun`, ...) communicates properly with the resource manager. E.g., on Slurm systems, use

```
srun gpaw-python input.py
```

#### Using the regular Python interpreter and parallel GPAW shared library

This is the default method for GPAW 20.1.0 (and likely later). The wrapper executable `gpaw-python` is no longer available in the default parallel build of GPAW. There are now two different ways to start GPAW.

One way is through `mpirun`, `srun` or an equivalent process starter and the `gpaw python` command:

```
srun gpaw python input.py
```

The second way is to use the `-P` flag of the `gpaw` command and let it use a process starter internally:

```
gpaw -P 100 python input.py
```

will run on 100 cores.
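
The launch styles described so far can be summarised in a small helper that assembles the command line for a given GPAW version. This is only a sketch: `launch_command` is a hypothetical function that is not part of GPAW, `srun` stands in for whatever process starter the system uses, and the commands it produces are simply the ones quoted in this section.

```python
def launch_command(version, ntasks=None, starter="srun", script="input.py"):
    """Assemble a GPAW launch command as a list of arguments.

    version : GPAW release string, e.g. '19.8.1' or '20.1.0'
    ntasks  : if given, use `gpaw -P ntasks` (only applied for 20.1.0 and
              later, the only versions for which this README describes -P)
    starter : external process starter, used when `gpaw -P` is not used
    script  : benchmark input file
    """
    release = tuple(int(part) for part in version.split("."))
    if release < (20, 1, 0):
        # Up to 19.8.1: the gpaw-python wrapper executable is the default.
        return [starter, "gpaw-python", script]
    if ntasks is not None:
        # 20.1.0 and later: let the gpaw command start the processes itself.
        return ["gpaw", "-P", str(ntasks), "python", script]
    # 20.1.0 and later, started through an external process starter.
    return [starter, "gpaw", "python", script]


if __name__ == "__main__":
    print(" ".join(launch_command("19.8.1")))
    print(" ".join(launch_command("20.1.0")))
    print(" ".join(launch_command("20.1.0", ntasks=100)))
```

Returning a list instead of a single string means the result can be handed directly to `subprocess.run` in a job driver script without extra quoting.
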

There is a third, but not recommended, option:

```
srun python3 input.py
```

That option, however, does not perform the imports in the same way that the `gpaw` script does.

### Examples

Example [job scripts](scripts/) (`scripts/job-*.sh`) are provided for different PRACE systems and may offer a helpful starting point.

*TODO: Update the examples as testing on other systems goes on.*

## Verification of Results

### Case S: Carbon nanotube

TODO.

### Case M: Copper filament

TODO. Convergence problems.

### Case L: Silicon cluster

TODO. Get the medium case to run before spending time on the large one.