The application was installed with the ARM v19 compilers (flang and clang) and the ARM performance
library, which provides threaded BLAS, LAPACK and FFTW libraries. You will need to modify the make.inc file to make use of these.
Remember also to include the following flags:
```bash
-mcpu=native -armpl
```
Currently the OpenMP version (built with the ```-fopenmp``` flag) does not compile (internal compiler error).
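As a rough illustration, the corresponding make.inc entries might look like the sketch below (the variable names follow a typical QE make.inc, but the exact settings depend on your QE version and on how the ARM performance library is installed):
```bash
# Illustrative make.inc fragment only, not a drop-in file:
# compile for the local CPU and link BLAS/LAPACK/FFTW via the ARM performance library.
FFLAGS      = -O3 -mcpu=native -armpl
CFLAGS      = -O3 -mcpu=native -armpl
BLAS_LIBS   = -armpl
LAPACK_LIBS = -armpl
FFT_LIBS    = -armpl
```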
## Execution
- Because of the fairly long execution times and the limited number of nodes, only AUSURF has been tested.
- A run-time error causes QE to crash just after the final iteration, probably due to the FoX XML library (an issue has been raised). Walltimes are therefore estimated from the time of the final iteration.
See job files for example execution.
## Profiling
Use the ARM MAP profiler. See example job scripts.
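As a hint, MAP can wrap the normal launch command inside a job script; a minimal sketch (the launcher and task count are only examples) is:
```bash
# Profile a 16-task run with ARM MAP; a .map file is written for later viewing in the MAP GUI.
# Replace mpirun with the launcher used on your system (srun, mpiexec, ...).
map --profile mpirun -n 16 pw.x -input pw.in
```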
# Quantum Espresso in the Unified European Applications Benchmark Suite (UEABS)
## Document Author: A. Emerson (a.emerson@cineca.it), Cineca.
## Introduction
Quantum Espresso is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.
Full documentation is available from the project website [QuantumEspresso](https://www.quantum-espresso.org/).
In this README we give information relevant for its use in the UEABS.
### Standard CPU version
For the UEABS activity we have mainly used version v6.0, but later versions are now available.
### GPU version
The GPU port of Quantum Espresso is a version of the program which has been
completely re-written in CUDA FORTRAN by Filippo Spiga. The version of the program used in these
experiments is v6.0, although further versions became available later during the
activity.
## Installation and requirements
### Standard
The Quantum Espresso source can be downloaded from the project's GitHub repository, [QE](https://github.com/QEF/q-e/tags). The requirements are described on the website, but you will need good FORTRAN and C compilers with an MPI library and, optionally (but highly recommended), an optimised linear algebra library.
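As an illustration, a typical download and build with the standard configure/make procedure might look like the following sketch (the release tag and directory name are examples only; choose the version you need from the tags page):
```bash
# Download and unpack a tagged release, then build the plane-wave code pw.x.
# The tag and directory name below are illustrative.
wget https://github.com/QEF/q-e/archive/refs/tags/qe-6.0.0.tar.gz
tar -xzf qe-6.0.0.tar.gz
cd q-e-qe-6.0.0
./configure          # optionally pass e.g. BLAS_LIBS/LAPACK_LIBS to select an optimised library
make pw
```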
### GPU version
For complete build requirements and information see the following GitHub site:
[QE-GPU](https://github.com/fspiga/qe-gpu)
A short summary is given below:
Essential
* The PGI compiler version 17.4 or above.
* You need NVIDIA TESLA GPUs such as Kepler (K20, K40, K80), Pascal (P100) or Volta (V100).
No other cards are supported. NVIDIA TESLA P100 and V100 are strongly recommended
for their on-board memory capacity and double precision performance.
Optional
* A parallel linear algebra library such as ScaLAPACK, Intel MKL or IBM ESSL. If
none is available on your system then the installation can use a version supplied with the distribution.
Once built, you can then run the program like this (e.g. using
MPI over 16 cores):
```shell
mpirun -n 16 pw-gpu.x -input pw.in
```
but check your system documentation since `mpirun` may be replaced by
`mpiexec`, `runjob`, `aprun`, `srun`, etc. Note also that on most systems you are not
allowed to run MPI programs interactively without using the
batch system.
### Parallelisation options
Quantum Espresso uses various levels of parallelisation, the most important being MPI parallelisation
over the *k-points* available in the input system. This is achieved with the ```-npool``` program option.
Thus for the AUSURF input which has 2 k points we can run:
```bash
srun -n 64 pw.x -npool 2 -input pw.in
```
which would allocate 32 MPI tasks per k-point.
The number of MPI tasks must be a multiple of the number of pools (and ideally the number of k-points should be divisible by the number of pools). For the TA2O5 input, which has 26 k-points, we could try:
```bash
srun -n 52 pw.x -npool 26 -input pw.in
```
but we may wish to use fewer pools but with more tasks per pool:
```bash
srun -n 52 pw.x -npool 13 -input pw.in
```
It is also possible to control the number of MPI tasks used in the diagonalisation of the
subspace Hamiltonian, via the ```-ndiag``` parameter which must be a square number.
For example, with the AUSURF input (2 k-points) we can assign 4 processes to the Hamiltonian diagonalisation:
```bash
srun -n 64 pw.x -npool 2 -ndiag 4 -input pw.in
```
### Hints for running the GPU version
#### Memory limitations
The GPU port of Quantum Espresso runs almost entirely in GPU memory. This means that jobs are restricted
by the memory of the GPU device, normally 16-32 GB, regardless of the main node memory. Thus, unless many nodes are used, the user is likely to see job failures due to lack of memory, even for small datasets.
For example, on the CSCS Piz Daint supercomputer each node has only 1 NVIDIA Tesla P100 (16GB) which means that you will need at least 4 nodes to run even the smallest dataset (AUSURF in the UEABS).
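A hypothetical SLURM launch on such a system (4 nodes, 1 MPI task and 1 GPU per node; the exact options depend on the site configuration) might be:
```bash
# Sketch only: 4 nodes x 1 task each, 2 k-point pools for AUSURF.
srun -N 4 --ntasks-per-node=1 pw-gpu.x -npool 2 -input pw.in
```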
## Execution
In the UEABS repository you will find a directory for each computer system tested, together with installation
instructions and job scripts.
In the following we describe in detail the execution procedure for the Marconi computer system.
### Execution on the Cineca Marconi KNL system
Quantum Espresso has already been installed for the KNL nodes of
Marconi and can be accessed via a specific module:
```shell
module load profile/knl
module load autoload qe/6.0_knl
```
On Marconi the default is to use the MCDRAM as cache, with the
cluster mode set to quadrant. Other settings for the KNLs on Marconi
(e.g. flat memory mode) have not been substantially tested for
Quantum Espresso, but significant differences in performance for most
inputs are not expected.
An example SLURM batch script for the A2 partition is given below: