# Quantum Espresso in the Accelerated Benchmark Suite
## Document Author: A. Emerson, Cineca.

## Contents

7.	References	

## 1. Introduction

Quantum Espresso is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials. 

### Standard CPU version
For the UEABS activity we have used mainly version v6.0 but later versions are now available.

### GPU version
The GPU port of Quantum Espresso is a version of the program which has been
completely re-written in CUDA FORTRAN by Filippo Spiga. The version of the program used in these
experiments is v6.0, even though further versions became available later.

## 2. Installation and requirements

### Standard
The Quantum Espresso source can be downloaded from the project's GitHub repository (QE). Requirements are listed on the website, but you will need a good FORTRAN and C compiler with an MPI library and, optionally (but highly recommended), an optimised linear algebra library.

### GPU version
For complete build requirements and information see the following GitHub site:
A short summary is given below:


 * The PGI compiler version 17.4 or above.
 * NVIDIA TESLA GPUs such as Kepler (K20, K40, K80), Pascal (P100) or Volta (V100).
   No other cards are supported. NVIDIA TESLA P100 and V100 are strongly recommended
   for their on-board memory capacity and double precision performance.
 * A parallel linear algebra library such as Scalapack, Intel MKL or IBM ESSL. If
   none is available on your system then the installation can use a version supplied
   with the distribution.

## 3. Downloading the software

### Standard
The source can be downloaded from the website.

### GPU
Available from the web site given above. You can use, for example, ```git clone```
to download the software:

    git clone <repository-url>

## 4. Compiling and installing the application

### Standard installation
Installation is achieved by the usual ```configure, make, make install``` procedure.
However, it is recommended that the user checks the __make.inc__ file created by `configure` before running `make`.
For example, using the Intel compilers:

    module load intel intelmpi
    CC=icc FC=ifort MPIF90=mpiifort ./configure --enable-openmp --with-scalapack=intel

Assuming the __make.inc__ file is acceptable, the user can then do:

    make; make install
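
One way to carry out the recommended check of __make.inc__ is to print only the compiler and library settings before running `make`. The sketch below assumes the variable names found in a typical Quantum Espresso __make.inc__ (`MPIF90`, `CC`, `DFLAGS`, `BLAS_LIBS`, `LAPACK_LIBS`, `SCALAPACK_LIBS`, `FFT_LIBS`); the function name `show_build_settings` is made up for this illustration.

```shell
#!/bin/bash
# Print the compiler and library settings written to make.inc so that they
# can be reviewed before running make.
# Usage: show_build_settings [path/to/make.inc]
show_build_settings() {
    local makefile="${1:-make.inc}"
    if [ ! -f "$makefile" ]; then
        echo "error: $makefile not found (run ./configure first)" >&2
        return 1
    fi
    # Variable names are assumed from a typical QE make.inc; adjust as needed.
    grep -E '^[[:space:]]*(MPIF90|CC|DFLAGS|BLAS_LIBS|LAPACK_LIBS|SCALAPACK_LIBS|FFT_LIBS)[[:space:]]*=' "$makefile"
}
```

Calling `show_build_settings` in the source directory after `configure` lists the matching lines; an empty `SCALAPACK_LIBS` line, for instance, would suggest that the `--with-scalapack` option did not take effect.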

### GPU
Check the __README.md__ file in the downloaded files since the
procedure varies from distribution to distribution.
Most distributions do not have a ```configure``` command. Instead you copy a __make.inc__
file from the __install__ directory, and modify that directly before running make.
A number of templates are available in the distribution:
- make.inc_x86-64
- make.inc_CRAY_PizDaint
- make.inc_POWER_DAVIDE

The second and third are particularly relevant in the PRACE infrastructure (i.e. for CSCS
Piz Daint and CINECA DAVIDE).
Run __make__ to see the options available. For the UEABS you should select the
pw program (the only module currently available):

    make pw

The QE-GPU executable will appear in the directory `GPU/PW` and is called `pw-gpu.x`.
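
The template-based procedure can be sketched as a small helper that copies one of the __make.inc__ templates into place without overwriting an existing file. This is only an illustration: the function name `prepare_make_inc` and the example path are hypothetical, and it assumes that __make.inc__ is expected at the top of the source tree, so check the __README.md__ of your distribution first.

```shell
#!/bin/bash
# Copy one of the make.inc templates from the install directory to the top of
# the source tree, refusing to overwrite an existing make.inc.
# Usage: prepare_make_inc <source-dir> <template-name>
prepare_make_inc() {
    local srcdir="$1" template="$2"
    if [ ! -f "$srcdir/install/$template" ]; then
        echo "error: template $template not found in $srcdir/install" >&2
        return 1
    fi
    if [ -e "$srcdir/make.inc" ]; then
        echo "error: $srcdir/make.inc already exists; edit it instead" >&2
        return 1
    fi
    cp "$srcdir/install/$template" "$srcdir/make.inc"
    echo "created $srcdir/make.inc from $template; edit it before running make"
}

# Example (hypothetical location of the unpacked QE-GPU sources):
#   prepare_make_inc "$HOME/qe-gpu" make.inc_x86-64
#   cd "$HOME/qe-gpu" && make pw
```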

## Running the program - general procedure

Of course you need some input before you can run calculations. The
input files are of two types: 

1. A control file, passed to the program with the `-input` option
2. One or more pseudopotential files with extension `.UPF`

The pseudopotential files are placed in a directory specified in the
control file with the tag `pseudo_dir`. Thus if `pseudo_dir` is set to the
current directory (`./`), QE-GPU will look for the pseudopotential
files there.
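
Before submitting a job it can be useful to confirm that the directory named by `pseudo_dir` really contains pseudopotential files. The following is a minimal sketch, assuming the control file holds a single line of the form `pseudo_dir = './'` (single or double quotes); the function names are invented for this example.

```shell
#!/bin/bash
# Print the value of pseudo_dir found in a QE control file.
get_pseudo_dir() {
    sed -n "s/^[[:space:]]*pseudo_dir[[:space:]]*=[[:space:]]*['\"]\([^'\"]*\)['\"].*/\1/p" "$1"
}

# Succeed only if the directory named by pseudo_dir holds at least one .UPF file.
# Relative paths are resolved against the current directory, so call this from
# the directory in which the job will run.
check_pseudo_dir() {
    local dir
    dir="$(get_pseudo_dir "$1")"
    [ -n "$dir" ] || { echo "pseudo_dir not set in $1" >&2; return 1; }
    ls "$dir"/*.UPF >/dev/null 2>&1 || { echo "no .UPF files in $dir" >&2; return 1; }
    echo "pseudopotentials found in $dir"
}
```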

If using the PRACE benchmark suite, the data files can be
downloaded from the QE website or the PRACE repository.

Once uncompressed you can then run the program like this (e.g. using
MPI over 16 cores):

    mpirun -n 16 pw-gpu.x -input <inputfile>

but check your system documentation since `mpirun` may be replaced by
`mpiexec`, `runjob`, `aprun`, `srun`, etc. Note also that normally you are not
allowed to run MPI programs interactively without using the
batch system.
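
Since the launcher differs between systems, a job script can probe for the one that is installed. The helper below is a sketch only: `find_launcher` is a made-up name and the order of the candidates is arbitrary; the system documentation remains the authority on which launcher to use.

```shell
#!/bin/bash
# Print the first command from the argument list that is available on PATH.
find_launcher() {
    local cmd
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    echo "no MPI launcher found" >&2
    return 1
}

# Example: candidate order chosen for illustration only.
#   launcher=$(find_launcher srun mpirun mpiexec aprun)
```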

### Parallelisation options
Quantum Espresso uses various levels of parallelisation, the most important being MPI parallelisation
over the k-points available in the input system. This is achieved with the ```-npool``` program option.
Thus for the AUSURF input, which has 2 k-points, we can run:

    srun -n 64 pw.x -npool 2 -input <inputfile>

which would allocate 32 MPI tasks per k-point.

The number of MPI tasks must be a multiple of the number of pools. For the TA2O5 input, which has 26 k-points, we could try:

    srun -n 52 pw.x -npool 26 -input <inputfile>

but we may wish to use fewer pools but with more tasks per pool:

    srun -n 52 pw.x -npool 13 -input <inputfile>
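
The arithmetic behind the `-npool` examples above can be checked with a few lines of shell before submitting a job. This is a sketch with an invented function name, `tasks_per_pool`; it only tests that the MPI tasks divide evenly among the pools and reports how many tasks each pool receives.

```shell
#!/bin/bash
# Print the number of MPI tasks in each pool, or fail if the tasks
# cannot be divided evenly among the pools.
tasks_per_pool() {
    local ntasks="$1" npool="$2"
    if [ "$npool" -le 0 ] || [ $((ntasks % npool)) -ne 0 ]; then
        echo "error: $ntasks tasks cannot be split evenly into $npool pools" >&2
        return 1
    fi
    echo $((ntasks / npool))
}

tasks_per_pool 64 2    # AUSURF with 2 pools
tasks_per_pool 52 26   # TA2O5 with one pool per k-point
tasks_per_pool 52 13   # TA2O5 with two k-points per pool
```

The three calls print 32, 2 and 4 respectively, matching the task counts implied by the `srun` lines.
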
#### Use of ndiag

### Hints for running the GPU version

#### Memory
The GPU port of Quantum Espresso runs almost entirely in the GPU memory. This means that jobs are restricted
by the memory of the GPU device, normally 16-32 GB, regardless of the main node memory. Thus, unless many nodes are used, the user is likely to see job failures due to lack of memory, even for small datasets.
For example, on the CSCS Piz Daint supercomputer each node has only one NVIDIA Tesla P100 (16 GB), which means that you will need at least 4 nodes to run even the smallest dataset (AUSURF in the UEABS).
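
As a rough guide, the minimum number of GPUs follows from a ceiling division of the job's memory footprint by the memory of one device. The sketch below is only a lower bound: it assumes the data divides evenly across GPUs, which real runs may not achieve, and the 60 GB footprint is a made-up figure for illustration rather than a measured value for any UEABS dataset.

```shell
#!/bin/bash
# Ceiling division: smallest number of GPUs whose combined memory is at
# least the required amount (both values in whole GB).
min_gpus() {
    local required_gb="$1" gpu_gb="$2"
    echo $(( (required_gb + gpu_gb - 1) / gpu_gb ))
}

# 60 GB is a hypothetical footprint, not a measured value.
min_gpus 60 16
```

With one 16 GB device per node, as on Piz Daint, the result is also the minimum number of nodes.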

## Execution
In the UEABS repository you will find a directory for each computer system tested, together with installation
instructions and job scripts.
In the following we describe in detail the execution procedure for the Marconi computer system.

### Execution on the Cineca Marconi KNL system

Quantum Espresso has already been installed for the KNL nodes of
Marconi and can be accessed via a specific module:

``` shell
module load profile/knl
module load autoload qe/6.0_knl
```

On Marconi the default is to use the MCDRAM as cache, and have the
cache mode set as quadrant. Other settings for the KNLs on Marconi
(e.g. flat mode) haven't been substantially tested for Quantum Espresso,
but significant differences in performance for most inputs are
not expected.

An example SLURM batch script for the A2 partition is given below:

``` shell
#!/bin/bash
#SBATCH -N 2
#SBATCH --tasks-per-node=34
#SBATCH -A <accountno>
#SBATCH -t 1:00:00

module purge
module load profile/knl
module load autoload qe/6.0_knl

# 4 OpenMP threads per MPI process, for both QE and the Intel MKL library
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=${OMP_NUM_THREADS}

mpirun pw.x -npool 4 -input <inputfile> > file.out
```

In the above, with the SLURM directives we have asked for 2 KNL nodes (each with 68 cores) in
cache/quadrant mode and 93 GB main memory each. We are running QE in
hybrid mode using 34 MPI processes/node, each with 4 OpenMP
threads/process and distributing the k-points in 4 pools; the Intel
MKL library will also use 4 OpenMP threads/process.
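
The layout described above can be checked with simple arithmetic. The snippet below is a sketch that uses only the figures quoted in the text; the variable names are illustrative and none of them is read by SLURM or by Quantum Espresso.

```shell
#!/bin/bash
# Values taken from the description of the batch script.
nodes=2
tasks_per_node=34
threads_per_task=4
cores_per_node=68
npool=4

total_tasks=$((nodes * tasks_per_node))
threads_per_node=$((tasks_per_node * threads_per_task))

echo "MPI tasks in total:      $total_tasks"
echo "MPI tasks per pool:      $((total_tasks / npool))"
echo "OpenMP threads per node: $threads_per_node"
echo "threads per core:        $((threads_per_node / cores_per_node))"
```

This gives 68 MPI tasks in total, 17 per pool, and 136 threads on each 68-core node, i.e. two threads per core, which relies on the hardware threads of the KNL cores.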

Note that this script needs to be submitted using the KNL scheduler as follows:

``` shell
module load env-knl
sbatch myjob
```


Please check the Cineca documentation for information on using the
Marconi KNL partition.

## 7. References
1. QE-GPU build and download instructions

Last updated: 7-April-2017