# Quantum Espresso in the Unified European Applications Benchmark Suite (UEABS)
## Document Author: A. Emerson (a.emerson@cineca.it), Cineca.


## Introduction

Quantum Espresso is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials. 
Full documentation is available from the project website [QuantumEspresso](https://www.quantum-espresso.org/).
In this README we give information relevant for its use in the UEABS.

### Standard CPU version
For the UEABS activity we have mainly used version v6.5, although later versions are now available.

### GPU version
The GPU port of Quantum Espresso is a version of the program which has been
completely rewritten in CUDA Fortran. The version used in these
experiments is v6.5a1, although later versions may be available.
## Installation and requirements

### Standard
The Quantum Espresso source can be downloaded from the project's GitHub repository, [QE](https://github.com/QEF/q-e/tags). Requirements can be found on the website, but you will need good Fortran and C compilers, an MPI library and, optionally (but highly recommended), an optimised linear algebra library.
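
A quick, hedged check that a suitable compiler and MPI stack are on your path is shown below (the module names are illustrative assumptions; the wrapper names depend on your MPI library):

```bash
# Load a compiler + MPI stack first if your site uses environment modules, e.g.
#   module load intel intelmpi
mpif90 --version   # Fortran compiler used by the MPI wrappers
mpicc --version    # C compiler used by the MPI wrappers
```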

### GPU version
For complete build requirements and information see the following GitLab site:
[QE-GPU](https://gitlab.com/QEF/q-e-gpu/-/releases)
A short summary is given below:

Essential

* The PGI compiler, version 17.4 or above.
* NVIDIA Tesla GPUs such as Kepler (K20, K40, K80), Pascal (P100) or Volta (V100).
  No other cards are supported. The NVIDIA Tesla P100 and V100 are strongly recommended
  for their on-board memory capacity and double precision performance.

Optional
* A parallel linear algebra library such as ScaLAPACK, Intel MKL or IBM ESSL. If
  none is available on your system then the installation can use a version supplied
  with the distribution.


## Downloading the software

### Standard
From the website, for example:
```bash
wget https://github.com/QEF/q-e/releases/download/qe-6.5/qe-6.5.tar.gz
```
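
The archive can then be unpacked and the resulting source directory entered (the directory name is an assumption based on the qe-6.5 release tarball; adjust it if your archive unpacks differently):
```bash
tar -zxvf qe-6.5.tar.gz
cd qe-6.5
```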

### GPU
Available from the website given above. You can use, for example, `wget`
to download the software:
```bash
wget https://gitlab.com/QEF/q-e-gpu/-/archive/qe-gpu-6.5a1/q-e-gpu-qe-gpu-6.5a1.tar.gz
```

## Compiling and installing the application

### Standard installation
Installation is achieved by the usual `configure`, `make`, `make install` procedure.
However, it is recommended that the user checks the __make.inc__ file created by this procedure before performing the make.
For example, using the Intel compilers,
```bash
module load intel intelmpi
CC=icc FC=ifort MPIF90=mpiifort ./configure --enable-openmp --with-scalapack=intel
```
Assuming the __make.inc__ file is acceptable, the user can then do:
```bash
make; make install
```
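
As a quick sanity check (assuming the default in-tree install location; use your own `--prefix` if you set one), the executables should now be present in the `bin/` directory of the source tree:
```bash
ls bin/pw.x                  # the main plane-wave executable
export PATH=$PWD/bin:$PATH   # optionally make it available for the run commands below
```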

### GPU
The GPU version is configured similarly to the CPU version, the only exception being that the configure script
will check for the presence of PGI and CUDA libraries.
A typical configure might be

```bash
./configure --with-cuda=XX --with-cuda-runtime=YY --with-cuda-cc=ZZ --enable-openmp [ --with-scalapack=no ]
```
where `XX` is the location of the CUDA Toolkit (in HPC environments this is
generally `$CUDA_HOME`), `YY` is the version of the CUDA Toolkit and `ZZ`
is the compute capability of the card.
For example,

```bash
./configure --with-cuda=$CUDA_HOME --with-cuda-cc=60 --with-cuda-runtime=9.2
```
The __dev-tools/get_device_props.py__ script is available if you don't know these values.
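
The values can also be checked by hand with the standard NVIDIA tools, assuming they are on your path; for example:
```bash
nvcc --version    # CUDA Toolkit version, i.e. the --with-cuda-runtime value
echo $CUDA_HOME   # toolkit location on many HPC systems, i.e. the --with-cuda value
nvidia-smi        # lists the GPU model, from which the compute capability (--with-cuda-cc) follows
```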

Compilation is then performed as normal by

```bash
make pw
```
#### Example compilation of Quantum Espresso for GPU-based machines

```bash
module load pgi cuda
./configure --with-cuda=$CUDA_HOME --with-cuda-cc=70 --with-cuda-runtime=10.2
make -j8 pw
```


## Running the program - general procedure

Of course you need some input before you can run calculations. The
input files are of two types: 

1. A control file usually called `pw.in`

2. One or more pseudopotential files with extension `.UPF`
The pseudopotential files are placed in a directory specified in the
control file with the tag `pseudo_dir`. Thus if we have

```bash
pseudo_dir=./
```
then QE-GPU will look for the pseudopotential
files in the current directory. 
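
For illustration only, a minimal sketch of the `&CONTROL` namelist in a `pw.in` file might look like this (the values shown are placeholders, not taken from the UEABS inputs):
```
&CONTROL
  calculation = 'scf'
  prefix      = 'ausurf'
  outdir      = './tmp'
  pseudo_dir  = './'
/
```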

If using the PRACE benchmark suite, the data files can be
downloaded from the PRACE repository. For example,
```bash
wget https://repository.prace-ri.eu/ueabs/Quantum_Espresso/QuantumEspresso_TestCaseA.tar.gz
```
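
The archive can then be unpacked in the usual way, for example:
```bash
tar -zxvf QuantumEspresso_TestCaseA.tar.gz
```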
Once uncompressed you can then run the program like this (e.g. using
MPI over 16 cores): 

```bash
mpirun -n 16 pw-gpu.x -input pw.in 
```

but check your system documentation since mpirun may be replaced by
`mpiexec`, `runjob`, `aprun`, `srun`, etc. Note also that normally you are not
allowed to run MPI programs interactively without using the
batch system. 

### Running on GPUs
The procedure is identical to running on non-accelerator-based hardware.
If GPUs are being used then the following will appear in the program output:

```
    GPU acceleration is ACTIVE.
```

GPU acceleration can be switched off by setting the following environment variable:

```bash
export USEGPU=no
```



### Parallelisation options
Quantum Espresso uses various levels of parallelisation, the most important being MPI parallelisation
over the *k*-points available in the input system. This is achieved with the `-npool` program option.
Thus for the AUSURF input, which has 2 k-points, we can run:
```bash
srun -n 64 pw.x -npool 2 -input pw.in
```
which would allocate 32 MPI tasks per k-point.

The number of MPI tasks must be a multiple of the number of k-points. For the TA2O5 input, which has 26 k-points, we could try:
```bash
srun -n 52 pw.x -npool 26 -input pw.in
```
but we may wish to use fewer pools but with more tasks per pool:
```bash
srun -n 52 pw.x -npool 13 -input pw.in
```

It is also possible to control the number of MPI tasks used in the diagonalization of the
subspace Hamiltonian. This is possible with the `-ndiag` parameter, which must be a square number.
For example, with the AUSURF input we can assign 4 processes for the Hamiltonian diagonalization:
```bash
srun -n 64 pw.x -npool 2 -ndiag 4 -input pw.in
```


### Hints for running the GPU version

The GPU port of Quantum Espresso runs almost entirely in the GPU memory. This means that jobs are restricted
by the memory of the GPU device, normally 16-32 GB, regardless of the main node memory. Thus, unless many nodes are used, the user is likely to see job failures due to lack of memory, even for small datasets.
For example, on the CSCS Piz Daint supercomputer each node has only one NVIDIA Tesla P100 (16 GB), which means that you will need at least 4 nodes to run even the smallest dataset (AUSURF in the UEABS).
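
As a sketch only (the SLURM options, GPUs per node and pool count are assumptions that must be adapted to your machine), a 4-node GPU run of the AUSURF case might look like:
```bash
# 4 nodes, 1 MPI task and 1 GPU per node; AUSURF has 2 k-points, so -npool 2
srun -N 4 --ntasks-per-node=1 --gres=gpu:1 pw-gpu.x -npool 2 -input pw.in > pw.out
```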


## Execution

In the UEABS repository you will find a directory for each computer system tested, together with installation
instructions and job scripts.
In the following we describe in detail the execution procedure for the Cineca Galileo computer system.

### Execution on the Cineca Galileo (x86) system

Quantum Espresso has already been installed on the cluster
and can be accessed via a specific module:

``` bash
module load profile/phys
module load autoload qe/6.5
```

An example SLURM batch script is given below:

``` bash
#!/bin/bash
#SBATCH --time=06:00:00        # Walltime in hh:mm:ss
#SBATCH --nodes=4              # Number of nodes
#SBATCH --ntasks-per-node=18   # Number of MPI ranks per node
#SBATCH --cpus-per-task=2      # Number of OpenMP threads for each MPI process/rank
#SBATCH --mem=118000           # Per-node memory request (MB)
#SBATCH --account=<your account_no>
#SBATCH --job-name=jobname
#SBATCH --partition=gll_usr_prod

module purge
module load profile/phys
module load autoload qe/6.5


export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MKL_NUM_THREADS=${OMP_NUM_THREADS}

srun pw.x -npool 4 -input file.in > file.out
```

With the above SLURM directives we have asked for 4 nodes, 18 MPI tasks per node and 2 OpenMP threads
per task, i.e. 72 MPI ranks in total, so the `-npool 4` option assigns 18 MPI tasks to each k-point pool.

Note that this script needs to be submitted using the SLURM scheduler as follows:

``` bash
sbatch myjob
```

## UEABS test cases

| UEABS name | QE name | Description | k-points | Notes |
|------------|---------|-------------|----------|-------|
| Small test case | AUSURF | 112 atoms | 2 | < 4-8 nodes on most systems |
| Large test case | GRIR443 | 432 atoms | 4 | Medium scaling, often 20 nodes |
| Very Large test case | CNT | Carbon nanotube | | Large scaling runs only. Memory and time requirements very high |


__Last updated: 22-October-2020__