# Quantum Espresso in the Unified European Applications Benchmark Suite (UEABS)
## Document Author: A. Emerson (a.emerson@cineca.it), Cineca.

## Introduction

Quantum Espresso is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials. 
Full documentation is available from the project website [QuantumEspresso](https://www.quantum-espresso.org/).
In this README we give information relevant for its use in the UEABS.

### Standard CPU version
For the UEABS activity we have mainly used version 6.0, but later versions are now available.

### GPU version
The GPU port of Quantum Espresso is a version of the program which has been
completely rewritten in CUDA Fortran by Filippo Spiga. The version of the program used in these
experiments is v6.0, even though later versions became available during the
activity.

## Installation and requirements

### Standard
The Quantum Espresso source can be downloaded from the project's GitHub repository, [QE](https://github.com/QEF/q-e/tags). Requirements can be found on the website, but you will need a good Fortran and C compiler with an MPI library and optionally (but highly recommended) an optimised linear algebra library.

### GPU version
For complete build requirements and information see the
[QE-GPU](https://github.com/fspiga/qe-gpu) GitHub site.
A short summary is given below:

Essential

 * The PGI compiler version 17.4 or above.
 * NVIDIA Tesla GPUs such as Kepler (K20, K40, K80), Pascal (P100) or Volta (V100).
   No other cards are supported. NVIDIA Tesla P100 and V100 are strongly recommended
   for their on-board memory capacity and double precision performance.

Optional

 * A parallel linear algebra library such as ScaLAPACK, Intel MKL or IBM ESSL. If
   none is available on your system then the installation can use a version supplied
   with the distribution.


## Downloading the software

### Standard
From the website, for example:
```bash
wget https://github.com/QEF/q-e/releases/download/qe-6.3/qe-6.3.tar.gz
```

### GPU
Available from the GitHub site given above. You can use, for example, `git clone`
to download the software:
```bash
git clone https://github.com/fspiga/qe-gpu.git
```

## Compiling and installing the application

### Standard installation
Installation is achieved by the usual `configure`, `make`, `make install` procedure.
However, it is recommended that the user checks the __make.inc__ file created by this procedure before performing the make.
For example, using the Intel compilers,
```bash
module load intel intelmpi
CC=icc FC=ifort MPIF90=mpiifort ./configure --enable-openmp --with-scalapack=intel
```
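
One way to check __make.inc__ is to print only the settings that most often need attention. The sketch below is a hypothetical helper, not part of the QE distribution; the variable names in the pattern are those we would normally expect `configure` to write, so treat the list as an assumption and adjust it for your version. It can be called as `show_make_settings make.inc`.

```bash
# Print the compiler, flag and library settings from a make.inc file.
# The variable list is an assumption; extend it as needed.
show_make_settings() {
    local makefile="${1:-make.inc}"
    if [ ! -f "$makefile" ]; then
        echo "error: $makefile not found (run ./configure first)" >&2
        return 1
    fi
    # Keep only lines that assign one of the variables of interest.
    grep -E '^[[:space:]]*(MPIF90|CC|DFLAGS|FFLAGS|BLAS_LIBS|LAPACK_LIBS|SCALAPACK_LIBS|FFT_LIBS)[[:space:]]*=' "$makefile"
}
```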
Assuming the __make.inc__ file is acceptable, the user can then do:
```bash
make; make install
```

### GPU
Check the __README.md__ file in the downloaded files since the
procedure varies from distribution to distribution.
Most distributions do not have a ```configure``` command. Instead you copy a __make.inc__
file from the __install__ directory, and modify that directly before running make.
A number of templates are available in the distribution:
- make.inc_x86-64
- make.inc_CRAY_PizDaint
- make.inc_POWER_DAVIDE
- make.inc_POWER_SUMMITDEV

The second and third are particularly relevant in the PRACE infrastructure (i.e. for CSCS
Piz Daint and CINECA DAVIDE).
Run `make` to see the options available. For the UEABS you should select the
pw program (the only module currently available):

```
make pw
```
 
The QE-GPU executable will appear in the directory `GPU/PW` and is called `pw-gpu.x`.

## Running the program - general procedure

Of course you need some input before you can run calculations. The
input files are of two types: 

1. A control file usually called `pw.in`
2. One or more pseudopotential files with extension `.UPF`

The pseudopotential files are placed in a directory specified in the
control file with the tag `pseudo_dir`. Thus if we have

```fortran
pseudo_dir = './'
```
then QE-GPU will look for the pseudopotential
files in the current directory.

If using the PRACE benchmark suite the data files can be
downloaded from the QE website or the PRACE repository. For example,
```shell
wget http://www.prace-ri.eu/UEABS/Quantum_Espresso/QuantumEspresso_TestCaseA.tar.gz
```
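
After unpacking the archive it can be worth confirming that every pseudopotential named in the control file is present in `pseudo_dir` before submitting a job. The following is a minimal sketch of such a check, not a tool shipped with QE. It assumes that `pseudo_dir` is set on a single line with a quoted value and that pseudopotential file names end in `.UPF`; the function name `check_pseudos` is hypothetical. A relative `pseudo_dir` is resolved against the directory the function is called from, so call it from the run directory, e.g. `check_pseudos pw.in`.

```bash
# Report which pseudopotential files named in a control file exist.
# Returns 0 if all are found, 1 if any is missing, 2 if the input is absent.
check_pseudos() {
    local input="${1:-pw.in}"
    local dir upf missing=0

    if [ ! -f "$input" ]; then
        echo "error: $input not found" >&2
        return 2
    fi

    # Extract the quoted value of pseudo_dir; default to the current directory.
    dir=$(sed -n "s/.*pseudo_dir[[:space:]]*=[[:space:]]*['\"]\([^'\"]*\)['\"].*/\1/p" "$input" | head -n 1)
    dir="${dir:-.}"
    dir="${dir%/}"

    # Any whitespace-separated field ending in .upf (any case) is taken as a file name.
    for upf in $(awk '{ for (i = 1; i <= NF; i++) if (tolower($i) ~ /\.upf$/) print $i }' "$input" | sort -u); do
        if [ -f "$dir/$upf" ]; then
            echo "found:   $dir/$upf"
        else
            echo "MISSING: $dir/$upf"
            missing=1
        fi
    done
    return "$missing"
}
```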
Once uncompressed you can then run the program like this (e.g. using
MPI over 16 cores): 

```shell
mpirun -n 16 pw-gpu.x -input pw.in
```

but check your system documentation, since `mpirun` may be replaced by
`mpiexec`, `runjob`, `aprun`, `srun`, etc. Note also that you are normally not
allowed to run MPI programs interactively without using the
batch system.

### Parallelisation options
Quantum Espresso uses various levels of parallelisation, the most important being MPI parallelisation
over the *k-points* available in the input system. This is achieved with the `-npool` program option.
Thus for the AUSURF input, which has 2 k-points, we can run:
```bash
srun -n 64 pw.x -npool 2 -input pw.in
```
which would allocate 32 MPI tasks per k-point.

The number of MPI tasks must be a multiple of the number of pools. For the TA2O5 input, which has 26 k-points, we could try:
```bash
srun -n 52 pw.x -npool 26 -input pw.in
```
but we may wish to use fewer pools but with more tasks per pool:
```bash
srun -n 52 pw.x -npool 13 -input pw.in
```
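
The arithmetic behind these examples can be sketched with a small helper that reports how many MPI tasks land in each pool. This is an illustration only: the function name `pool_layout` and its checks are our own. In particular, rejecting more pools than k-points is a policy of this helper, on the reasoning that a pool with no k-points has no work to do.

```bash
# Usage: pool_layout <ntasks> <npool> <nkpoints>
pool_layout() {
    local ntasks="$1" npool="$2" nk="$3"

    if [ $(( ntasks % npool )) -ne 0 ]; then
        echo "error: $ntasks tasks cannot be divided evenly into $npool pools" >&2
        return 1
    fi
    if [ "$npool" -gt "$nk" ]; then
        echo "error: $npool pools exceed the $nk k-points available" >&2
        return 1
    fi
    if [ $(( nk % npool )) -ne 0 ]; then
        echo "warning: $nk k-points do not divide evenly over $npool pools" >&2
    fi
    # Ceiling division gives the largest number of k-points any pool receives.
    echo "$(( ntasks / npool )) tasks per pool, $(( (nk + npool - 1) / npool )) k-points per pool at most"
}
```

For the AUSURF example, `pool_layout 64 2 2` reports 32 tasks per pool. For TA2O5, `pool_layout 52 13 26` reports 4 tasks per pool with 2 k-points each.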

It is also possible to control the number of MPI tasks used in the diagonalisation of the
subspace Hamiltonian. This is possible with the `-ndiag` parameter, which must be a square number.
For example, with the AUSURF input with 2 k-points we can assign 4 processes for the Hamiltonian diagonalisation:
```bash
srun -n 64 pw.x -npool 2 -ndiag 4 -input pw.in
```
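
Since `-ndiag` must be a square number, a quick way to list the candidates is to find the largest square that fits within a given number of tasks. The helper below is a hypothetical sketch. It assumes the diagonalisation group cannot be larger than the number of tasks in one pool, and its result is an upper bound rather than a tuning recommendation; smaller squares such as the 4 used above are equally valid.

```bash
# Print the largest square number that does not exceed <limit>,
# e.g. the number of MPI tasks in one pool.
largest_square() {
    local limit="$1" n=1
    if [ "$limit" -lt 1 ]; then
        echo "error: limit must be at least 1" >&2
        return 1
    fi
    # Grow n while the next square still fits.
    while [ $(( (n + 1) * (n + 1) )) -le "$limit" ]; do
        n=$(( n + 1 ))
    done
    echo $(( n * n ))
}
```

With 32 tasks per pool, `largest_square 32` prints 25 (5x5).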

### Hints for running the GPU version


#### Memory limitations
The GPU port of Quantum Espresso runs almost entirely in GPU memory. This means that jobs are restricted
by the memory of the GPU device, normally 16-32 GB, regardless of the main node memory. Thus, unless many nodes are used, the user is likely to see job failures due to lack of memory, even for small datasets.
For example, on the CSCS Piz Daint supercomputer each node has only one NVIDIA Tesla P100 (16 GB), which means that you will need at least 4 nodes to run even the smallest dataset (AUSURF in the UEABS).
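
A rough lower bound on the node count follows from dividing the total device memory a job needs by the GPU memory available per node. The sketch below shows only that arithmetic. The function name `min_gpu_nodes` is hypothetical and all inputs are supplied by the user, not measured. In practice memory is not shared out perfectly evenly, so the real requirement may be higher.

```bash
# Usage: min_gpu_nodes <required_gb> <gb_per_gpu> <gpus_per_node>
# All values are whole numbers chosen by the user.
min_gpu_nodes() {
    local required="$1" per_gpu="$2" gpus="$3"
    local per_node=$(( per_gpu * gpus ))
    # Integer ceiling division: round up to whole nodes.
    echo $(( (required + per_node - 1) / per_node ))
}
```

For instance, a job needing 60 GB of device memory in total (an invented figure, not a measurement for any UEABS case) on nodes with one 16 GB GPU gives `min_gpu_nodes 60 16 1`, which prints 4.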



## Execution

In the UEABS repository you will find a directory for each computer system tested, together with installation
instructions and job scripts.
In the following we describe in detail the execution procedure for the Marconi computer system.

### Execution on the Cineca Marconi KNL system

Quantum Espresso has already been installed for the KNL nodes of
Marconi and can be accessed via a specific module:

``` shell
module load profile/knl
module load autoload qe/6.0_knl
```

On Marconi the default is to use the MCDRAM as cache, with the
cluster mode set to quadrant. Other settings for the KNLs on Marconi
(e.g. flat mode) have not been substantially tested for Quantum Espresso, but
significant differences in performance for most inputs are
not expected.

An example SLURM batch script for the A2 partition is given below:

``` shell
#!/bin/bash
#SBATCH -N2
#SBATCH --tasks-per-node=64
#SBATCH -A <accountno>
#SBATCH -t 1:00:00

module purge
module load profile/knl
module load autoload qe/6.0_knl

export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=${OMP_NUM_THREADS}

srun pw.x -npool 2 -ndiag 16 -input file.in > file.out
```

With the SLURM directives above we have asked for 2 KNL nodes (each with 68 cores) in
cache/quadrant mode and 93 GB of main memory each. We are running QE in MPI-only
mode using 64 MPI processes per node with the k-points in 2 pools; the diagonalisation of the Hamiltonian
will be done by 16 (4x4) tasks.

Note that this script needs to be submitted using the KNL scheduler as follows:

``` shell
module load env-knl
sbatch myjob
```

Please check the Cineca documentation for information on using the
[Marconi KNL partition](https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.1%3A+MARCONI+UserGuide#UG3.1:MARCONIUserGuide-SystemArchitecture).

## UEABS test cases

| UEABS name | QE name | Description | k-points | Notes|
|------------|---------------|-------------|----------|------|
| Small test case | AUSURF | 112 atoms | 2 | < 4-8 nodes on most systems |
| Medium test case | TA2O5 | Tantalum oxide| 26| Medium scaling, often 20 nodes |
| Large test case | CNT | Carbon nanotube | | Large scaling runs only. Memory and time requirements high|

__Last updated: 14-January-2019__