# Quantum Espresso in the Accelerated Benchmark Suite
## Document Author: A. Emerson (a.emerson@cineca.it) , Cineca.

## Contents

1. Introduction
2. Build Requirements
3. Downloading the software
4. Compiling and installing the application
5. Running the program
6. Examples
7. References


## 1. Introduction
The GPU port of Quantum Espresso is a version of the program which has been
completely re-written in CUDA Fortran by Filippo Spiga. The version of the program used in these
experiments is v6.0, even though later versions became available during the
activity.

## 2. Build Requirements

For complete build requirements and information see the following GitHub site:
[QE-GPU](https://github.com/fspiga/qe-gpu)
A short summary is given below:

Essential

 * The PGI compiler version 17.4 or above.
 * NVIDIA Tesla GPUs such as Kepler (K20, K40, K80), Pascal (P100) or Volta (V100).
   No other cards are supported. The NVIDIA Tesla P100 and V100 are strongly recommended
   for their on-board memory capacity and double-precision performance.

Optional
* A parallel linear algebra library such as ScaLAPACK, Intel MKL or IBM ESSL. If
  none is available on your system then the installation can use a version supplied
  with the distribution.


## 3. Downloading the software

The software is available from the GitHub site given above. You can use, for example, `git clone`
to download it:
```bash
git clone https://github.com/fspiga/qe-gpu.git
```

## 4. Compiling and installing the application

This distribution does not have a `configure` command. Instead you make
changes directly in the `make.inc` file.

```bash
make -f Makefile.gpu pw-gpu
```

In this example we are compiling with the Intel FORTRAN compiler so we
can use the Intel MKL version of ScaLAPACK. Note also that in the
above it is assumed that the CUDA library has been installed in the
directory `/usr/local/cuda/7.0.1`.
 
The QE-GPU executable will appear in the directory `GPU/PW` and is called `pw-gpu.x`.

## 5. Running the program

Of course you need some input before you can run calculations. The
input files are of two types: 

1. A control file usually called `pw.in`

2. One or more pseudopotential files with extension `.UPF`

The pseudopotential files are placed in a directory specified in the
control file with the tag `pseudo_dir`. Thus if we have

```shell
pseudo_dir=./
```
then QE-GPU will look for the pseudopotential
files in the current directory. 
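
Before submitting a job it can be useful to confirm that every pseudopotential named in the control file is actually present in that directory. The sketch below is not part of QE-GPU: `check_pseudos` is a hypothetical helper. It assumes that the control file sets `pseudo_dir` on a single line, that each pseudopotential file name ends in `.UPF` (or `.upf`), and that the check is run from the directory in which the job will start.

```shell
#!/bin/bash
# Hypothetical helper (not part of QE-GPU): report pseudopotential files
# that are named in a control file but missing from its pseudo_dir.
check_pseudos() {
    local input="$1"
    local dir f
    local missing=0

    # Take the value of pseudo_dir and strip quotes, commas and spaces.
    dir=$(sed -n 's/^[[:space:]]*pseudo_dir[[:space:]]*=[[:space:]]*//p' "$input" \
          | head -n 1 | tr -d "'\", ")
    if [ -z "$dir" ]; then
        echo "pseudo_dir is not set in $input"
        return 2
    fi

    # Every whitespace-delimited token ending in .UPF is treated as a file name.
    for f in $(grep -io '[^[:space:]]*\.UPF' "$input"); do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return $missing
}

# Usage (from the run directory):
#   check_pseudos pw.in && echo "all pseudopotentials found"
```

The function returns 0 when all files are found, 1 when at least one is missing and 2 when no `pseudo_dir` line could be located.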

If using the PRACE benchmark suite, the data files can be
downloaded from the QE website or the PRACE repository. For example,
```shell
wget http://www.prace-ri.eu/UEABS/Quantum_Espresso/QuantumEspresso_TestCaseA.tar.gz
```
Once uncompressed you can then run the program like this (e.g. using
MPI over 16 cores): 

```shell
mpirun -n 16 pw-gpu.x -input pw.in 
```

but check your system documentation since `mpirun` may be replaced by
`mpiexec`, `runjob`, `aprun`, `srun`, etc. Note also that normally you are not
allowed to run MPI programs interactively but must instead use the
batch system.
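
Since the launcher name varies between systems, a job script can probe for it rather than hard-code it. The following is a minimal sketch, assuming only that the launcher is on the `PATH` once the relevant modules are loaded; `pick_launcher` is a hypothetical helper and the order of candidates is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: print the first candidate command found on the PATH.
pick_launcher() {
    local cand
    for cand in "$@"; do
        if command -v "$cand" >/dev/null 2>&1; then
            echo "$cand"
            return 0
        fi
    done
    return 1   # none of the candidates is available
}

# Usage inside a batch script:
#   launcher=$(pick_launcher srun mpirun mpiexec aprun) || exit 1
#   "$launcher" -n 16 pw-gpu.x -input pw.in
```

The options accepted by each launcher are not identical, so the `-n 16` in the usage comment should still be checked against the documentation of the launcher that is found.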

A couple of examples for PRACE systems are given in the next section.

## 6. Examples
We now give one build example and two run examples.

### Computer System: Cartesius GPU partition, SURFsara

#### Build

``` shell
wget http://www.qe-forge.org/gf/download/frsrelease/204/912/espresso-5.4.0.tar.gz
tar zxvf espresso-5.4.0.tar.gz
cd espresso-5.4.0
wget https://github.com/fspiga/QE-GPU/archive/5.4.0.tar.gz
tar zxvf 5.4.0.tar.gz

ln -s QE-GPU-5.4.0 GPU

cd GPU
module load mpi
module load mkl
module load cuda
./configure --enable-parallel --enable-openmp --with-scalapack=intel \
  --enable-cuda --with-gpu-arch=sm_35 \
  --with-cuda-dir=$CUDA_HOME \
  --without-magma --with-phigemm
cd ..
make -f Makefile.gpu pw-gpu
```
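
The `--with-gpu-arch=sm_35` value used above is the CUDA compute capability of the Kepler K20 and K40 cards. As a hedged sketch, the value for the other cards listed in the build requirements could be selected as shown below. `gpu_arch` is a hypothetical helper, not part of QE-GPU, and whether a particular QE-GPU release accepts each value should be checked against its own documentation.

```shell
#!/bin/bash
# Hypothetical helper: map a Tesla card name to its CUDA architecture string.
gpu_arch() {
    case "$1" in
        K20|K40) echo "sm_35" ;;   # Kepler GK110
        K80)     echo "sm_37" ;;   # Kepler GK210
        P100)    echo "sm_60" ;;   # Pascal
        V100)    echo "sm_70" ;;   # Volta
        *)       echo "unsupported card: $1" >&2
                 return 1 ;;
    esac
}

# Usage:
#   ./configure ... --with-gpu-arch=$(gpu_arch K40) ...
```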

#### Running
Cartesius uses the SLURM scheduler. An example batch script is given below:

``` shell
#!/bin/bash 
#SBATCH -N 6 --ntasks-per-node=16
#SBATCH -p gpu
#SBATCH -t 01:00:00

module load fortran mkl mpi/impi cuda

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${SURFSARA_MKL_LIB}
srun  pw-gpu.x -input pw.in >job.out
```

You should create a file containing the above commands
(e.g. `myjob.sub`) and then submit it to the batch system, e.g.

``` shell
sbatch myjob.sub 
```

Please check the SURFsara documentation for more information on how to
use the batch system.

### Computer System: Marconi KNL partition (A2), Cineca


#### Running

Quantum Espresso has already been installed for the KNL nodes of
Marconi and can be accessed via a specific module:

``` shell
module load profile/knl
module load autoload qe/6.0_knl
```

On Marconi the default is to use the MCDRAM as cache, with the
cluster mode set to quadrant. Other settings for the KNLs on Marconi
(e.g. flat mode) haven't been substantially tested for Quantum Espresso,
but significant differences in performance for most inputs are
not expected.

An example PBS batch script for the A2 partition is given below:

``` shell
#!/bin/bash
#PBS -l walltime=06:00:00
#PBS -l select=2:mpiprocs=34:ncpus=68:mem=93gb
#PBS -A <your account_no>
#PBS -N jobname

module purge
module load profile/knl
module load autoload qe/6.0_knl

cd ${PBS_O_WORKDIR}

export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=${OMP_NUM_THREADS}

mpirun pw.x -npool 4 -input file.in > file.out

```

In the above, the PBS directives request 2 KNL nodes (each with 68 cores) in
cache/quadrant mode, with 93 GB of main memory each. We are running QE in
hybrid mode using 34 MPI processes/node, each with 4 OpenMP
threads/process, and distributing the k-points in 4 pools; the Intel
MKL library will also use 4 OpenMP threads/process.
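
The resource arithmetic behind these settings can be checked with a few lines of shell. This is only an illustrative sketch using the numbers from the script above; the variable names are chosen here for clarity and are not read by PBS or QE.

```shell
#!/bin/bash
# Values copied from the PBS directives and the mpirun line above.
nodes=2          # select=2
mpiprocs=34      # MPI processes per node
ncpus=68         # cores per KNL node
omp_threads=4    # OMP_NUM_THREADS
npool=4          # -npool

total_mpi=$((nodes * mpiprocs))                  # 2 * 34 = 68
threads_per_node=$((mpiprocs * omp_threads))     # 34 * 4 = 136
threads_per_core=$((threads_per_node / ncpus))   # 136 / 68 = 2
procs_per_pool=$((total_mpi / npool))            # 68 / 4 = 17

echo "MPI processes in total:    ${total_mpi}"
echo "Threads per node:          ${threads_per_node}"
echo "Threads per core:          ${threads_per_core}"
echo "MPI processes per k-pool:  ${procs_per_pool}"

# The pools should share the MPI processes evenly.
if [ $((total_mpi % npool)) -ne 0 ]; then
    echo "warning: ${total_mpi} MPI processes cannot be split evenly into ${npool} pools"
fi
```

With these values each node runs 136 threads on 68 cores, i.e. 2 threads per core, and each of the 4 k-point pools receives 17 MPI processes. How the threads are actually pinned to cores depends on the system configuration.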

Note that this script needs to be submitted using the KNL scheduler as follows:

``` shell
module load env-knl
qsub myjob

```

Please check the Cineca documentation for information on using the
[Marconi KNL partition](https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.1%3A+MARCONI+UserGuide#UG3.1:MARCONIUserGuide-SystemArchitecture).


## 7. References
1. QE-GPU build and download instructions, https://github.com/QEF/qe-gpu-plugin.

Last updated: 7-April-2017