# Quantum Espresso in the Accelerated Benchmark Suite
## Document Author: A. Emerson (a.emerson@cineca.it) , Cineca.
## Contents

1. Introduction
2. Requirements
3. Downloading the software
4. Compiling the application
5. Running the program
6. Examples
7. References


## 1. Introduction
The GPU port of Quantum Espresso is a version of the program which has been
completely re-written in CUDA FORTRAN by Filippo Spiga. The program version used in these
experiments is v6.0, although further versions became available later during the
activity.
## 2. Build Requirements

For complete build requirements and information see the following GitHub site:
[QE-GPU](https://github.com/fspiga/qe-gpu)
A short summary is given below:

Essential

 * The PGI compiler, version 17.4 or above.
 * NVIDIA Tesla GPUs such as Kepler (K20, K40, K80), Pascal (P100) or Volta (V100).
   No other cards are supported. The NVIDIA Tesla P100 and V100 are strongly recommended
   for their on-board memory capacity and double-precision performance.

Optional
* A parallel linear algebra library such as ScaLAPACK, Intel MKL or IBM ESSL. If
  none is available on your system, the installation can use a version supplied
  with the distribution.
## 3. Downloading the software
The software is available from the GitHub site given above. You can use, for example, ``git clone``
to download it:
```bash
git clone https://github.com/fspiga/qe-gpu.git
```
## 4. Compiling and installing the application
This distribution does not have a `configure` command. Instead, you make
changes directly in the `make.inc` file. Once that is done, compile with:

```shell
make -f Makefile.gpu pw-gpu
```

In this example we are compiling with the Intel Fortran compiler so we
can use the Intel MKL version of ScaLAPACK. Note also that in the
above it is assumed that the CUDA library has been installed in the
directory `/usr/local/cuda/7.0.1`.
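A sketch of the kind of edits involved is shown below. The variable names follow common QE `make.inc` conventions and every path is a placeholder; the exact names may differ in the GPU distribution, so compare against the `make.inc` template shipped with QE-GPU:

```makefile
# Illustrative make.inc fragment only -- variable names follow common
# QE conventions and all paths here are placeholders for your system.
MPIF90         = mpiifort                     # MPI Fortran compiler wrapper
DFLAGS         = -D__INTEL -D__MPI -D__SCALAPACK -D__CUDA
SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
CUDA_PATH      = /usr/local/cuda/7.0.1        # CUDA installation directory
```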
 
The QE-GPU executable will appear in the directory `GPU/PW` and is called `pw-gpu.x`.

## 5. Running the program
Of course you need some input before you can run calculations. The
input files are of two types: 

1. A control file usually called `pw.in`
2. One or more pseudopotential files with extension `.UPF`
The pseudopotential files are placed in a directory specified in the
control file with the tag `pseudo_dir`. Thus if we have

```shell
pseudo_dir=./
```

then QE-GPU will look for the pseudopotential
files in the current directory.
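To make this concrete, a minimal control file has the structure sketched below. This is an illustrative fragment only; the silicon system, pseudopotential file name and cutoffs are placeholders, not one of the benchmark inputs:

```fortran
&CONTROL
  calculation = 'scf'        ! self-consistent field run
  prefix      = 'silicon'    ! placeholder job name
  pseudo_dir  = './'         ! directory searched for .UPF files
/
&SYSTEM
  ibrav     = 2              ! fcc Bravais lattice
  celldm(1) = 10.2           ! lattice parameter in Bohr
  nat       = 2              ! atoms in the unit cell
  ntyp      = 1              ! number of atomic species
  ecutwfc   = 30.0           ! plane-wave cutoff in Ry
/
&ELECTRONS
/
ATOMIC_SPECIES
  Si  28.086  Si.pz-vbc.UPF
ATOMIC_POSITIONS alat
  Si  0.00  0.00  0.00
  Si  0.25  0.25  0.25
K_POINTS automatic
  4 4 4  0 0 0
```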

If using the PRACE benchmark suite the data files can be
downloaded from the QE website or the PRACE repository. For example,

```shell
wget http://www.prace-ri.eu/UEABS/Quantum_Espresso/QuantumEspresso_TestCaseA.tar.gz
```
Once uncompressed you can then run the program like this (e.g. using
MPI over 16 cores):

```shell
mpirun -n 16 pw-gpu.x -input pw.in
```

but check your system documentation since `mpirun` may be replaced by
`mpiexec`, `runjob`, `aprun`, `srun`, etc. Note also that normally you are not
allowed to run MPI programs interactively but must instead use the
batch system.

A couple of examples for PRACE systems are given in the next section.

## 6. Examples
We now give one build example and two run examples.
### Computer System: Cartesius GPU partition, SURFsara

#### Build
``` shell
wget http://www.qe-forge.org/gf/download/frsrelease/204/912/espresso-5.4.0.tar.gz
tar zxvf espresso-5.4.0.tar.gz
cd espresso-5.4.0
wget https://github.com/fspiga/QE-GPU/archive/5.4.0.tar.gz
tar zxvf 5.4.0.tar.gz
ln -s QE-GPU-5.4.0 GPU
cd GPU
module load mpi
module load mkl
module load cuda
./configure --enable-parallel --enable-openmp --with-scalapack=intel \
  --enable-cuda --with-gpu-arch=sm_35 \
  --with-cuda-dir=$CUDA_HOME \
  --without-magma --with-phigemm
cd ..
make -f Makefile.gpu pw-gpu
```
#### Running
Cartesius uses the SLURM scheduler. An example batch script is given below:

``` shell
#!/bin/bash
#SBATCH -N 6 --ntasks-per-node=16
#SBATCH -p gpu
#SBATCH -t 01:00:00

module load fortran mkl mpi/impi cuda

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${SURFSARA_MKL_LIB}
srun pw-gpu.x -input pw.in > job.out
```

You should create a file containing the above commands
(e.g. `myjob.sub`) and then submit it to the batch system, e.g.

``` shell
sbatch myjob.sub
```

Please check the SURFsara documentation for more information on how to
use the batch system.
### Computer System: Marconi KNL partition (A2), Cineca
#### Running

Quantum Espresso has already been installed for the KNL nodes of
Marconi and can be accessed via a specific module:

``` shell
module load profile/knl
module load autoload qe/6.0_knl
```

On Marconi the default is to use the MCDRAM as cache, with the
clustering mode set to quadrant. Other settings for the KNLs on Marconi
(e.g. flat mode) have not been substantially tested for Quantum
Espresso, but significant differences in performance for most inputs are
not expected.

An example PBS batch script for the A2 partition is given below:

``` shell
#!/bin/bash
#PBS -l walltime=06:00:00
#PBS -l select=2:mpiprocs=34:ncpus=68:mem=93gb
#PBS -A <your account_no>
#PBS -N jobname

module purge
module load profile/knl
module load autoload qe/6.0_knl

cd ${PBS_O_WORKDIR}

export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=${OMP_NUM_THREADS}

mpirun pw.x -npool 4 -input file.in > file.out

```

In the above PBS directives we have asked for 2 KNL nodes (each with 68 cores) in
cache/quadrant mode with 93 GB main memory each. We are running QE in
hybrid mode using 34 MPI processes/node, each with 4 OpenMP
threads/process, and distributing the k-points in 4 pools (i.e. 68 MPI
processes in total, 17 per pool); the Intel MKL library will also use
4 OpenMP threads/process.

Note that this script needs to be submitted using the KNL scheduler as follows:

``` shell
module load env-knl
qsub myjob
```

Please check the Cineca documentation for information on using the
[Marconi KNL partition](https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.1%3A+MARCONI+UserGuide#UG3.1:MARCONIUserGuide-SystemArchitecture).
## 7. References
1. QE-GPU build and download instructions, https://github.com/QEF/qe-gpu-plugin.

Last updated: 7-April-2017