# Quantum Espresso in the Accelerated Benchmark Suite
## Document Author: A. Emerson (a.emerson@cineca.it), Cineca.
## Last update: 16th February 2016
## Contents

1. Introduction
2. Requirements
3. Downloading the software
4. Compiling the application
5. Running the program
6. Example
7. References


## 1. Introduction

The GPU-enabled version of Quantum Espresso (known as QE-GPU) provides
GPU acceleration for the Plane-Wave Self-Consistent Field (PWscf) code
and for the calculation of energy barriers and reaction pathways with
the Nudged Elastic Band (NEB) package. QE-GPU is developed as a plugin
to the main QE program branch and is usually based on code one or two
versions behind the main program version. Note that in the accelerated
benchmark suite *version 5.4* has been used for QE-GPU, whereas the
latest release of the main package is 6.0.
QE-GPU is developed by Filippo Spiga; download and build instructions
for the package are given in [1] if it is not already available on
your system.

## 2. Requirements

Essential:

* Quantum ESPRESSO 5.4
* Kepler GPU: CUDA SDK 6.5 (minimum)
* Pascal GPU: CUDA SDK 8.0 (minimum)

Optional:

* A parallel linear algebra library such as ScaLAPACK or Intel MKL. If
  none is available on your system then the installation can use a
  version supplied with the distribution. A quick way to check the
  GPU-related requirements on a node is shown after this list.
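If you are unsure which CUDA SDK and GPU generation a compute node
provides, commands along the following lines usually work (`nvcc` and
`nvidia-smi` are standard NVIDIA tools, but their availability depends
on the site installation):

```
# report the version of the installed CUDA compiler (part of the CUDA SDK)
nvcc --version

# list the GPUs visible on the node; the model name indicates Kepler, Pascal, etc.
nvidia-smi -L
```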

## 3. Downloading the software

### QE distribution

Many packages are available from the download page, but since you need
only the main base package for the benchmark suite, the
`espresso-5.4.0.tar.gz` file will be sufficient. It can be downloaded
from:

[http://www.quantum-espresso.org/download](http://www.quantum-espresso.org/download)

### GPU plugin

The GPU source code can be conveniently downloaded from this link:

[https://github.com/QEF/qe-gpu-plugin](https://github.com/QEF/qe-gpu-plugin)
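For convenience, the direct download commands used in the example of
Section 6 are reproduced here; the URLs are those used in that example
and may change over time:

```
# main QE 5.4.0 distribution
wget http://www.qe-forge.org/gf/download/frsrelease/204/912/espresso-5.4.0.tar.gz

# QE-GPU plugin, version 5.4.0
wget https://github.com/fspiga/QE-GPU/archive/5.4.0.tar.gz
```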

## 4. Compiling the application

The QE-GPU documentation [1] gives more details, but for the benchmark
suite we followed this general procedure:

1. Uncompress the main QE distribution and copy the GPU source distribution inside it:

```
tar zxvf espresso-5.4.0.tar.gz
cp 5.4.0.tar.gz espresso-5.4.0
```

2. Uncompress the GPU source inside the main distribution and create a symbolic link:

```
cd espresso-5.4.0
tar zxvf 5.4.0.tar.gz
ln -s QE-GPU-5.4.0 GPU
```

3. Run the QE-GPU configure and make:

```
cd GPU
./configure --enable-parallel --enable-openmp --with-scalapack=intel \
  --enable-cuda --with-gpu-arch=Kepler \
  --with-cuda-dir=/usr/local/cuda/7.0.1 \
  --without-magma --with-phigemm
cd ..
make -f Makefile.gpu pw-gpu
```

In this example we are compiling with the Intel Fortran compiler, so we
can use the Intel MKL version of ScaLAPACK. Note also that the above
assumes the CUDA library has been installed in the directory
`/usr/local/cuda/7.0.1`; a variant for Pascal GPUs is sketched below.
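For a Pascal GPU (see Section 2) the configure line would need the
Pascal architecture and a CUDA 8.0 installation. A minimal sketch,
assuming the configure script accepts the `sm_60` architecture string
(the Pascal counterpart of the `sm_35` used for Kepler in Section 6)
and a hypothetical CUDA path:

```
./configure --enable-parallel --enable-openmp --with-scalapack=intel \
  --enable-cuda --with-gpu-arch=sm_60 \
  --with-cuda-dir=/usr/local/cuda/8.0 \
  --without-magma --with-phigemm
```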
 
The QE-GPU executable will appear in the directory `GPU/PW` and is called `pw-gpu.x`.
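A quick way to confirm that the build succeeded is to check that the
executable exists:

```
ls -l GPU/PW/pw-gpu.x
```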

## 5. Running the program

Of course you need some input before you can run calculations. The
input files are of two types:

1. a control file, usually called `pw.in`;

2. one or more pseudopotential files, with the extension `.UPF`.
The pseudopotential files are placed in a directory specified in the
control file with the tag `pseudo_dir`. Thus if we have
`pseudo_dir = './'` then QE-GPU will look for the pseudopotential
files in the current directory.
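For illustration, a minimal `&CONTROL` namelist in `pw.in` might look
like the sketch below; the `prefix` and `outdir` values are
hypothetical, and the benchmark input files already contain suitable
settings:

```
&CONTROL
  calculation = 'scf'     ! self-consistent field run
  prefix      = 'test'    ! hypothetical label for the output files
  outdir      = './tmp'   ! hypothetical scratch directory
  pseudo_dir  = './'      ! look for the .UPF files in the current directory
/
```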
The data files themselves can be downloaded from the QE website or the
PRACE repository. For example:

```
wget http://www.prace-ri.eu/UEABS/Quantum_Espresso/QuantumEspresso_TestCaseA.tar.gz
```

Once uncompressed, you can run the program like this (e.g. using MPI
over 16 cores):

```
mpirun -n 16 pw-gpu.x -input pw.in
```
Check your system documentation, however, since `mpirun` may be
replaced by `mpiexec`, `runjob`, `aprun`, `srun`, etc. Note also that
you are normally not allowed to run MPI programs interactively; you
must use the batch system instead.
An example for a PRACE system is given in the next section.

## 6. Example

We now give a build and run example for the Cartesius GPU partition at
SURFsara.

### Build

```
# download and unpack the sources
wget http://www.qe-forge.org/gf/download/frsrelease/204/912/espresso-5.4.0.tar.gz
tar zxvf espresso-5.4.0.tar.gz
cd espresso-5.4.0
wget https://github.com/fspiga/QE-GPU/archive/5.4.0.tar.gz
tar zxvf 5.4.0.tar.gz
ln -s QE-GPU-5.4.0 GPU

# load compiler modules and compile
cd GPU
module load mpi
module load mkl
module load cuda
./configure --enable-parallel --enable-openmp --with-scalapack=intel \
  --enable-cuda --with-gpu-arch=sm_35 \
  --with-cuda-dir=$CUDA_HOME \
  --without-magma --with-phigemm
cd ..
make -f Makefile.gpu pw-gpu
```

### Running

Cartesius uses the SLURM scheduler. An example batch script is given
below:

```
#!/bin/bash
#SBATCH -N 6 --ntasks-per-node=16
#SBATCH -p gpu
#SBATCH -t 01:00:00

module load fortran mkl mpi/impi cuda

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${SURFSARA_MKL_LIB}
srun pw-gpu.x -input pw.in > job.out
```

You should create a file containing the above commands
(e.g. `myjob.sub`) and then submit it to the batch system, e.g.

```
sbatch myjob.sub
```
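While the job is queued or running, its status can be checked with the
standard SLURM command:

```
squeue -u $USER
```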
Please check the SURFsara documentation for more information on how to
use the batch system.

## 7. References

1. QE-GPU build and download instructions: [https://github.com/QEF/qe-gpu-plugin](https://github.com/QEF/qe-gpu-plugin)