Commit ec0bc6f5 authored by Andrew Emerson

README mods

parent fc23c18b
@@ -14,12 +14,14 @@
## 1. Introduction
### GPU version
The GPU port of Quantum Espresso is a version of the program which has been
completely re-written in CUDA FORTRAN by Filippo Spiga. The program version used in these
experiments is v6.0, although later versions became available during the
activity.
### 2. Build Requirements
For complete build requirements and information see the following GitHub site:
[QE-GPU](https://github.com/fspiga/qe-gpu)
@@ -38,7 +40,7 @@ Optional
with the distribution.
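As a quick check that the basic requirements are in place, you can verify that a CUDA Fortran compiler, the CUDA toolkit and an MPI wrapper are visible in your environment. The commands below are only an illustrative sketch assuming a PGI-based toolchain; the GitHub site above is the authoritative list of requirements.

``` shell
# Illustrative environment check only -- compiler and module names vary
# between systems; see the QE-GPU GitHub page for the real requirements.
pgf90 -V             # CUDA Fortran compiler
nvcc --version       # CUDA toolkit
mpif90 --version     # MPI Fortran wrapper
nvidia-smi           # confirm the GPU devices are visible
```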
### 3. Downloading the software
The software is available from the GitHub site given above. You can use, for example, ``git clone``
to download the software:
@@ -46,7 +48,7 @@ to download the software:
``` shell
git clone https://github.com/fspiga/qe-gpu.git
```
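Since v6.0 was the version used in these experiments, you may also want to list the tags in the repository and check out the one corresponding to that release; the tag name below is only a placeholder, as the actual tag names may differ.

``` shell
cd qe-gpu
git tag                 # list the available release tags
git checkout <tag>      # placeholder: the tag corresponding to v6.0, if present
```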
### 4. Compiling and installing the application
Check the __README.md__ file in the downloaded distribution since the
build procedure varies from distribution to distribution.
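As an illustration only, a typical build follows the pattern below. The name of the make.inc template and of the build target are assumptions and differ between distributions, so treat the distribution's __README.md__ as the reference.

``` shell
# Hypothetical build sketch -- the make.inc template and make target are
# assumptions; follow the distribution's README.md for the real procedure.
cd qe-gpu
cp install/make.inc_<platform> make.inc   # choose a template matching your compiler/GPU
make -j 8 pw-gpu                          # build the GPU-enabled executable (pw-gpu.x)
```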
@@ -100,13 +102,18 @@ mpirun -n 16 pw-gpu.x -input pw.in
but check your system documentation since `mpirun` may be replaced by
`mpiexec`, `runjob`, `aprun`, `srun`, etc. Note also that normally you are not
allowed to run MPI programs interactively without using the
batch system.
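For instance, on a SLURM-based system the same 16-process run would typically be launched from inside a batch script with `srun`; this is only an illustration, and the exact launcher and options depend on the site.

``` shell
# Illustrative only: SLURM equivalent of the mpirun example above
srun -n 16 pw-gpu.x -input pw.in > pw.out
```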
A couple of examples for PRACE systems are given in the next section.
### Hints for running the GPU version
The GPU port of Quantum Espresso runs almost entirely in the GPU memory. This means that jobs are restricted
by the memory of the GPU device (normally 16-32 GB), regardless of the main node memory. Thus, unless many nodes are used, the user is likely to see job failures due to lack of memory, even for small datasets.
For example, on the CSCS Piz Daint supercomputer each node has only one NVIDIA Tesla P100 (16 GB), which means that you will need at least 4 nodes to run even the smallest dataset (AUSURF in the UEABS).
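As an illustrative sketch only (module name, account and runtime are placeholders and should be taken from the CSCS documentation), a Piz Daint job for the AUSURF dataset would therefore request at least 4 GPU nodes:

``` shell
#!/bin/bash
# Illustrative sketch only -- module name, account and runtime are placeholders
#SBATCH -N 4                  # at least 4 x P100 (16 GB) for AUSURF
#SBATCH --ntasks-per-node=1   # one MPI process per GPU
#SBATCH --constraint=gpu
#SBATCH -A <accountno>
#SBATCH -t 1:00:00

module load daint-gpu         # placeholder: load your own QE-GPU build/environment here

srun pw-gpu.x -input pw.in > pw.out
```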
## 6. Examples
We now give a build example and two run examples.
Example job scripts for various PRACE supercomputer systems are available in the repository.
### Computer System: DAVIDE P100 cluster, CINECA
@@ -128,20 +135,20 @@ haven't been substantially tested for Quantum Espresso (e.g. flat
mode) but significant differences in performance for most inputs are
not expected.
An example SLURM batch script for the A2 partition is given below:
``` shell
#!/bin/bash
#SBATCH -N2
#SBATCH --tasks-per-node=34
#SBATCH -A <accountno>
#SBATCH -t 1:00:00
module purge
module load profile/knl
module load autoload qe/6.0_knl
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=${OMP_NUM_THREADS}
mpirun pw.x -npool 4 -input file.in > file.out
```
With the above SLURM directives we have asked for 2 KNL nodes (each with 68 cores) in
cache/quadrant mode and 93 GB of main memory each. We are running QE in
hybrid mode using 34 MPI processes per node, each with 4 OpenMP
threads per process, and distributing the k-points in 4 pools; the Intel
@@ -160,7 +167,7 @@ Note that this script needs to be submitted using the KNL scheduler as follows:
``` shell
module load env-knl
sbatch myjob
```