The GPU port of Quantum Espresso runs almost entirely in GPU memory. This means that jobs are limited
by the memory of the GPU device, typically 16-32 GB, regardless of the main node memory. Thus, unless many nodes are used, the user is likely to see job failures due to lack of memory, even for small datasets.
For example, on the CSCS Piz Daint supercomputer each node has only one NVIDIA Tesla P100 (16 GB), which means that at least 4 nodes are needed to run even the smallest dataset (AUSURF in the UEABS).
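As a rough illustration, a Piz Daint job script for such a run might request 4 GPU nodes as sketched below; the Quantum Espresso module name, srun options and input file are placeholders, and the tested job scripts in this repository should be taken as the reference:

```bash
#!/bin/bash
#SBATCH --nodes=4                # 4 x P100 (16 GB) to fit the smallest dataset
#SBATCH --ntasks-per-node=1      # one MPI rank per GPU
#SBATCH --constraint=gpu         # select the GPU (P100) nodes of Piz Daint
#SBATCH --time=01:00:00
#SBATCH --account=<accountno>

module load daint-gpu
# The module providing the GPU build is an assumption; check `module avail`
module load QuantumESPRESSO

srun pw.x -input ausurf.in > ausurf.out
```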
## Execution
In the UEABS repository you will find a directory for each computer system tested, together with installation
instructions and job scripts.
In the following we describe in detail the execution procedure for the Cineca Marconi (KNL) and Galileo (x86) computer systems.
### Execution on the Cineca Marconi KNL system

Quantum Espresso has already been installed for the KNL nodes of
Marconi and can be accessed via a specific module:

``` shell
module load profile/knl
module load autoload qe/6.0_knl
```
On Marconi the default is to use the MCDRAM as cache, with the
clustering mode set to quadrant. Other KNL configurations on Marconi
(e.g. flat memory mode) have not been substantially tested for
Quantum Espresso, but significant differences in performance are not
expected for most inputs.
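If a different KNL configuration is to be tried, SLURM's KNL support normally exposes the memory and clustering modes as job constraints; the constraint names below are generic SLURM ones and may not match the Marconi setup, so check the site documentation first:

``` shell
#!/bin/bash
#SBATCH -N 2
#SBATCH --tasks-per-node=64
# Request flat MCDRAM mode with quadrant clustering instead of the default
# cache/quadrant configuration (constraint names are an assumption)
#SBATCH --constraint=flat,quadrant

# In flat mode the 16 GB MCDRAM is exposed as a separate NUMA node and can
# be targeted explicitly, e.g. with numactl (input file is a placeholder)
srun numactl --preferred=1 pw.x -input ausurf.in > ausurf.out
```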
An example SLURM batch script for the A2 partition is given below:

``` shell
#!/bin/bash
#SBATCH -N2
#SBATCH --tasks-per-node=64
#SBATCH -A <accountno>
#SBATCH -t 1:00:00
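# The directives above only request resources; an illustrative launch of
# pw.x follows (the pool count and input file are placeholders -- the job
# scripts provided in this repository give tested values for each dataset)
export OMP_NUM_THREADS=1
srun pw.x -npool 2 -input ausurf.in > ausurf.out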
```

### Execution on the Cineca Galileo (x86) system

Quantum Espresso has already been installed on the cluster
and can be accessed via a specific module:

```bash
module load profile/phys
module load autoload qe/6.5
```

An example SLURM batch script is given below:

```bash
#!/bin/bash
#SBATCH --time=06:00:00          # Walltime in hh:mm:ss
#SBATCH --nodes=4                # Number of nodes
#SBATCH --ntasks-per-node=18     # Number of MPI ranks per node
#SBATCH --cpus-per-task=2        # Number of OpenMP threads for each MPI process/rank
#SBATCH --mem=118000             # Per-node memory request (MB)