The GPU port of Quantum Espresso runs almost entirely in GPU memory. This means that jobs are limited
by the memory of the GPU device, typically 16-32 GB, regardless of the main memory of the node. Unless many nodes are used, the user is therefore likely to see job failures due to lack of memory, even for small datasets.
For example, on the CSCS Piz Daint supercomputer each node has only one NVIDIA Tesla P100 (16 GB), which means that you will need at least 4 nodes to run even the smallest dataset (AUSURF in the UEABS).
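For orientation, a minimal batch script for such a 4-node GPU run on Piz Daint might look like the sketch below; the module names, account, pool count and input file name are assumptions and must be adapted to the actual installation.
```bash
#!/bin/bash
#SBATCH --job-name=ausurf-gpu
#SBATCH --time=01:00:00
#SBATCH --nodes=4                # at least 4 x 16 GB P100 for AUSURF
#SBATCH --ntasks-per-node=1      # one MPI task per node, i.e. one per GPU
#SBATCH --cpus-per-task=12       # host cores of a Piz Daint GPU node
#SBATCH --constraint=gpu         # select the P100 (GPU) partition
#SBATCH --account=<project>      # placeholder: your project account

# Module names are site-specific assumptions; check `module avail`
module load daint-gpu
module load QuantumESPRESSO

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# ausurf.in is the UEABS AUSURF input file; adjust -npool to the
# number of k-points available in the dataset
srun pw.x -npool 2 -input ausurf.in > ausurf.out
```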
## Execution
In the UEABS repository you will find a directory for each computer system tested, together with installation
...
...
In the following we describe in detail the execution procedure for the Marconi system at CINECA.
Quantum Espresso has already been installed on the cluster
and can be accessed via a specific module:
```bash
module load profile/phys
module load autoload qe/6.5
```
An example SLURM batch script is given below:
```bash
#!/bin/bash
#SBATCH --time=06:00:00 # Walltime in hh:mm:ss
#SBATCH --nodes=4 # Number of nodes
...
...
module load autoload qe/6.5
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MKL_NUM_THREADS=${OMP_NUM_THREADS}
srun pw.x -npool 4 -input file.in > file.out
```
With the SLURM directives above we have requested 4 nodes, 18 MPI tasks per node and 2 OpenMP threads
per task, with the k-points divided into 4 pools.
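This resource request corresponds to SBATCH directives of roughly the following form (a sketch; the script above is abbreviated and the installed version may differ):
```bash
#SBATCH --nodes=4             # 4 nodes in total
#SBATCH --ntasks-per-node=18  # 18 MPI tasks on each node
#SBATCH --cpus-per-task=2     # 2 OpenMP threads per MPI task
```
srun therefore launches 4 × 18 = 72 MPI processes, which `-npool 4` splits into 4 k-point pools of 18 processes each.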
Note that this script needs to be submitted using the SLURM scheduler as follows:
```bash
sbatch myjob
```
Please check the CINECA documentation for information on using the Marconi system.