The GPU port of Quantum Espresso runs almost entirely in GPU memory. This means that jobs are restricted
by the memory of the GPU device, normally 16-32 GB, regardless of the main node memory. Thus, unless many nodes are used, the user is likely to see job failures due to lack of memory, even for small datasets.
For example, on the CSCS Piz Daint supercomputer each node has only 1 NVIDIA Tesla P100 (16 GB), which means that you will need at least 4 nodes to run even the smallest dataset (AUSURF in the UEABS).
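As an illustration, a minimal batch script for such a 4-node Piz Daint run might look like the sketch below. This is only a sketch: the module name, walltime and input/output file names are placeholders, and it assumes one MPI rank per node driving that node's single P100.
```bash
#!/bin/bash
#SBATCH --job-name=ausurf
#SBATCH --nodes=4                # 4 nodes = 4 x P100 (16 GB each)
#SBATCH --ntasks-per-node=1      # one MPI rank per node/GPU
#SBATCH --constraint=gpu         # select the GPU nodes of Piz Daint
#SBATCH --time=01:00:00

# Placeholder: load whatever module provides the GPU-enabled pw.x on the system.
module load QuantumESPRESSO

srun pw.x -input ausurf.in > ausurf.out
```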
## Execution
In the UEABS repository you will find a directory for each computer system tested, together with installation
...
Quantum Espresso has already been installed on the cluster
and can be accessed via a specific module:
```bash
module load profile/phys
module load autoload qe/6.5
```
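Before submitting a job it can be worth confirming that the module has actually been picked up; for example (the exact output will vary with the system configuration):
```bash
module list    # profile/phys and qe/6.5 should appear in the list
which pw.x     # the Quantum Espresso executable should now be in the PATH
```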
An example SLURM batch script is given below:
```bash
#!/bin/bash
#SBATCH --time=06:00:00 # Walltime in hh:mm:ss
#SBATCH --nodes=4 # Number of nodes
...
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MKL_NUM_THREADS=${OMP_NUM_THREADS}
srun pw.x -npool 4 -input file.in > file.out
```
In the above, the SLURM directives request 4 nodes with 18 MPI tasks per node and 2 OpenMP threads per task, and the k-points are divided into 4 pools (`-npool 4`).
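The directives elided from the script above are assumed to follow the standard SLURM pattern for this decomposition; a purely illustrative sketch is:
```bash
#SBATCH --ntasks-per-node=18   # 18 MPI tasks per node (72 ranks on 4 nodes)
#SBATCH --cpus-per-task=2      # 2 OpenMP threads per MPI task
```
With 72 MPI ranks and `-npool 4`, each k-point pool then contains 18 ranks.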
Note that this script needs to be submitted using the SLURM scheduler as follows:
```bash
sbatch myjob
```
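Once submitted, the job can be monitored with the usual SLURM commands (generic examples; the exact output depends on the site configuration):
```bash
squeue -u $USER            # list your pending and running jobs
scontrol show job <jobid>  # detailed information on a specific job
```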
Please check the Cineca documentation for information on using the