Full documentation is available from the project website [QuantumEspresso](https://www.quantum-espresso.org).
In this README we give information relevant for its use in the UEABS.
### Standard CPU version
For the UEABS activity we have used mainly version v6.5 but later versions are now available.
### GPU version
The GPU port of Quantum Espresso is a version of the program which has been
completely re-written in CUDA Fortran. The version used in these
experiments is v6.5a1, although later versions may be available.
## Installation and requirements
### Standard
The Quantum Espresso source can be downloaded from the project's GitHub repository.
### GPU version
For complete build requirements and information see the following GitLab site:
[QE-GPU](https://gitlab.com/QEF/q-e-gpu/-/releases)
A short summary is given below:
Essential
Optional

## Downloading the software
### Standard
From the website, for example:
```bash
wget https://github.com/QEF/q-e/releases/download/qe-6.5/qe-6.5.tar.gz
```
### GPU
Available from the web site given above. You can use, for example, ```wget```
to download the software:
```bash
wget https://gitlab.com/QEF/q-e-gpu/-/archive/qe-gpu-6.5a1/q-e-gpu-qe-gpu-6.5a1.tar.gz
```
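The downloaded archive can then be unpacked in the usual way, for example:

```bash
tar zxvf q-e-gpu-qe-gpu-6.5a1.tar.gz
```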
## Compiling and installing the application
### Standard
The standard version is built with the usual `configure` and `make` procedure, ending with:

```bash
make; make install
```
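As an illustration, a complete CPU build from the downloaded tarball might look like the following. This is a sketch only: the MPI wrapper name (`mpif90`) and the configure options are assumptions that depend on your system, so check `./configure --help` and your site documentation.

```bash
tar zxvf qe-6.5.tar.gz
cd qe-6.5
./configure --enable-openmp MPIF90=mpif90   # options vary by system
make pw                                     # or "make all" for the full suite
make install
```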
### GPU
The GPU version is configured in a similar way to the CPU version, the only exception being that the configure script
also checks for the presence of the PGI compiler and the CUDA libraries.
A typical configure command might be:

```bash
./configure --with-cuda=XX --with-cuda-runtime=YY --with-cuda-cc=ZZ --enable-openmp [ --with-scalapack=no ]
```

where `XX` is the location of the CUDA Toolkit (in HPC environments this is
generally `$CUDA_HOME`), `YY` is the version of the CUDA toolkit and `ZZ`
is the compute capability of the card.
For example:

```bash
./configure --with-cuda=$CUDA_HOME --with-cuda-cc=60 --with-cuda-runtime=9.2
```

The __dev-tools/get_device_props.py__ script is available if you don't know these values.
Compilation is then performed as normal by
```
make pw
```
#### Example compilation of Quantum Espresso for GPU-based machines

```bash
module load pgi cuda
./configure --with-cuda=$CUDA_HOME --with-cuda-cc=70 --with-cuda-runtime=10.2
make -j8 pw
```

The QE-GPU executable will appear in the directory `GPU/PW` and is called `pw-gpu.x`.
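To check that the resulting binary has really been built against the GPU libraries, one option (assuming the CUDA libraries were linked dynamically, which is the usual case) is to inspect it with `ldd`, for example:

```bash
ldd GPU/PW/pw-gpu.x | grep -i -E "cuda|cublas|cufft"
```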
## Running the program - general procedure
The input files are of two types: a control file (e.g. `pw.in`) and one or more pseudopotential files.
The pseudopotential files are placed in a directory specified in the
control file with the tag pseudo\_dir. Thus if we have
```bash
pseudo_dir=./
```
then QE-GPU will look for the pseudopotential
files in the current directory.
If using the PRACE benchmark suite the data files can be
downloaded from the PRACE repository. For example,
```bash
wget https://repository.prace-ri.eu/ueabs/Quantum_Espresso/QuantumEspresso_TestCaseA.tar.gz
```
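The archive can then be uncompressed with, for example:

```bash
tar zxvf QuantumEspresso_TestCaseA.tar.gz
```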
Once uncompressed you can then run the program like this (e.g. using
MPI over 16 cores):
```bash
mpirun -n 16 pw-gpu.x -input pw.in
```
Check your system documentation, however, since `mpirun` may be replaced by a
different launcher (e.g. `srun` for SLURM). Note also that you are not normally
allowed to run MPI programs interactively without using the
batch system.
### Running on GPUs
The procedure is identical to that for non-accelerated hardware.
If GPUs are being used then the following will appear in the program output:
```
GPU acceleration is ACTIVE.
```
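A quick way to confirm this after a run is to search the program output, assuming it has been redirected to a file such as `file.out` (as in the batch script example later in this README):

```bash
grep "GPU acceleration" file.out
```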
GPU acceleration can be switched off by setting the following environment variable:
```bash
$ export USEGPU=no
```
### Parallelisation options
Quantum Espresso uses various levels of parallelisation, the most important being MPI parallelisation
over the *k points* available in the input system. This is achieved with the ```-npool``` program option.
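For example, the small UEABS test case (AUSURF) has 2 k-points, so the k-point parallelisation can use 2 pools. A minimal sketch, assuming `mpirun` and 16 MPI tasks as in the earlier example:

```bash
mpirun -n 16 pw.x -npool 2 -input pw.in
```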
Other parallelisation options, such as ```-ndiag``` for the diagonalisation tasks, can be combined with ```-npool```, for example:

```bash
srun -n 64 pw.x -npool 2 -ndiag 4 -input pw.in
```
### Hints for running the GPU version
#### Memory limitations
The GPU port of Quantum Espresso runs almost entirely in the GPU memory. This means that jobs are restricted
by the memory of the GPU device, normally 16-32 GB, regardless of the main node memory. Thus, unless many nodes are used, the user is likely to see job failures due to lack of memory, even for small datasets.
For example, on the CSCS Piz Daint supercomputer each node has only one NVIDIA Tesla P100 (16 GB), which means that you will need at least 4 nodes to run even the smallest dataset (AUSURF in the UEABS).
## Execution
In the UEABS repository you will find a directory for each computer system tested, together with installation
instructions and job scripts.
In the following we describe in detail the execution procedure for the Cineca Galileo computer system.
### Execution on the Cineca Galileo (x86) system
Quantum Espresso has already been installed on the cluster
and can be accessed via a specific module:
```bash
module load profile/phys
module load autoload qe/6.5
```
An example SLURM batch script is given below:
```bash
#!/bin/bash
#SBATCH --time=06:00:00 # Walltime in hh:mm:ss
#SBATCH --nodes=4 # Number of nodes
#SBATCH --ntasks-per-node=18 # Number of MPI ranks per node
#SBATCH --cpus-per-task=2 # Number of OpenMP threads for each MPI process/rank
#SBATCH --mem=118000 # Per-node memory request (MB)
#SBATCH --account=<your account_no>
#SBATCH --job-name=jobname
#SBATCH --partition=gll_usr_prod
module purge
module load profile/phys
module load autoload qe/6.5
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MKL_NUM_THREADS=${OMP_NUM_THREADS}
srun pw.x -npool 4 -input file.in > file.out
```
With the above SLURM directives we have requested 4 nodes, with 18 MPI tasks per node and 2 OpenMP threads
per task (i.e. 36 cores in total per node).
Note that this script needs to be submitted using the SLURM scheduler as follows:
```bash
sbatch myjob
```
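Once submitted, the job can be monitored with the standard SLURM commands, for example:

```bash
squeue -u $USER
```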
## UEABS test cases

| UEABS name | QE name | Description | k-points | Notes|
|------------|---------------|-------------|----------|------|
| Small test case | AUSURF | 112 atoms | 2 | < 4-8 nodes on most systems |
| Large test case | GRIR443 | 432 atoms | 4 | Medium scaling, often 20 nodes |
| Very Large test case | CNT | Carbon nanotube | | Large scaling runs only. Memory and time requirements very high |
__Last updated: 22-October-2020__