# Quantum Espresso in the Accelerated Benchmark Suite

## Document Author: A. Emerson (a.emerson@cineca.it), Cineca.

## Contents

1. Introduction
2. Build Requirements
3. Downloading the software
4. Compiling and installing the application
5. Running the program
6. Examples
7. References

## 1. Introduction

The GPU port of Quantum Espresso is a version of the program which has been completely re-written in CUDA FORTRAN by Filippo Spiga. The version of the program used in these experiments is v6.0, even though further versions became available later during the activity.

## 2. Build Requirements

For complete build requirements and information see the following GitHub site:
[QE-GPU](https://github.com/fspiga/qe-gpu)

A short summary is given below:

Essential

* The PGI compiler version 17.4 or above.
* NVIDIA Tesla GPUs such as Kepler (K20, K40, K80), Pascal (P100) or Volta (V100). No other cards are supported. The NVIDIA Tesla P100 and V100 are strongly recommended for their on-board memory capacity and double precision performance.

Optional

* A parallel linear algebra library such as ScaLAPACK, Intel MKL or IBM ESSL. If none is available on your system then the installation can use a version supplied with the distribution.

## 3. Downloading the software

The software is available from the web site given above. You can use, for example, ``git clone`` to download the software:

```bash
git clone https://github.com/fspiga/qe-gpu.git
```

## 4. Compiling and installing the application

Check the __README.md__ file in the downloaded files since the procedure varies from distribution to distribution.
Most distributions do not have a `configure` command. Instead you copy a __make.inc__ file from the __install__ directory, and modify that directly before running make. A number of templates are available in the distribution:

- make.inc_x86-64
- make.inc_CRAY_PizDaint
- make.inc_POWER_DAVIDE
- make.inc_POWER_SUMMITDEV

The second and third are particularly relevant in the PRACE infrastructure (i.e.
for CSCS PizDaint and CINECA DAVIDE).

Run __make__ to see the options available. For the UEABS you should select the pw program (the only module currently available):

```
make pw
```

The QE-GPU executable will appear in the directory `GPU/PW` and is called `pw-gpu.x`.

## 5. Running the program

You need input files before you can run calculations. The input files are of two types:

1. A control file usually called `pw.in`
2. One or more pseudopotential files with extension `.UPF`

The pseudopotential files are placed in a directory specified in the control file with the tag `pseudo_dir`. Thus if we have

```shell
pseudo_dir=./
```

then QE-GPU will look for the pseudopotential files in the current directory.

If using the PRACE benchmark suite the data files can be downloaded from the QE website or the PRACE repository. For example,

```shell
wget http://www.prace-ri.eu/UEABS/Quantum_Espresso/QuantumEspresso_TestCaseA.tar.gz
```

Once uncompressed you can then run the program like this (e.g. using MPI over 16 cores):

```shell
mpirun -n 16 pw-gpu.x -input pw.in
```

but check your system documentation since `mpirun` may be replaced by `mpiexec`, `runjob`, `aprun`, `srun`, etc. Note also that normally you are not allowed to run MPI programs interactively but must instead use the batch system.

A couple of examples for PRACE systems are given in the next section.

## 6. Examples

We now give a build and 2 run examples.

### Computer System: DAVIDE P100 cluster, Cineca

#### Running

Quantum Espresso has already been installed for the KNL nodes of Marconi and can be accessed via a specific module:

```shell
module load profile/knl
module load autoload qe/6.0_knl
```

On Marconi the default is to use the MCDRAM as cache, and have the cache mode set as quadrant. Other settings for the KNLs on Marconi (e.g. flat mode) haven't been substantially tested for Quantum Espresso, but significant differences in performance for most inputs are not expected.
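
Before submitting a job it can help to confirm that the executable and the two kinds of input files described in Section 5 are actually in place. The following is only a minimal sketch: `preflight` is a hypothetical helper, not part of Quantum Espresso or the UEABS, and it assumes the pseudopotential files carry the `.UPF` extension in upper case.

```shell
#!/bin/bash
# Illustrative pre-flight check (hypothetical helper, not part of QE).
# Usage: preflight <executable> <control-file> <pseudo-dir>
preflight() {
    local exe="$1" input="$2" pseudo_dir="$3"

    # The executable must be found on PATH, e.g. after the module load commands.
    command -v "$exe" >/dev/null 2>&1 || { echo "missing executable: $exe"; return 1; }

    # The control file must exist and be readable.
    [ -r "$input" ] || { echo "missing control file: $input"; return 1; }

    # At least one pseudopotential file must be present in the pseudo_dir directory.
    ls "$pseudo_dir"/*.UPF >/dev/null 2>&1 || { echo "no .UPF files in $pseudo_dir"; return 1; }

    echo "ok"
}
```

For instance, `preflight pw.x file.in ./` could be called at the top of a batch script, which then exits early if the function returns a non-zero status.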
An example PBS batch script for the A2 partition is given below:

```shell
#!/bin/bash
#PBS -l walltime=06:00:00
#PBS -l select=2:mpiprocs=34:ncpus=68:mem=93gb
#PBS -A
#PBS -N jobname

# Start from a clean environment and load the KNL build of QE
module purge
module load profile/knl
module load autoload qe/6.0_knl

# Run from the directory the job was submitted from
cd "${PBS_O_WORKDIR}"

# 4 OpenMP threads per MPI process, also used by MKL
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=${OMP_NUM_THREADS}

mpirun pw.x -npool 4 -input file.in > file.out
```

In the above, the PBS directives ask for 2 KNL nodes (each with 68 cores) in cache/quadrant mode and 93 GB of main memory each. We are running QE in hybrid mode using 34 MPI processes/node, each with 4 OpenMP threads/process, and distributing the k-points in 4 pools; the Intel MKL library will also use 4 OpenMP threads/process.

Note that this script needs to be submitted using the KNL scheduler as follows:

```shell
module load env-knl
qsub myjob
```

Please check the Cineca documentation for information on using the [Marconi KNL partition](https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.1%3A+MARCONI+UserGuide#UG3.1:MARCONIUserGuide-SystemArchitecture).

## 7. References

1. QE-GPU build and download instructions, https://github.com/QEF/qe-gpu-plugin.

Last updated: 7-April-2017