**2017 - Jacob Finkenrath - CaSToRC - The Cyprus Institute (j.finkenrath@cyi.ac.cy)**
**2021 - Jacob Finkenrath - CaSToRC - The Cyprus Institute (j.finkenrath@cyi.ac.cy)**
Part 2 of the QCD kernels of the Unified European Applications Benchmark Suite (UEABS) http://www.prace-ri.eu/ueabs/ is developed in PRACE 4 IP under the task for developing an accelerator benchmark suite and is part of the UEABS kernel since PRACE 5IP under UEABS QCD part 2. Part 2 consists of two kernels, based on QUDA
Part 2 of the QCD kernels of the Unified European Applications Benchmark Suite (UEABS) http://www.prace-ri.eu/ueabs/ was developed under the accelerator benchmark suite task within the 4th implementation phase of PRACE and has been part of the UEABS kernels since PRACE 5IP as UEABS QCD part 2. Part 2 consists of two kernels, based on QUDA
[^]:R. Babich, M. Clark and B. Joo, “Parallelizing the QUDA Library for Multi-GPU Calculations
...
...
@@ -9,7 +9,7 @@ and on the QPhiX library
[^]:B. Joo, D. D. Kalamkar, K. Vaidyanathan, M. Smelyanskiy, K. Pamnany, V. W. Lee, P. Dubey,
. The library QUDA is based on CUDA and optimized for running on NVIDIA GPUs (https://lattice.github.io/quda/). The QPhiX library consists of routines which are optimized to use Intel intrinsic functions of multiple vector lengths including AVX512, including optimized routines for KNC and KNL (http://jeffersonlab.github.io/qphix/). The benchmark kernel consists of the provided Conjugate Gradient benchmark functions of the libraries.
. The library QUDA is based on CUDA and optimized for running on NVIDIA GPUs (https://lattice.github.io/quda/). Currently a HIP version and a generic version are under development, which can be used for AMD GPUs and, once ready, for CPU architectures such as ARM. The generic QUDA kernel might replace the computational kernel of QPhiX in the future. The QPhiX library consists of routines developed for the Intel Xeon Phi architecture and also runs on x86 architectures. QPhiX is optimized to use Intel intrinsic functions of multiple vector lengths, including AVX512 (http://jeffersonlab.github.io/qphix/). In general, the benchmark kernels apply the Conjugate Gradient solver to the Wilson Dirac operator, a 4-dimensional stencil.
## Table of Contents
...
...
@@ -21,45 +21,27 @@ and on the QPhiX library
#### 1.1 Compile
Download Cmake and Quda
Clone `quda` via
General information on how to build QUDA with CMake can be found under:
Cmake can be downloaded from the source with the URL: https://cmake.org/download/
In this guide the version cmake-3.7.0 is used. The build instructions can be found in the main directory in README.rst. Use the configure script `./configure`.
Set `-DQUDA_GPU_ARCH=sm_XX` to the GPU architecture (`sm_60` for Pascal, `sm_35` for Kepler).
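The choice of the architecture flag can be sketched as a small shell helper. This is only an illustration: the function name `quda_gpu_arch` and the variable `QUDA_SRC` are hypothetical, and only the two GPU generations named above are covered. The script prints the resulting `cmake` call instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: map a GPU generation name to the value of
# -DQUDA_GPU_ARCH. Only the generations named in this guide are known.
quda_gpu_arch() {
    case "$1" in
        pascal) echo "sm_60" ;;
        kepler) echo "sm_35" ;;
        *) echo "unknown GPU generation: $1" >&2; return 1 ;;
    esac
}

# QUDA_SRC is an assumed variable pointing at the cloned quda sources.
QUDA_SRC="${QUDA_SRC:-../quda}"

# Print the configure command for a Pascal GPU (not executed here).
echo "cmake -DQUDA_GPU_ARCH=$(quda_gpu_arch pascal) $QUDA_SRC"
```

Run the printed command from an empty build directory; further options can still be adjusted afterwards with `ccmake`.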
If Cmake or the compilation fails library paths and options can be set by the cmake provided function "ccmake".
Use `./PATH2CMAKE/ccmake PATH2BUILD_DIR` to edit and to see the availble options.
If CMake or the compilation fails, library paths and options can be set with `ccmake`, the interactive configuration tool provided with CMake.
Use `./PATH2CMAKE/ccmake PATH2BUILD_DIR` to edit and to see the available options.
CMake generates the Makefiles. Run them using `make`.
Now in the folder /test one can find the needed Quda executable "invert_".
Now in the folder /test one can find the needed QUDA executable "invert_".
#### 1.2 Run
...
...
@@ -216,37 +198,36 @@ GPUs GFLOPS sec
## Xeon(Phi) Kernel
### 2. Compile and Run the Xeon(Phi)-Part
Unpack the provided source tar-file located in `./QCD_Accelerator_Benchmarksuite_Part2/XeonPhi/src` or
clone the actual git-hub branches of the code
packages QMP:
QPhiX currently requires additional third-party libraries, such as the USQCD libraries `qmp`, `qdpxx`, `qio`, `xpath_reader`, `c-lime` and the XML library `libxml2`. The USQCD libraries can be found under
``` shell
git clone https://github.com/usqcd-software/qmp
```shell
https://github.com/usqcd-software
```
and for `libxml2`
```shell
http://xmlsoft.org/
```
and for QPhix
while the repository of QPhiX is hosted under Jefferson Lab.
``` shell
git clone https://github.com/JeffersonLab/qphix
```
Note that for running on Skylake chips it is recommended to utilize
the branch develop of QPhix which needs additional packages
like qdp++ (Status 04/2019).
#### 2.1 Compile
The QPhix library is based on QMP communication functions.
For that QMP has to be setup first.
The QPhiX library is based on QMP communication functions.
For that, QMP has to be set up first. Note that you might have to regenerate the configure script using `autoreconf` from Autotools. QMP can be configured using:
Create the Install folder and link with `$QMP_INSTALL_DIR` to it.
Use the compilerflag `-mmic` for the compilation for KNC's
while use `-xAVX512` for the compilation for KNL's.
Create the install folder and point `$QMP_INSTALL_DIR` to it.
Then use
``` shell
make
make install
...
...
@@ -254,47 +235,136 @@ make install
to compile and set up the necessary source files in `$QMP_INSTALL_DIR`.
The QPhix executable can be compiled by using:
for KNC's
For the current master branch of `QPhiX` it is required to provide the package `qdp++`, which has sub-dependencies given by `qio`, `xpath_reader`, `c-lime` and `libxml2`. QDP++ can be configured using (here for a Skylake chip)
by using the previous variable `QMP_INSTALL_DIR` which links to the install-folder
of QMP. The executable `time_clov_noqdp` can be found now in the subfolder `./qphix/test`.
and reconfigure. Note that on systems like JSC's JUWELS, `libxml2` has to be compiled additionally, and its path can be added to the configuration of `qdpxx` via
```shell
--with-libxml2=$LIBXML2_INSTALL_DIR
```
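As a minimal sketch, the optional `libxml2` path can be appended only when a separately compiled `libxml2` is actually used. The function name `qdpxx_extra_opts` is hypothetical; `LIBXML2_INSTALL_DIR` is the variable from the line above. When the variable is unset, nothing is added and `configure` is left to find a system-wide `libxml2`.

```shell
#!/bin/sh
# Hypothetical helper: print the extra configure option for qdpxx.
# Emits --with-libxml2=... only if LIBXML2_INSTALL_DIR is set and non-empty.
qdpxx_extra_opts() {
    if [ -n "${LIBXML2_INSTALL_DIR:-}" ]; then
        printf '%s\n' "--with-libxml2=${LIBXML2_INSTALL_DIR}"
    fi
}
```

The output can be appended to the existing `configure` call of `qdpxx`, for example by adding `$(qdpxx_extra_opts)` after the other options.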
Now QPhiX benchmark kernels can be compiled via `cmake`. Create a build folder and
Note for the develop branch the package qdp++ has to be compiled.
QDP++ can be configure using (here for skylake chip)
The executable `time_clov_noqdp` can now be found in the sub-folder `tests`. Note that in the current version the compilation of other test kernels can fail. If that is the case, compile the needed executable directly via
```
cd tests
make time_clov_noqdp
```
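The fallback described above can be sketched as a single step: try the full build first and, only if it fails, build just the benchmark executable. The function name `build_qphix_tests` and the overridable `MAKE` variable are assumptions made for this illustration; `make -C tests` is equivalent to changing into `tests` before calling `make`.

```shell
#!/bin/sh
# Hypothetical wrapper, to be called from the QPhiX build folder.
# MAKE may be overridden; it defaults to plain make.
build_qphix_tests() {
    # Try the full build; on failure build only the needed executable.
    ${MAKE:-make} || ${MAKE:-make} -C tests time_clov_noqdp
}
```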
QPhiX was developed to exploit the computational potential of Intel's Xeon Phi architecture, which has been discontinued. Earlier versions of QPhiX for the Intel Xeon Phi architecture can be compiled without `qdpxx` using the configure-based build, namely for KNCs
by using the previous variable `QMP_INSTALL_DIR` which links to the install-folder
of QMP. The executable `time_clov_noqdp` can be found now in the subfolder `./qphix/test`.
##### 2.1.1 Example compilation on PRACE machines
In this subsection we provide some example compilations on PRACE machines
which were used to develop the QCD Benchmarksuite 2.
###### 2.1.1.1 BSC - Marenostrum III Hybrid partitions
###### 2.1.1.1 JSC - JUWELS
JUWELS (Cluster Module) at Juelich Supercomputing Center is equipped with Intel Skylake chips, namely 2× Intel Xeon Platinum 8168 CPU, 2× 24 cores, 2.7 GHz per compute node. The compilation was done using the following software setup (status 07/21)
```shell
ml Intel/2020.2.254-GCC-9.3.0 IntelMPI/2019.8.254 imkl/2020.4.304 Autotools CMake