Commit 1a1f4085 authored by Jacob Finkenrath's avatar Jacob Finkenrath
Browse files

Update QPhiX compile instructions

parent 79c0b750
# README - QCD UEABS Part 2
**2017 - Jacob Finkenrath - CaSToRC - The Cyprus Institute (j.finkenrath@cyi.ac.cy)**
**2021 - Jacob Finkenrath - CaSToRC - The Cyprus Institute (j.finkenrath@cyi.ac.cy)**
Part 2 of the QCD kernels of the Unified European Applications Benchmark Suite (UEABS) http://www.prace-ri.eu/ueabs/ is developed in PRACE 4 IP under the task for developing an accelerator benchmark suite and is part of the UEABS kernel since PRACE 5IP under UEABS QCD part 2. Part 2 consists of two kernels, based on QUDA
Part 2 of the QCD kernels of the Unified European Applications Benchmark Suite (UEABS) http://www.prace-ri.eu/ueabs/ was developed under the accelerator benchmark suite task within the 4th implementation phase of PRACE and is part of the UEABS kernel since PRACE 5IP under UEABS QCD part 2. Part 2 consists of two kernels, based on QUDA
[^]: R. Babbich, M. Clark and B. Joo, “Parallelizing the QUDA Library for Multi-GPU Calculations
......@@ -9,7 +9,7 @@ and on the QPhiX library
[^]: B. Joo, D. D. Kalamkar, K. Vaidyanathan, M. Smelyanskiy, K. Pamnany, V. W. Lee, P. Dubey,
. The library QUDA is based on CUDA and optimize for running on NVIDIA GPUs (https://lattice.github.io/quda/). The QPhiX library consists of routines which are optimize to use Intel intrinsic functions of multiple vector length including AVX512, including optimized routines for KNC and KNL (http://jeffersonlab.github.io/qphix/). The benchmark kernel consists of the provided Conjugated Gradient benchmark functions of the libraries.
. The library QUDA is based on CUDA and optimize for running on NVIDIA GPUs (https://lattice.github.io/quda/). Currently a HIP and a generic version is under development, which can be used for AMD GPUs and if ready for CPU architectures, such as ARM. The generic QUDA kernel might replace the computational kernel of QPhiX in the future. The QPhiX library consists of routines developed for Intel Xeon Phi architecture and can perform on x86 architecture. QPhiX are optimize to use Intel intrinsic functions of multiple vector length including AVX512 (http://jeffersonlab.github.io/qphix/). In general the benchmark kernels are applying the Conjugated Gradient solver to the Wilson Dirac operator, a 4-dimension stencil.
## Table of Contents
......@@ -21,45 +21,27 @@ and on the QPhiX library
#### 1.1 Compile
Download Cmake and Quda
Clone `quda` via
General information how to build QUDA with cmake can be found under:
"https://github.com/lattice/quda/wiki/Building-QUDA-with-cmake"
Here we just give a short overview:
Build Cmake: (./QCD_Accelerator_Benchmarksuite_Part2/GPUs/src/cmake-3.7.0.tar.gz)
Cmake can be downloaded from the source with the URL: https://cmake.org/download/
In this guide the version cmake-3.7.0 is used. The build instruction can be found in the main directory under README.rst . Use the configure file `./configure` .
Then run
gmake`.
Build Quda: (./QCD_Accelerator_Benchmarksuite_Part2/GPUs/src/quda.tar.gz)
```shell
git clone https://github.com/lattice/quda.git
```
Download quda for example by using `git clone https://github.com/lattice/quda.git`.
Create a build-folder. Execute the executable `cmake` in the build-folder which
is located in the cmake/bin.
Execute:
and build `quda` via
``` shell
./$PATH2CMAKE/cmake $PATH2QUDA -DQUDA_GPU_ARCH=sm_XX -DQUDA_DIRAC_WILSON=ON -DQUDA_DIRAC_TWISTED_MASS=OFF
-DQUDA_DIRACR_DOMAIN_WALL=OFF -DQUDA_HISQ_LINK=OFF -DQUDA_GAUGE_FORCE=OFF -DQUDA_HISQ_FORCE=OFF -DQUDA_MPI=ON
```
cd quda
mkdir build
cd build
with
``` shell
PATH2CMAKE= path to the cmake-executable
PAT2QUDA= path to the home dir of QUDA
```
Set `-DQUDA_GPU_ARCH=sm_XX` to the GPU Architecture (`sm_60` for Pascals, `sm_35` for Keplers)
If Cmake or the compilation fails library paths and options can be set by the cmake provided function "ccmake".
Use `./PATH2CMAKE/ccmake PATH2BUILD_DIR` to edit and to see the availble options.
If CMake or the compilation fails, library paths and options can be set by the cmake provided function "ccmake".
Use `./PATH2CMAKE/ccmake PATH2BUILD_DIR` to edit and to see the available options.
Cmake generates the Makefiles. Run them by use `make`.
Now in the folder /test one can find the needed Quda executable "invert_".
Now in the folder /test one can find the needed ` executable "invert_".
#### 1.2 Run
......@@ -216,37 +198,36 @@ GPUs GFLOPS sec
## Xeon(Phi) Kernel
### 2. Compile and Run the Xeon(Phi)-Part
Unpack the provided source tar-file located in `./QCD_Accelerator_Benchmarksuite_Part2/XeonPhi/src` or
clone the actual git-hub branches of the code
packages QMP:
QPhiX currently requires additional third party libraries, like the USQCD-libraries `qmp`,`qdpxx`,`qio`,`xpath_reader`,`c-lime` and the xml library `libxml2`. The USQCD-libs can be found under
``` shell
git clone https://github.com/usqcd-software/qmp
```shell
https://github.com/usqcd-software
```
and for `libxml2`
```shell
http://xmlsoft.org/
```
and for QPhix
while the repository of QPhiX is hosted under Jefferson Lab.
``` shell
git clone https://github.com/JeffersonLab/qphix
```
Note that for running on Skylake chips it is recommended to utilize
the branch develop of QPhix which needs additional packages
like qdp++ (Status 04/2019).
#### 2.1 Compile
The QPhix library is based on QMP communication functions.
For that QMP has to be setup first.
The QPhiX library is based on QMP communication functions.
For that QMP has to be setup first. Note that you might have to reconfigure the configure-file using `autoreconf` from `Autotool`. QMP can be configured using:
``` shell
./configure --prefix=$QMP_INSTALL_DIR CC=mpiicc CFLAGS=" -mmic/-xAVX512 -std=c99" --with-qmp-comms-type=MPI --host=x86_64-linux-gnu --build=none-none-none
./configure --prefix=$QMP_INSTALL_DIR CC=mpiicc CFLAGS=" -xHOST -std=c99" --with-qmp-comms-type=MPI --host=x86_64-linux-gnu --build=none-none-none
```
Create the Install folder and link with `$QMP_INSTALL_DIR` to it.
Use the compilerflag `-mmic` for the compilation for KNC's
while use `-xAVX512` for the compilation for KNL's.
Create the install folder and link with `$QMP_INSTALL_DIR` to it.
Then use
``` shell
make
make install
......@@ -254,47 +235,136 @@ make install
to compile and setup the necessary source files in `$QMP_INSTALL_DIR`.
The QPhix executable can be compiled by using:
for KNC's
For the current master branch of `QPhiX` it is required to provide the package `qdp++`, which has sub-dependencies given by ``qio`,`xpath_reader`,`c-lime` and `libxml2`. QDP++ can be configure using (here for Skylake chip)
``` shell
./configure --enable-parallel-arch=parscalar --enable-proc=MIC --enable-soalen=8 --enable-clover --enable-openmp --enable-cean --enable-mm-malloc CXXFLAGS="-openmp -mmic -vec-report -restrict -mGLOB_default_function_attrs=\"use_gather_scatter_hint=off\" -g -O2 -finline-functions -fno-alias -std=c++0x" CFLAGS="-mmic -vec-report -restrict -mGLOB_default_function_attrs=\"use_gather_scatter_hint=off\" -openmp -g -O2 -fno-alias -std=c9l9" CXX=mpiicpc CC=mpiicc --host=x86_64-linux-gnu --build=none-none-none --with-qmp=$QMP_INSTALL_DIR
./configure --with-qmp=$QMP_INSTALL_DIR --enable-parallel-arch=parscalar CC=mpiicc CFLAGS="-xCORE-AVX512 -mtune=skylake-avx512 -std=c99" CXX=mpiicpc CXXFLAGS="-axCORE-AVX512 -mtune=skylake-avx512 -std=c++14 -qopenmp" --enable-openmp --host=x86_64-linux-gnu --build=none-none-none --prefix=$QDPXX_INSTALL_DIR l --disable-filedb
```
or for KNL's
The configure is searching for additional USQCD libraries in the subfolder `./other_libs`. Clone the required library files, like
``` shell
./configure --enable-parallel-arch=parscalar --enable-proc=AVX512 --enable-soalen=8 --enable-clover --enable-openmp --enable-cean --enable-mm-malloc CXXFLAGS="-qopenmp -xMIC-AVX512 -g -O3 -std=c++14" CFLAGS="-xMIC-AVX512 -qopenmp -O3 -std=c99" CXX=mpiicpc CC=mpiicc --host=x86_64-linux-gnu --build=none-none-none --with-qmp=$QMP_INSTALL_DIR
```shell
git clone https://github.com/usqcd-software/qio.git
```
by using the previous variable `QMP_INSTALL_DIR` which links to the install-folder
of QMP. The executable `time_clov_noqdp` can be found now in the subfolder `./qphix/test`.
and reconfigure. Note that on system like JSC's JUWELS `libxml2` has to be additional compiled and the path can be added to the configuration of `qdpxx` via
```shell
--with-libxml2=$LIBXML2_INSTALL_DIR
```
Now QPhiX benchmark kernels can be compiled via `cmake`. Create a build folder and
Note for the develop branch the package qdp++ has to be compiled.
QDP++ can be configure using (here for skylake chip)
``` shell
./configure --with-qmp=$QMP_INSTALL_DIR --enable-parallel-arch=parscalar CC=mpiicc CFLAGS="-xCORE-AVX512 -mtune=skylake-avx512 -std=c99" CXX=mpiicpc CXXFLAGS="-axCORE-AVX512 -mtune=skylake-avx512 -std=c++14 -qopenmp" --enable-openmp --host=x86_64-linux-gnu --build=none-none-none --prefix=$QDPXX_INSTALL_DIR
mkdir build
cd build
cmake -DQDPXX_DIR=$QDP_INSTALL_DIR -DQMP_DIR=$QMP_INSTALL_DIR -Disa=avx512 -Dparallel_arch=parscalar -Dhost_cxx=mpiicpc -Dhost_cxxflags="-std=c++17 -O3 -axCORE-AVX512 -mtune=skylake-avx512" -Dtm_clover=ON -Dtwisted_mass=ON -Dtesting=ON -DCMAKE_CXX_COMPILER=mpiicpc -DCMAKE_CXX_FLAGS="-std=c++17 -O3 -axCORE-AVX512 -mtune=skylake-avx512" -DCMAKE_C_COMPILER=mpiicc -DCMAKE_C_FLAGS="-std=c99 -O3 -axCORE-AVX512 -mtune=skylake-avx512" ..
```
Now QPhix executable can be compiled by using:
The executable `time_clov_noqdp` can be found now in the sub-folder `tests`. Note that in the current version compilation of other test kernels can fail. If that is the case directly compile the needed executable, via
```
cd tests
make time_clov_noqdp
```
The QPhiX is developed to utilize the computational potential of Intels Xeon Phi architecture, which is discontinued. Earlier versions of QPhiX for the Intel Xeon Phi architecture can be compiled without `qdpxx` using configure build. Namely for KNC's
``` shell
cmake -DQDPXX_DIR=$QDP_INSTALL_DIR -DQMP_DIR=$QMP_INSTALL_DIR -Disa=avx512 -Dparallel_arch=parscalar -Dhost_cxx=mpiicpc -Dhost_cxxflags="-std=c++17 -O3 -axCORE-AVX512 -mtune=skylake-avx512" -Dtm_clover=ON -Dtwisted_mass=ON -Dtesting=ON -DCMAKE_CXX_COMPILER=mpiicpc -DCMAKE_CXX_FLAGS="-std=c++17 -O3 -axCORE-AVX512 -mtune=skylake-avx512" -DCMAKE_C_COMPILER=mpiicc -DCMAKE_C_FLAGS="-std=c99 -O3 -axCORE-AVX512 -mtune=skylake-avx512" ..
./configure --enable-parallel-arch=parscalar --enable-proc=MIC --enable-soalen=8 --enable-clover --enable-openmp --enable-cean --enable-mm-malloc CXXFLAGS="-openmp -mmic -vec-report -restrict -mGLOB_default_function_attrs=\"use_gather_scatter_hint=off\" -g -O2 -finline-functions -fno-alias -std=c++0x" CFLAGS="-mmic -vec-report -restrict -mGLOB_default_function_attrs=\"use_gather_scatter_hint=off\" -openmp -g -O2 -fno-alias -std=c9l9" CXX=mpiicpc CC=mpiicc --host=x86_64-linux-gnu --build=none-none-none --with-qmp=$QMP_INSTALL_DIR
```
The executable `time_clov_noqdp` can be found now in the subfolder `./qphix/test`.
or for KNL's
``` shell
./configure --enable-parallel-arch=parscalar --enable-proc=AVX512 --enable-soalen=8 --enable-clover --enable-openmp --enable-cean --enable-mm-malloc CXXFLAGS="-qopenmp -xMIC-AVX512 -g -O3 -std=c++14" CFLAGS="-xMIC-AVX512 -qopenmp -O3 -std=c99" CXX=mpiicpc CC=mpiicc --host=x86_64-linux-gnu --build=none-none-none --with-qmp=$QMP_INSTALL_DIR
```
by using the previous variable `QMP_INSTALL_DIR` which links to the install-folder
of QMP. The executable `time_clov_noqdp` can be found now in the subfolder `./qphix/test`.
##### 2.1.1 Example compilation on PRACE machines
In the subsection we provide some example compilation on PRACE machines
which where used to develop the QCD Benchmarksuite 2.
###### 2.1.1.1 BSC - Marenostrum III Hybrid partitions
###### 2.1.1.1 JSC - JUWELS
JUWELS (Cluster Module) at Juelich Supercomputing Center is equipped with Intel Skylake chips, namely 2× Intel Xeon Platinum 8168 CPU, 2× 24 cores, 2.7 GHz per compute node. The compilation was done using the following software setup (status 07/21)
```shell
ml Intel/2020.2.254-GCC-9.3.0 IntelMPI/2019.8.254 imkl/2020.4.304 Autotools CMake
```
(`ml` is a short cut for `module load`).
`qmp` was build via
```
git clone https://github.com/usqcd-software/qmp.git
cd qmp
autoreconf
./configure --prefix=${PWD}/install CC=mpiicc CFLAGS="-xHOST -std=c99 -O3" --with-qmp-comms-type=MPI --host=x86_64-linux-gnu
make
make install
```
`qdpxx` requires `libxml2` which was build via
```shell
git clone https://gitlab.gnome.org/GNOME/libxml2.git
cd libxml2
./autogen.sh
./configure --prefix=${PWD}/install
make
make install
```
For `qdpxx` the additional required libs were added to the sub-folder `other-libs`, by
```shell
git clone https://github.com/usqcd-software/qdpxx.git
cd qdpxx/other_libs
git clone https://github.com/usqcd-software/xpath_reader.git
cd xpath_reader/
autoreconf
cd ..
git clone https://github.com/usqcd-software/qio.git
cd qio
autoreconf
cd /other_libs/
git clone https://github.com/usqcd-software/c-lime.git
# note that the path of c-lime is ./qdpxx/other_libs/qio/other_libs/c-lime
cd c-lime/
autoreconf
```
Now, `qpdxx` was compiled via
The Hybrid partition on Marenostrum are equiped with KNC's.
```shell
./configure --with-libxml2=${PWD}/../libxml2/install --with-qmp=/p/project/cecy00/ecy00a/src_bench/qdpxx/../qmp/install --enable-openmp --enable-parallel-arch=parscalar CC=mpiicc CFLAGS="-xHOST -std=c99 -qopenmp" CXX=mpiicpc CXXFLAGS="-xHOST -std=c++11 -qopenmp" --prefix=/p/project/cecy00/ecy00a/src_bench/qdpxx/install --with-libxml2=${PWD}/../libxml2/install --disable-filedb
make
make install
```
Note `configure` might require `xml2-config` for which the path can be set via `export PATH+=:$PATH_2_LIBXML2`
Finally the computational kernels of `QPhiX` can be build via
```shell
git clone https://github.com/JeffersonLab/qphix.git
cd qphix/
mkdir build
cd build
cmake -DQDPXX_DIR=${PWD}/../../qdpxx/install -DQMP_DIR=${PWD}/../../qmp/install -Disa=avx512 -Dparallel_arch=parscalar -Dhost_cxx=icpc -Dhost_cxxflags="-std=c++17 -O3 -xHOST" -Dtm_clover=ON -Dtwisted_mass=ON -Dtesting=ON -DCMAKE_CXX_COMPILER=mpiicpc -DCMAKE_CXX_FLAGS="-std=c++17 -O3 -xHOST" -DCMAKE_C_COMPILER=mpiicc -DCMAKE_C_FLAGS="-std=c99 -O3 -xHOST" -DCMAKE_INSTALL_PREFIX=${PWD}/../install ..
cd tests
make time_clov_noqdp
```
###### 2.1.1.2 BSC - Marenostrum III Hybrid partitions
The Hybrid partition on Marenostrum are equiped with KNC's (status 2016).
First following modules were loaded
``` shell
......@@ -326,9 +396,9 @@ Now the package QPhix is compilled with
make
```
###### 2.1.1.2 CINES - Frioul
###### 2.1.1.3 CINES - Frioul
On a test cluster at the CINES-side the Benchmarksuite was tested on KNL's.
On a test cluster at the CINES-side the Benchmarksuite was tested on KNL's (status 2018).
The steps are similar to BSC. First the libraries paths are set with
``` shell
......@@ -465,4 +535,4 @@ KNCs GFLOPS
16 368.196
32 605.882
64 847.566
```
```
\ No newline at end of file
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment