Commit b1dd1f00 authored by Jacob Finkenrath

Renaming and stylistic changes to the readme

parent 50dd3415
......@@ -238,10 +238,10 @@ Accelerator-based implementations have been implemented for EXDIG, using off-loa
| **General information** | **Scientific field** | **Language** | **MPI** | **OpenMP** | **GPU** | **LoC** | **Code description** |
|------------------|----------------------|--------------|---------|------------|---------------------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------|
-| <br>[- Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/qcd/part_1) <br>[- Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/qcd/part_1/README) | lattice QuantumChromodynamics - Part 1 | C | yes | yes | yes (CUDA) | -- | Accelerator enabled kernel E of UEABS QCD CPU part using targetDP model. Test case A - 8x64x64x64. Conjugate Gradient solver involving Wilson Dirac stencil. Domain Decomposition, Memory bandwidth, strong scaling, MPI latency. |
-| <br>[- Source](https://lattice.github.io/quda/) <br>[- Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/qcd/part_2) <br>[- Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/qcd/part_2/README) | lattice QuantumChromodynamics - Part 2 - QUDA | C++ | yes | yes | yes (CUDA) | -- | Part 2: GPU is using a QUDA kernel for running on NVIDIA GPUs. [Test case A - 96x32x32x32] Small problem size. CG solver. Domain Decomposition, Memory bandwidth, strong scaling, MPI latency. [Test case B - 126x64x64x64] Moderate problem size. CG solver on Wilson Dirac stencil. Bandwidth bounded |
-| <br>[- Source](http://jeffersonlab.github.io/qphix/) <br>[- Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/qcd/part_2) <br>[- Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/qcd/part_2/README) | lattice QuantumChromodynamics - Part 2 - QPHIX | C++ | yes | yes | no | -- | Part 2: Xeon is using a QPhiX kernel which is optimize to run on x86, in particular Intel Xeon (Phi). [Test case A - 96x32x32x32] Small problem size. CG solver involving Wilson Dirac stencil. Domain Decomposition, Memory bandwidth, strong scaling, MPI latency. [Test case B - 126x64x64x64] Moderate problem size. CG solver on Wilson Dirac stencil. Bandwidth bounded |
-| <br>[- Source](https://repository.prace-ri.eu/ueabs/QCD/1.3/QCD_Source_TestCaseA.tar.gz) <br>[- Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/qcd/part_cpu) <br>[- Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/qcd/part_cpu/README) | lattice QuantumChromodynamics - CPU Part - legacy UEABS | C/Fortran | yes | yes/no | No | -- | CPU part based on UEABS QCD CPU part (legacy) benchmark kernels (last update 2017). Based on 5 different Benchmark applications representative for the European Lattice QCD community (see doc for more details). |
+| <br>[- Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/qcd/part_1) <br>[- Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/qcd/part_1/README) | lattice Quantum Chromodynamics Part 1 | C | yes | yes | yes (CUDA) | -- | Accelerator enabled kernel E of UEABS QCD CPU part using targetDP model. Test case A - 8x64x64x64. Conjugate Gradient solver involving Wilson Dirac stencil. Domain Decomposition, Memory bandwidth, strong scaling, MPI latency. |
+| <br>[- Source](https://lattice.github.io/quda/) <br>[- Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/qcd/part_2) <br>[- Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/qcd/part_2/README) | lattice Quantum Chromodynamics Part 2 - QUDA | C++ | yes | yes | yes (CUDA) | -- | Part 2: GPU is using a QUDA kernel for running on NVIDIA GPUs. [Test case A - 96x32x32x32] Small problem size. CG solver. Domain Decomposition, Memory bandwidth, strong scaling, MPI latency. [Test case B - 126x64x64x64] Moderate problem size. CG solver on Wilson Dirac stencil. Bandwidth bounded |
+| <br>[- Source](http://jeffersonlab.github.io/qphix/) <br>[- Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/qcd/part_2) <br>[- Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/qcd/part_2/README) | lattice Quantum Chromodynamics Part 2 - QPHIX | C++ | yes | yes | no | -- | Part 2: Xeon(Phi) is using a QPhiX kernel which is optimize to run on x86, in particular Intel Xeon (Phi). [Test case A - 96x32x32x32] Small problem size. CG solver involving Wilson Dirac stencil. Domain Decomposition, Memory bandwidth, strong scaling, MPI latency. [Test case B - 126x64x64x64] Moderate problem size. CG solver on Wilson Dirac stencil. Bandwidth bounded |
+| <br>[- Source](https://repository.prace-ri.eu/ueabs/QCD/1.3/QCD_Source_TestCaseA.tar.gz) <br>[- Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/qcd/part_cpu) <br>[- Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/qcd/part_cpu/README) | lattice Quantum Chromodynamics - CPU Part - legacy UEABS | C/Fortran | yes | yes/no | No | -- | CPU part based on UEABS QCD CPU part (legacy) benchmark kernels (last update 2017). Based on 5 different Benchmark applications representative for the European Lattice QCD community (see doc for more details). |
# Quantum Espresso <a name="espresso"></a>
......
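The table rows above tag the QCD kernels with domain decomposition, strong scaling and MPI latency. These properties are linked: when a fixed lattice is split over more ranks, each rank's sub-lattice shrinks while the fraction of boundary (halo) sites it must exchange with its neighbours grows. The sketch below quantifies this for the Test case A lattice (96x32x32x32). It assumes an even split over a 4D process grid and a stencil that couples nearest neighbours only, as the Wilson Dirac operator does; the function names and process grids are illustrative and not taken from the benchmark code.

```python
from math import prod


def local_block(global_dims, proc_grid):
    """Split a 4D lattice evenly over a process grid, one block per rank."""
    for g, p in zip(global_dims, proc_grid):
        if g % p:
            raise ValueError(f"extent {g} is not divisible by {p} ranks")
    return tuple(g // p for g, p in zip(global_dims, proc_grid))


def halo_fraction(global_dims, proc_grid):
    """Return (local dims, local volume, halo sites / local sites).

    Every partitioned direction mu adds two faces of V_local / l_mu sites
    (halo depth 1, i.e. a nearest-neighbour stencil). Directions that are
    not split wrap around locally and need no exchange."""
    local = local_block(global_dims, proc_grid)
    volume = prod(local)
    halo = sum(2 * volume // l for l, p in zip(local, proc_grid) if p > 1)
    return local, volume, halo / volume


# Test case A lattice from the table; the process grids are hypothetical.
for grid in [(2, 1, 1, 1), (4, 2, 2, 2)]:
    local, volume, frac = halo_fraction((96, 32, 32, 32), grid)
    print(grid, local, volume, f"halo/volume = {frac:.3f}")
```

With a 4x2x2x2 grid each rank holds a 24x16x16x16 block and the halo amounts to about 46% of the local volume, compared with about 4% for a 2x1x1x1 split, which is why communication starts to dominate in the strong-scaling regime.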
......@@ -8,7 +8,7 @@ The QCD Accelerator Benchmark suite Part 1 is a direct port of "QCD kernel E" fr
## Part 2:
-The QCD Accelerator Benchmark suite Part 2 consists of two kernels, the QUDA and the QPhiX library. The library QUDA is based on CUDA and optimize for running on NVIDIA GPUs (https://lattice.github.io/quda/). The QPhiX library consists of routines which are optimize to use INTEL intrinsic functions of multiple vector length, including optimized routines for KNC and KNL (http://jeffersonlab.github.io/qphix/). The benchmark kernels are using the provided Conjugated Gradient benchmark functions of the libraries.
+The QCD Accelerator Benchmark suite Part 2 consists of two kernels, the QUDA and the QPhiX library. The library QUDA is based on CUDA and optimize for running on NVIDIA GPUs (https://lattice.github.io/quda/). The QPhiX library consists of routines which are optimize to use Intel intrinsic functions of multiple vector length, including optimized routines for KNC and KNL (http://jeffersonlab.github.io/qphix/). The benchmark kernels are using the provided Conjugated Gradient benchmark functions of the libraries.
## Part CPU:
......
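Part 2 relies on the Conjugate Gradient routines shipped with QUDA and QPhiX. As a library-independent illustration of what such a solver does, here is a textbook CG iteration in plain Python. It is a sketch under simplifying assumptions: a small dense symmetric positive-definite matrix stands in for the lattice operator, and the function names are hypothetical rather than QUDA or QPhiX API. Since the Wilson Dirac operator D is not Hermitian, lattice codes typically apply CG to the normal equations with D†D, which is Hermitian positive definite.

```python
import math


def matvec(a, x):
    """Dense matrix-vector product y = A x (stand-in for the Dirac stencil)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def dot(u, v):
    """Real inner product; on a lattice this becomes a global MPI reduction."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def conjugate_gradient(a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, starting from x = 0.

    Stops when ||r|| <= tol * ||b||. Returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)  # residual r = b - A x, with x = 0
    p = list(r)  # first search direction
    rr = dot(r, r)
    b_norm = math.sqrt(dot(b, b))
    for k in range(max_iter):
        if math.sqrt(rr) <= tol * b_norm:
            return x, k
        ap = matvec(a, p)  # the single operator application per iteration
        alpha = rr / dot(p, ap)
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * ap_i for r_i, ap_i in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [r_i + beta * p_i for r_i, p_i in zip(r, p)]
        rr = rr_new
    return x, max_iter


x, iters = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(x, iters)
```

Each iteration costs one operator application plus a few vector updates and inner products. On a distributed lattice the operator application is the halo-exchanging stencil and each inner product is a global reduction, which is where memory bandwidth and MPI latency enter the benchmark.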
-PRACE QCD Accelerator Benchmark 1
+Part 1: UEABS QCD
=================================
This benchmark is part of the QCD section of the Accelerator Benchmarks Suite developed as part of a PRACE EU funded project
-(http://www.prace-ri.eu).
-The suite is derived from the Unified European Applications Benchmark Suite (UEABS) http://www.prace-ri.eu/ueabs/
+(http://www.prace-ri.eu) and since PRACE 5IP integrated in the Unified European Applications Benchmark Suite (UEABS) http://www.prace-ri.eu/ueabs/ and named UEABS QCD part 1.
This specific component is a direct port of "QCD kernel E" from the UEABS, which is based on the MILC code suite (http://www.physics.utah.edu/~detar/milc/). The performance-portable targetDP model has been used to allow the benchmark to utilise NVIDIA GPUs, Intel Xeon Phi manycore CPUs and traditional multi-core CPUs. The use of MPI (in conjunction with targetDP) allows multiple nodes to be used in parallel.
......
# README - QCD UEABS Part 2
**2017 - Jacob Finkenrath - CaSToRC - The Cyprus Institute (j.finkenrath@cyi.ac.cy)**
-The QCD Accelerator Benchmark suite Part 2 consists of two kernels, the QUDA
+Part 2 of the QCD kernels of the Unified European Applications Benchmark Suite (UEABS) http://www.prace-ri.eu/ueabs/ is developed in PRACE 4 IP under the task for developing an accelerator benchmark suite and is part of the UEABS kernel since PRACE 5IP under UEABS QCD part 2. Part 2 consists of two kernels, based on QUDA
[^]: R. Babbich, M. Clark and B. Joo, “Parallelizing the QUDA Library for Multi-GPU Calculations
-and the QPhiX library
+and on the QPhiX library
[^]: B. Joo, D. D. Kalamkar, K. Vaidyanathan, M. Smelyanskiy, K. Pamnany, V. W. Lee, P. Dubey,
......@@ -213,8 +213,8 @@ GPUs GFLOPS sec
64 2645.590000 1.480000
```
-## x86 Kernel
-### 2. Compile and Run the x86-Part
+## Xeon(Phi) Kernel
+### 2. Compile and Run the Xeon(Phi)-Part
Unpack the provided source tar-file located in `./QCD_Accelerator_Benchmarksuite_Part2/XeonPhi/src` or
clone the actual git-hub branches of the code
......
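The GPU results excerpt above lists GPU count, GFLOPS and time. One common way to read such data is the parallel efficiency relative to the smallest run shown: the measured speedup divided by the increase in GPU count. The sketch assumes these rows come from a fixed-size (strong-scaling) run and uses GFLOPS as the throughput measure; only the two rows visible in this diff are included, and the function name is illustrative.

```python
def strong_scaling_efficiency(n_base, perf_base, n, perf):
    """Parallel efficiency of a run with n devices relative to a baseline.

    perf is a throughput (here GFLOPS) for the same problem size;
    1.0 means perfect strong scaling."""
    speedup = perf / perf_base
    return speedup / (n / n_base)


# (GPUs, GFLOPS) pairs copied from the two result rows shown in the diff.
rows = [(32, 1842.56), (64, 2645.59)]
(n0, g0), (n1, g1) = rows
eff = strong_scaling_efficiency(n0, g0, n1, g1)
print(f"{n0} -> {n1} GPUs: efficiency {eff:.2f}")
```

For these two rows the efficiency is roughly 0.72: doubling the GPU count from 32 to 64 raises the sustained GFLOPS by about 44%.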
......@@ -144,7 +144,7 @@ GPUs GFLOPS sec
32 1842.560000 1.550000
64 2645.590000 1.480000
-### XEONPHI - BENCHMARK SUITE
+### Xeon(Phi) - BENCHMARK SUITE
The benchmark results for the XeonPhi benchmark suite are performed on Frioul at CINES, and the hybrid partition on MareNostrum III at BSC. Frioul has one KNL-card per node while the hybrid partition of MareNostrum III is equipped with two KNCs per node. The data on Frioul are generated by using the bash-scripts provided by the second implementation of QCD and are done for the two test cases "strong-scaling" with a lattice size of 32x32x32x96 and 64x64x64x128. In case of the data generated at MareNostrum, data for the "strong-scaling" mode on a 32x32x32x96 lattice are shown. The benchmark kernel uses a random gauge configuration and the conjugated gradient solver to solve a linear equation involving the clover Wilson Dirac operator.
MareNostrum_KNC.png
......