# Unified European Applications Benchmark Suite

The Unified European Application Benchmark Suite (UEABS) is a set of currently 13 application codes taken from the pre-existing PRACE and DEISA application benchmark suites, and extended with the PRACE Accelerator Benchmark Suite. The objective is to provide a single benchmark suite of scalable, currently relevant and publicly available application codes and datasets, of a size which can realistically be run on large systems, and maintained into the future.

The UEABS activity was started during the PRACE-PP project and was publicly released by the PRACE-2IP project. The PRACE "Accelerator Benchmark Suite" was a PRACE-4IP activity. The UEABS has been and will be actively updated and maintained by the subsequent PRACE-IP projects.

Each application code has either one or two input datasets. If there are two datasets, Test Case A is designed to run on Tier-1 sized systems (up to around 1,000 x86 cores, or equivalent) and Test Case B is designed to run on Tier-0 sized systems (up to around 10,000 x86 cores, or equivalent). If there is only one dataset (Test Case A), it is suitable for both sizes of system.

Contacts: Walter Lioen _(OK to mention all BCOs here? Ask PMO for a UEABS contact mailing list address?)_

Current Release
---------------

The current release is Version 2.2 (December 31, 2021).

See also the [release notes and history](RELEASES.md).

Running the suite
-----------------

Instructions to run each test case of each code can be found in the subdirectories of this repository.

For more details of the codes and datasets, and sample results, please see the PRACE-6IP benchmarking deliverable D7.5 "Evaluation of Benchmark Performance" (November 30, 2021) at http://www.prace-ri.eu/public-deliverables/.

The application codes that constitute the UEABS are:
----------------------------------------------------
| Application | Lines of Code | MPI | OpenMP/Pthreads | GPU | Fortran | Python | C | C++ | Code Description/Notes |
|---|---|---|---|---|---|---|---|---|---|
| Alya | 600,000 | X | X | X | X | | | | The Alya System is a Computational Mechanics code capable of solving different physics, each one with its own modelization characteristics, in a coupled way. Among the problems it solves are: convection-diffusion reactions, incompressible flows, compressible flows, turbulence, bi-phasic flows and free surface, excitable media, acoustics, thermal flow, quantum mechanics (DFT) and solid mechanics (large strain). |
| Code_Saturne | ~350,000 | X | X | X | X | X | X | X | The code solves the Navier-Stokes equations for incompressible/compressible flows using a predictor-corrector technique. The Poisson pressure equation is solved by a Conjugate Gradient preconditioned by a multi-grid algorithm, and the transport equations by Conjugate Gradient-like methods. Advanced gradient reconstruction is also available to account for distorted meshes. |
| CP2K | ~1,150,000 | X | X | X | X | | | | CP2K is a freely available quantum chemistry and solid-state physics software package for performing atomistic simulations. It can be run with MPI, OpenMP and CUDA. All of CP2K is MPI-parallelised, with some routines making use of OpenMP, which can be used to reduce the memory footprint. In addition, some linear algebra operations may be offloaded to GPUs using CUDA. |
| GADGET | | | | | | | | | See the GADGET section below. |
| GPAW | 132,000 | X | | (X) | | X | X | | GPAW is a density-functional theory (DFT) program for ab initio electronic structure calculations using the projector augmented wave method. It uses a uniform real-space grid representation of the electronic wavefunctions that allows for excellent computational scalability and systematic convergence properties. The GPAW benchmark tests MPI parallelization and the quality of the provided mathematical libraries, including BLAS, LAPACK, ScaLAPACK, and an FFTW-compatible FFT library. There is also an experimental CUDA-based implementation for GPU systems, but it is not covered by this UEABS release. |
| GROMACS | 862,079 | X | X | X | | | X | X | GROMACS is a versatile package to perform molecular dynamics, i.e. to simulate the Newtonian equations of motion for systems with hundreds to millions of particles. |
| NAMD | 887,547 | X | X | X | | | | X | NAMD is a widely used molecular dynamics application designed to simulate bio-molecular systems on a wide variety of compute platforms. |
| NEMO | 154,240 | X | X | | X | | | | NEMO (Nucleus for European Modelling of the Ocean) is a mathematical modelling framework for research activities and prediction services in ocean and climate sciences, developed by a European consortium. It is intended to be a tool for studying the ocean and its interaction with the other components of the Earth's climate system over a large number of space and time scales. It comprises the core engines OPA (ocean dynamics and thermodynamics), SI3 (sea ice dynamics and thermodynamics), TOP (oceanic tracers) and PISCES (biogeochemical processes). |
| PFARM | 21,434 | X | X | X | X | | | | PFARM uses an R-matrix ab-initio approach to calculate electron-atom and electron-molecule collision data for a wide range of applications, including astrophysics and nuclear fusion. It is written in modern Fortran/MPI/OpenMP and exploits highly optimised dense linear algebra numerical library routines. |
| QCD | | | | | | | | | See the QCD section below. |
| Quantum ESPRESSO | 92,996 | X | X | X | X | | | | Quantum ESPRESSO is an integrated suite of open-source computer codes for electronic-structure calculations and materials modelling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials. It is written in MPI and OpenMP, with a CUDA Fortran version available for NVIDIA GPUs. In the benchmark suite we consider only the most used program, PWscf. |
| SPECFEM3D | ~120,000 (100k Fortran & 20k C) | X | X | X | X | | X | | The software package SPECFEM3D simulates three-dimensional global and regional seismic wave propagation based upon the spectral-element method (SEM). |
| TensorFlow | ~3,000,000 | X | X | X | | X | X | X | TensorFlow is a popular open-source library for symbolic math and linear algebra, with particular optimisation for neural-network-based machine learning workflows. Maintained by Google, it is widely used for research and production in both academia and industry. |
_**TODO for all BCOs: move all information below this line either into the table above (using a short version in "Code Description/Notes") or to your top-level README / description. (And when done, remove the relevant text below.)**_

- [ALYA](#alya)
- [Code_Saturne](#saturne)
- [CP2K](#cp2k)
- [GADGET](#gadget)
- [GPAW](#gpaw)
- [GROMACS](#gromacs)
- [NAMD](#namd)
- [PFARM](#pfarm)
- [QCD](#qcd)
- [SPECFEM3D](#specfem3d)

# GADGET

GADGET-4 (GAlaxies with Dark matter and Gas intEracT), an evolved and improved version of GADGET-3, is a freely available code for cosmological N-body/SPH simulations on massively parallel computers with distributed memory, written mainly by Volker Springel, Max Planck Institute for Astrophysics, Garching, Germany, and benefiting from numerous contributions, including those of Ruediger Pakmor, Oliver Zier, and Martin Reinecke.

GADGET-4 supports collisionless simulations and smoothed particle hydrodynamics on massively parallel computers. All communication between concurrent execution processes is done either explicitly by means of the message passing interface (MPI), or implicitly through shared-memory accesses on processes on multi-core nodes (a minimal illustration of this mechanism follows the benchmark links below). The code is mostly written in ISO C++ (assuming the C++11 standard) and should run on all parallel platforms that support at least MPI-3. So far, the compatibility of the code with current Linux/UNIX-based platforms has been confirmed on a large number of systems.

The code can be used for plain Newtonian dynamics, or for cosmological integrations in arbitrary cosmologies, both with or without periodic boundary conditions. Stretched periodic boxes, and special cases such as simulations with two periodic dimensions and one non-periodic dimension, are supported as well. The modeling of hydrodynamics is optional. The code is adaptive both in space and in time, and its Lagrangian character makes it particularly suitable for simulations of cosmic structure formation. Several post-processing options such as group- and substructure finding, or power spectrum estimation, are built in and can be carried out on the fly or applied to existing snapshots. Through a built-in cosmological initial conditions generator, it is also particularly easy to carry out cosmological simulations. In addition, merger trees can be determined directly by the code.

- Web site: https://wwwmpa.mpa-garching.mpg.de/gadget4
- Code download: https://gitlab.mpcdf.mpg.de/vrs/gadget4
- Build and run instructions: https://wwwmpa.mpa-garching.mpg.de/gadget4/02_running.html
- Benchmarks:
  - [Case A: Colliding galaxies with star formation](./gadget/4.0/gadget4-case-A.tar.gz)
  - [Case B: Cosmological DM-only simulation with IC creation](./gadget/4.0/gadget4-case-B.tar.gz)
  - [Case C: Adiabatic collapse of a gas sphere](./gadget/4.0/gadget4-case-C.tar.gz)
- [Code used in the benchmarks](./gadget/4.0/gadget4.tar.gz)
- [Build & run instructions, details about the benchmarks](./gadget/4.0/README.md)
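GADGET-4's hybrid communication model (explicit MPI messages between nodes, implicit shared-memory access within a node) is built on the MPI-3 shared-memory facilities. The sketch below is a generic, minimal C++ illustration of that mechanism; it is not GADGET-4 source code and uses no GADGET-specific names.

```cpp
// Minimal sketch (not GADGET-4 code) of MPI-3 shared memory: ranks on the
// same node read each other's data via load/store instead of messages.
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Group the ranks that share a node into one communicator.
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int node_rank, node_size;
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    // Each rank contributes one double to a node-local shared window.
    MPI_Win win;
    double *local;
    MPI_Win_allocate_shared(sizeof(double), sizeof(double), MPI_INFO_NULL,
                            node_comm, &local, &win);

    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
    *local = 100.0 * world_rank;   // direct store into the shared segment
    MPI_Win_sync(win);             // make the store visible
    MPI_Barrier(node_comm);        // wait until every rank has written

    // Rank 0 of each node reads its neighbours' data directly.
    if (node_rank == 0) {
        for (int r = 0; r < node_size; ++r) {
            MPI_Aint size;
            int disp_unit;
            double *remote;
            MPI_Win_shared_query(win, r, &size, &disp_unit, &remote);
            std::printf("node rank %d holds %.1f\n", r, *remote);
        }
    }
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```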
# PFARM

PFARM is part of a suite of programs based on the ‘R-matrix’ ab-initio approach to the variational solution of the many-electron Schrödinger equation for electron-atom and electron-ion scattering. The package has been used to calculate electron collision data for astrophysical applications (such as the interstellar medium and planetary atmospheres) with, for example, various ions of Fe and Ni and neutral O, plus other applications such as data for plasma modelling and fusion reactor impurities.

The code has recently been adapted to form a compatible interface with the UKRmol suite of codes for electron (positron) molecule collisions, thus enabling large-scale parallel ‘outer-region’ calculations for molecular systems as well as atomic systems. The PFARM outer-region application code EXDIG is dominated by the assembly of sector Hamiltonian matrices and their subsequent eigensolutions. The code is written in Fortran 2003 (or Fortran 2003-compliant Fortran 95), is parallelised using MPI and OpenMP, and is designed to take advantage of highly optimised numerical library routines. Hybrid MPI/OpenMP parallelisation has also been introduced into the code via shared-memory-enabled numerical library kernels. Accelerator-based implementations of EXDIG use off-loading (MKL or cuBLAS/cuSOLVER) for the standard (dense) eigensolver calculations that dominate overall run-time (a schematic eigensolver call is sketched after the links below).

- Code download: https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/pfarm
- Build & run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/pfarm/PFARM_Build_Run_README.txt
- Test Case 1: https://repository.prace-ri.eu/ueabs/PFARM/2.2/test_case_1_atom
- Test Case 2: https://repository.prace-ri.eu/ueabs/PFARM/2.2/test_case_2_mol
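The eigensolver calls that dominate EXDIG's run-time are, schematically, standard dense symmetric diagonalisations of sector Hamiltonian matrices. The sketch below shows a minimal CPU-side call of that kind through the LAPACKE interface (as provided, for example, by MKL); the GPU route replaces this with the corresponding cuBLAS/cuSOLVER routines. This is an illustration only, not PFARM code, and the small matrix is purely hypothetical.

```cpp
// Minimal sketch of a dense symmetric eigensolve of the kind that dominates
// EXDIG: diagonalise a (tiny, made-up) "sector Hamiltonian" with dsyevd.
#include <lapacke.h>
#include <cstdio>
#include <vector>

int main() {
    const lapack_int n = 3;
    // Symmetric matrix, column-major storage; only the upper triangle is read.
    std::vector<double> h = {
        2.0, -1.0,  0.0,
       -1.0,  2.0, -1.0,
        0.0, -1.0,  2.0
    };
    std::vector<double> eval(n);

    // 'V': also compute eigenvectors (returned in h); 'U': use upper triangle.
    lapack_int info = LAPACKE_dsyevd(LAPACK_COL_MAJOR, 'V', 'U',
                                     n, h.data(), n, eval.data());
    if (info != 0) {
        std::fprintf(stderr, "dsyevd failed: info = %d\n", (int)info);
        return 1;
    }
    for (lapack_int i = 0; i < n; ++i)
        std::printf("eigenvalue %d = %f\n", (int)i, eval[i]);
    return 0;
}
```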
# QCD

| **General information** | **Scientific field** | **Language** | **MPI** | **OpenMP** | **GPU** | **LoC** | **Code description** |
|---|---|---|---|---|---|---|---|
| [Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/qcd/part_1)<br>[Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/qcd/part_1/README.md) | Lattice Quantum Chromodynamics Part 1 | C | yes | yes | yes (CUDA) | -- | Accelerator-enabled kernel E of the UEABS QCD CPU part, using the targetDP model. Test case A: 8x64x64x64. Conjugate Gradient solver involving the Wilson Dirac stencil. Domain decomposition, memory bandwidth, strong scaling, MPI latency. (A generic CG sketch follows this table.) |
| [Source](https://lattice.github.io/quda/)<br>[Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/qcd/part_2)<br>[Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/qcd/part_2/README.md) | Lattice Quantum Chromodynamics Part 2 - QUDA | C++ | yes | yes | yes (CUDA) | -- | Part 2 (GPU) uses a QUDA kernel for running on NVIDIA GPUs. Test case A (96x32x32x32): small problem size, CG solver. Domain decomposition, memory bandwidth, strong scaling, MPI latency. Test case B (126x64x64x64): moderate problem size, CG solver on the Wilson Dirac stencil, bandwidth bound. |
| [Source](http://jeffersonlab.github.io/qphix/)<br>[Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/qcd/part_2)<br>[Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/qcd/part_2/README.md) | Lattice Quantum Chromodynamics Part 2 - QPhiX | C++ | yes | yes | no | -- | Part 2 (Xeon/Xeon Phi) uses a QPhiX kernel optimised for x86, in particular Intel Xeon (Phi). Test case A (96x32x32x32): small problem size, CG solver involving the Wilson Dirac stencil. Domain decomposition, memory bandwidth, strong scaling, MPI latency. Test case B (126x64x64x64): moderate problem size, CG solver on the Wilson Dirac stencil, bandwidth bound. |
| [Source](https://repository.prace-ri.eu/ueabs/QCD/1.3/QCD_Source_TestCaseA.tar.gz)<br>[Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/qcd/part_cpu)<br>[Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/qcd/part_cpu/README.md) | Lattice Quantum Chromodynamics - CPU part - legacy UEABS | C/Fortran | yes | yes/no | no | -- | CPU part based on the legacy UEABS QCD CPU benchmark kernels (last updated 2017), built on five different benchmark applications representative of the European lattice QCD community (see the documentation for more details). |
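All of the QCD kernels above are built around the same numerical core: a Conjugate Gradient (CG) iteration applied to a Wilson Dirac stencil, so performance is dominated by memory bandwidth and, under strong scaling, MPI latency. The sketch below shows the CG iteration in its simplest serial form, with a 1-D Laplacian standing in for the stencil; it illustrates the algorithm only and is not one of the benchmark kernels.

```cpp
// Minimal serial Conjugate Gradient sketch (illustration only): solve A x = b
// for an SPD operator applied matrix-free, here A = tridiag(-1, 2, -1).
#include <cstddef>
#include <cstdio>
#include <cmath>
#include <vector>

using Vec = std::vector<double>;

// Stand-in stencil operator: y = A x.
static void apply(const Vec &x, Vec &y) {
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        double v = 2.0 * x[i];
        if (i > 0)     v -= x[i - 1];
        if (i + 1 < n) v -= x[i + 1];
        y[i] = v;
    }
}

static double dot(const Vec &a, const Vec &b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

int main() {
    const std::size_t n = 1000;
    Vec x(n, 0.0), b(n, 1.0), r = b, p = r, Ap(n);

    double rr = dot(r, r);
    for (int it = 0; it < 10000 && std::sqrt(rr) > 1e-10; ++it) {
        apply(p, Ap);
        const double alpha = rr / dot(p, Ap);      // step length
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];                  // update solution
            r[i] -= alpha * Ap[i];                 // update residual
        }
        const double rr_new = dot(r, r);
        const double beta = rr_new / rr;           // new search direction
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    std::printf("final residual norm: %e\n", std::sqrt(rr));
    return 0;
}
```

In the benchmark kernels the `apply()` step is the Wilson Dirac stencil distributed over MPI ranks by domain decomposition, and the dot products become global reductions, which is why the benchmarks stress memory bandwidth and communication latency rather than floating-point throughput.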