@@ -8,12 +8,12 @@ The UEABS has been and will be actively updated and maintained by the subsequent
Each application code has either one or two input datasets. If there are two datasets, Test Case A is designed to run on Tier-1 sized systems (up to around 1,000 x86 cores, or equivalent) and Test Case B is designed to run on Tier-0 sized systems (up to around 10,000 x86 cores, or equivalent). If there is only one dataset (Test Case A), it is suitable for both sizes of system.
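As a rough illustration of the sizing rule above, the hypothetical helper below picks a test case from a target core count. The function name and the hard cut-off at exactly 1,000 cores are assumptions made for this sketch. The README itself only says "up to around".

```python
# Hypothetical helper: choose a UEABS test case for a target system size.
# The threshold is the approximate Tier-1 figure quoted in this README.
TIER1_MAX_CORES = 1_000  # Test Case A targets up to around 1,000 x86 cores


def choose_test_case(cores: int, has_test_case_b: bool = True) -> str:
    """Return "A" or "B" for a run on `cores` x86 cores (or equivalent)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    if not has_test_case_b:
        # Codes with a single dataset use Test Case A for both system sizes.
        return "A"
    # Larger (Tier-0 sized) runs, up to around 10,000 cores, use Test Case B.
    return "A" if cores <= TIER1_MAX_CORES else "B"
```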
Contacts: Valeriu Codreanu <mailto:valeriu.codreanu@surfsara.nl> or Walter Lioen <mailto:walter.lioen@surfsara.nl>
Contacts: (Ok to mention all BCOs here?, ask PMO for a UEABS contact mailing list address?), Walter Lioen <mailto:walter.lioen@surf.nl>
Current Release
---------------
The current release is Version 2.1 (April 30, 2019).
The current release is Version 2.2 (December 31, 2021).
See also the [release notes and history](RELEASES.md).
Running the suite
...
...
@@ -21,10 +21,196 @@ Running the suite
Instructions for running each test case of each code can be found in the subdirectories of this repository.
For more details of the codes and datasets, and sample results, please see the PRACE-5IP benchmarking deliverable D7.5 "Evaluation of Accelerated and Non-accelerated Benchmarks" (April 18, 2019) at http://www.prace-ri.eu/public-deliverables/ .
For more details of the codes and datasets, and sample results, please see the PRACE-6IP benchmarking deliverable D7.5 "Evaluation of Benchmark Performance" (November 30, 2021) at http://www.prace-ri.eu/public-deliverables/ .
The application codes that constitute the UEABS are:
<li><a href="https://gitlab.com/bsc-alya/benchmarks/sphere-16M">Test Case A</a></li>
<li><a href="https://gitlab.com/bsc-alya/benchmarks/sphere-132M">Test Case B</a></li>
</ul>
</td>
<td>600,000</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td>The Alya System is a Computational Mechanics code capable of solving different physics, each one with its own modelization characteristics, in a coupled way. Among the problems it solves are: convection-diffusion reactions, incompressible flows, compressible flows, turbulence, bi-phasic flows and free surface, excitable media, acoustics, thermal flow, quantum mechanics (DFT) and solid mechanics (large strain).</td>
</tr>
<tr>
<td>Code_Saturne</td>
<td>~350,000</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>The code solves the Navier-Stokes equations for incompressible/compressible flows using a predictor-corrector technique. The Poisson pressure equation is solved by a Conjugate Gradient method preconditioned by a multigrid algorithm, and the transport equations by Conjugate Gradient-like methods. Advanced gradient reconstruction is also available to account for distorted meshes.</td>
</tr>
<tr>
<td>CP2K</td>
<td>~1,150,000</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td>CP2K is a freely available quantum chemistry and solid-state physics software package for performing atomistic simulations. It can be run with MPI, OpenMP and CUDA. All of CP2K is MPI parallelised, with some routines making use of OpenMP, which can be used to reduce the memory footprint. In addition some linear algebra operations may be offloaded to GPUs using CUDA.</td>
</tr>
<tr>
<td>GADGET</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>GPAW</td>
<td>132,000</td>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
</tr>
<tr>
<td>GROMACS</td>
<td>3,227,337</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
<tr>
<td>NAMD</td>
<td>1,992,651</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td>X</td>
<td></td>
</tr>
<tr>
<td>NEMO</td>
<td>154,240</td>
<td>X</td>
<td></td>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td>X</td>
<td>NEMO (Nucleus for European Modelling of the Ocean) is a mathematical modelling framework for research activities and prediction services in ocean and climate sciences, developed by a European consortium. It is intended to be a tool for studying the ocean and its interaction with the other components of the Earth's climate system over a large number of space and time scales. It comprises the core engines, namely OPA (ocean dynamics and thermodynamics), SI3 (sea ice dynamics and thermodynamics), TOP (oceanic tracers) and PISCES (biogeochemical processes).</td>
</tr>
<tr>
<td>PFARM</td>
<td>21,434</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td>PFARM uses an R-matrix ab-initio approach to calculate electron-atom and electron-molecule collision data for a wide range of applications, including astrophysics and nuclear fusion. It is written in modern Fortran/MPI/OpenMP and exploits highly optimised dense linear algebra numerical library routines.</td>
</tr>
<tr>
<td>QCD</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Quantum ESPRESSO</td>
<td>92,996</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>SPECFEM3D</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>TensorFlow</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
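The Code_Saturne entry in the table above says the pressure equation is solved by a preconditioned Conjugate Gradient method. Below is a minimal sketch of preconditioned CG for a small dense symmetric positive definite matrix. Code_Saturne uses a multigrid preconditioner. This sketch substitutes a simple Jacobi (diagonal) preconditioner to stay short and dependency-free, so it is an illustration of the technique, not Code_Saturne's implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x: Vector, y: Vector) -> float:
    return sum(xi * yi for xi, yi in zip(x, y))


def pcg(a: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve a x = b for symmetric positive definite `a` with Jacobi-preconditioned CG."""
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]  # M^-1 for M = diag(a)
    x = [0.0] * n
    r = list(b)                                    # residual b - a x, with x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]     # preconditioned residual
    p = list(z)                                    # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)                    # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                         # keeps directions a-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

For example, `pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` returns approximately `[1/11, 7/11]`.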
- [ALYA](#alya)
- [Code_Saturne](#saturne)
...
...
@@ -46,11 +232,10 @@ The application codes that constitute the UEABS are:
The Alya System is a Computational Mechanics code capable of solving different physics, each one with its own modelization characteristics, in a coupled way. Among the problems it solves are: convection-diffusion reactions, incompressible flows, compressible flows, turbulence, bi-phasic flows and free surface, excitable media, acoustics, thermal flow, quantum mechanics (DFT) and solid mechanics (large strain). ALYA is written in Fortran 90/95 and parallelized using MPI and OpenMP.
- Web site: https://www.bsc.es/computer-applications/alya-system
- Build and run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/alya/README.md
- Test Case A: https://gitlab.com/bsc-alya/benchmarks/sphere-16M
- Test Case B: https://gitlab.com/bsc-alya/benchmarks/sphere-132M
# Code_Saturne <a name="saturne"></a>
...
...
@@ -83,14 +268,20 @@ CP2K is written in Fortran 2008 and can be run in parallel using a combination o
# GADGET <a name="gadget"></a>
GADGET is a freely available code for cosmological N-body/SPH simulations on massively parallel computers with distributed memory, written by Volker Springel, Max Planck Institute for Astrophysics, Garching, Germany. GADGET is written in C and uses an explicit communication model that is implemented with the standardized MPI communication interface. The code can be run on essentially all supercomputer systems presently in use, including clusters of workstations or individual PCs. GADGET computes gravitational forces with a hierarchical tree algorithm (optionally in combination with a particle-mesh scheme for long-range gravitational forces) and represents fluids by means of smoothed particle hydrodynamics (SPH). The code can be used for studies of isolated systems, or for simulations that include the cosmological expansion of space, either with or without periodic boundary conditions. In all these types of simulations, GADGET follows the evolution of a self-gravitating collisionless N-body system, and allows gas dynamics to be optionally included. Both the force computation and the time stepping of GADGET are fully adaptive, with a dynamic range that is, in principle, unlimited. GADGET can therefore be used to address a wide array of astrophysically interesting problems, ranging from colliding and merging galaxies to the formation of large-scale structure in the Universe. With the inclusion of additional physical processes such as radiative cooling and heating, GADGET can also be used to study the dynamics of the gaseous intergalactic medium, or to address star formation and its regulation by feedback processes.
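In SPH, the density at a particle is a kernel-weighted sum over its neighbours, ρ_i = Σ_j m_j W(|r_i − r_j|, h). The sketch below uses the 3D cubic spline kernel with compact support radius h, in the normalisation described for GADGET-2. It also assumes a single fixed smoothing length and a brute-force O(N²) neighbour search. GADGET adapts h per particle and finds neighbours with its tree, so this illustrates the estimator only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def cubic_spline_kernel(r: float, h: float) -> float:
    """3D cubic spline kernel W(r, h) that vanishes for r > h and integrates to 1."""
    q = r / h
    norm = 8.0 / (math.pi * h ** 3)
    if q <= 0.5:
        return norm * (1.0 - 6.0 * q ** 2 + 6.0 * q ** 3)
    if q <= 1.0:
        return norm * 2.0 * (1.0 - q) ** 3
    return 0.0


def sph_density(positions: Sequence[Point], masses: Sequence[float], h: float) -> List[float]:
    """rho_i = sum_j m_j W(|r_i - r_j|, h), including the self-contribution j = i."""
    return [
        sum(m * cubic_spline_kernel(math.dist(pi, pj), h) for pj, m in zip(positions, masses))
        for pi in positions
    ]
```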
GADGET-4 (GAlaxies with Dark matter and Gas intEracT), an evolved and improved version of GADGET-3, is a freely available code for cosmological N-body/SPH simulations on massively parallel computers with distributed memory, written mainly by Volker Springel, Max Planck Institute for Astrophysics, Garching, Germany, and benefiting from numerous contributions, including those of Ruediger Pakmor, Oliver Zier, and Martin Reinecke. GADGET-4 supports collisionless simulations and smoothed particle hydrodynamics on massively parallel computers. All communication between concurrent execution processes is done either explicitly by means of the Message Passing Interface (MPI), or implicitly through shared-memory accesses between processes on multi-core nodes. The code is mostly written in ISO C++ (assuming the C++11 standard), and should run on all parallel platforms that support at least MPI-3. So far, the compatibility of the code with current Linux/UNIX-based platforms has been confirmed on a large number of systems.
The code can be used for plain Newtonian dynamics, or for cosmological integrations in arbitrary cosmologies, both with or without periodic boundary conditions. Stretched periodic boxes, and special cases such as simulations with two periodic dimensions and one non-periodic dimension are supported as well. The modeling of hydrodynamics is optional. The code is adaptive both in space and in time, and its Lagrangian character makes it particularly suitable for simulations of cosmic structure formation. Several post-processing options such as group- and substructure finding, or power spectrum estimation are built in and can be carried out on the fly or applied to existing snapshots. Through a built-in cosmological initial conditions generator, it is also particularly easy to carry out cosmological simulations. In addition, merger trees can be determined directly by the code.
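Adaptivity in time means that each particle advances with its own timestep. The sketch below makes two assumptions. First, it uses a gravitational criterion of the form Δt = min(Δt_max, sqrt(2ηε/|a|)), with softening length ε and accuracy parameter η. Second, it places particles on a power-of-two hierarchy of timesteps below Δt_max. The default values of `eta` and `dt_max` are illustrative choices, not settings taken from this benchmark.

```python
import math


def grav_timestep(accel: float, softening: float,
                  eta: float = 0.025, dt_max: float = 0.01) -> float:
    """Return min(dt_max, sqrt(2 * eta * softening / |accel|))."""
    if accel == 0.0:
        return dt_max  # no acceleration: only the global cap applies
    return min(dt_max, math.sqrt(2.0 * eta * softening / abs(accel)))


def timestep_bin(dt: float, dt_max: float) -> int:
    """Smallest n >= 0 such that dt_max / 2**n <= dt (the particle's power-of-two bin)."""
    if dt <= 0.0:
        raise ValueError("dt must be positive")
    n = 0
    while dt_max / 2 ** n > dt:
        n += 1
    return n
```

Strongly accelerated particles land in deeper bins and are updated more often, while the rest of the system takes large steps.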
- Web site: https://wwwmpa.mpa-garching.mpg.de/gadget4
- Test Case A: https://repository.prace-ri.eu/ueabs/GADGET/gadget3_TestCaseA.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.3/gadget/gadget3_Run_README.txt
# GPAW <a name="gpaw"></a>
...
...
@@ -103,14 +294,15 @@ The equations of the (time-dependent) density functional theory within the PAW m
The program offers several parallelization levels. The most basic parallelization strategy is domain decomposition over the real-space grid. In magnetic systems it is possible to parallelize over spin, and in systems that have k-points (surfaces or bulk systems) parallelization over k-points is also possible. Furthermore, parallelization over electronic states is possible in DFT and in real-time TD-DFT calculations. GPAW is written in Python and C and parallelized with MPI.
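The parallelization levels above multiply: the total number of MPI ranks is split into k-point/spin groups, then band (electronic state) groups, then real-space domains. The hypothetical helper below enumerates such splits. Two simplifications are assumed. Spin and k-points are lumped into one level, and each group size must divide its work evenly. This does not reproduce GPAW's own rank assignment.

```python
from itertools import product
from typing import List, Tuple


def parallel_layouts(n_ranks: int, n_kpts_spins: int, n_bands: int) -> List[Tuple[int, int, int]]:
    """Enumerate (kpt, band, domain) group sizes with kpt * band * domain == n_ranks."""
    layouts = []
    for kpt, band in product(range(1, n_ranks + 1), repeat=2):
        if n_ranks % (kpt * band):
            continue  # remaining ranks cannot be split evenly into domains
        if n_kpts_spins % kpt or n_bands % band:
            continue  # k-point/spin pairs or bands would be unevenly distributed
        layouts.append((kpt, band, n_ranks // (kpt * band)))
    return layouts
```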
[GPAW README, section "Mechanics of running the benchmark"](gpaw/README.md#mechanics-of-running-the-benchmark)
# GROMACS <a name="gromacs"></a>
...
...
@@ -167,21 +359,21 @@ NAMD is written in C++ and parallelised using Charm++ parallel objects, which ar
# NEMO <a name="nemo"></a>
NEMO (Nucleus for European Modelling of the Ocean) [22] is a mathematical modelling framework for research activities and prediction services in ocean and climate sciences, developed by a European consortium. It is intended to be a tool for studying the ocean and its interaction with the other components of the Earth's climate system over a large number of space and time scales. It comprises the core engines, namely OPA (ocean dynamics and thermodynamics), SI3 (sea ice dynamics and thermodynamics), TOP (oceanic tracers) and PISCES (biogeochemical processes).
Prognostic variables in NEMO are the three-dimensional velocity field, a linear or non-linear sea surface height, the temperature and the salinity.
In the horizontal direction, the model uses a curvilinear orthogonal grid and in the vertical direction, a full or partial step z-coordinate, or s-coordinate, or a mixture of the two. In most cases, the variables are distributed on a three-dimensional Arakawa C-type grid.
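On an Arakawa C-grid, scalars such as temperature sit at cell centres (T-points), while the velocity components sit on the cell faces normal to them. The sketch below computes the horizontal divergence at T-points. It assumes a uniform Cartesian grid with constant spacings `dx` and `dy`. NEMO's curvilinear grid instead uses spatially varying scale factors, and the sketch also ignores land masks and halos.

```python
from typing import List

Field = List[List[float]]


def c_grid_divergence(u: Field, v: Field, dx: float, dy: float) -> Field:
    """Divergence at the T-points of a uniform Arakawa C-grid.

    u[i][j]: zonal velocity on x-faces, shape (nx + 1) x ny
    v[i][j]: meridional velocity on y-faces, shape nx x (ny + 1)
    Returns an nx x ny field at cell centres.
    """
    nx, ny = len(u) - 1, len(u[0])
    return [[(u[i + 1][j] - u[i][j]) / dx + (v[i][j + 1] - v[i][j]) / dy
             for j in range(ny)] for i in range(nx)]
```

The staggering lets each difference use the two velocity values that straddle a T-point, so no averaging is needed.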
The model is implemented in Fortran 90, with preprocessing (C preprocessor). It is optimized for vector computers and parallelized by domain decomposition with MPI. It supports modern C/C++ and Fortran compilers. All input and output is done with the third-party software XIOS, which depends on NetCDF (Network Common Data Format) and HDF5. NEMO is highly scalable, which makes it well suited to measuring supercomputer performance in terms of compute capacity, memory subsystem, I/O and interconnect performance.
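Domain decomposition splits the horizontal grid into a 2D array of subdomains, one per MPI rank. The sketch below names the process-grid dimensions after NEMO's `jpni`/`jpnj` namelist parameters. The splitting rule, the row-major rank ordering and the absence of halo points are simplifying assumptions. NEMO's actual subdomain layout, halo exchange and land-only-domain elimination are not reproduced.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]


def split_1d(n: int, parts: int) -> List[Range]:
    """Split n grid points into `parts` contiguous [start, end) ranges differing by at most 1."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def decompose(nx: int, ny: int, jpni: int, jpnj: int) -> Dict[int, Tuple[Range, Range]]:
    """Map each rank (rank = i + jpni * j) to its (i-range, j-range) subdomain."""
    xs, ys = split_1d(nx, jpni), split_1d(ny, jpnj)
    return {i + jpni * j: (xs[i], ys[j]) for j in range(jpnj) for i in range(jpni)}
```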
### Test Case Description
The GYRE configuration has been built to model the seasonal cycle of a double-gyre box model. It consists of an idealized domain over which seasonal forcing is applied. This allows for studying a large number of interactions and their combined contribution to the large-scale circulation.
The domain geometry is rectangular, bounded by vertical walls and a flat bottom. The configuration is meant to represent an idealized North Atlantic or North Pacific basin. The circulation is forced by analytical profiles of wind and buoyancy fluxes.
The wind stress is zonal and its curl changes sign at 22°N and 36°N. It forces a subpolar gyre in the north, a subtropical gyre in the wider part of the domain and a small recirculation gyre in the southern corner. The net heat flux takes the form of a restoring toward a zonal apparent air temperature profile.
The portion of the net heat flux that comes from solar radiation is allowed to penetrate into the water column. The freshwater flux is also prescribed and varies zonally. It is determined such that, at each time step, the basin-integrated flux is zero.
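One simple way to enforce a zero basin-integrated freshwater flux is to subtract the area-weighted mean of the prescribed profile. Each cell's flux times its area then sums to zero. The sketch below assumes the flux and cell areas are given as flat lists over ocean cells. It shows the constraint itself, not NEMO's exact GYRE forcing code.

```python
from typing import List, Sequence


def remove_area_weighted_mean(flux: Sequence[float], area: Sequence[float]) -> List[float]:
    """Shift `flux` by a constant so that sum(flux_i * area_i) == 0 over the basin."""
    total_area = sum(area)
    mean = sum(f * a for f, a in zip(flux, area)) / total_area
    return [f - mean for f in flux]
```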
The basin is initialized at rest, with vertical profiles of temperature and salinity uniformly applied to the whole domain. The GYRE configuration is set through the namelist_cfg file.
The horizontal resolution is determined by setting jp_cfg as follows:
...
...
@@ -203,7 +395,7 @@ In this configuration, we use default value of 30 ocean levels depicted by jpk=3
**Test Case B**
* jp_cfg = 256 suitable up to 20,000 cores.
* Number of Days (real): 80
* Number of time steps: 4320
* Time step size (real): 20 mins
* Number of seconds per time step: 1200
...
...
@@ -214,19 +406,19 @@ In this configuration, we use default value of 30 ocean levels depicted by jpk=3
# PFARM <a name="pfarm"></a>
PFARM is part of a suite of programs based on the ‘R-matrix’ ab-initio approach to the variational solution of the many-electron Schrödinger
equation for electron-atom and electron-ion scattering. The package has been used to calculate electron collision data for astrophysical
applications (such as: the interstellar medium, planetary atmospheres) with, for example, various ions of Fe and Ni and neutral O, plus
other applications such as data for plasma modelling and fusion reactor impurities. The code has recently been adapted to form a compatible
interface with the UKRmol suite of codes for electron (positron) molecule collisions thus enabling large-scale parallel ‘outer-region’
calculations for molecular systems as well as atomic systems.
The PFARM outer-region application code EXDIG is dominated by the assembly of sector Hamiltonian matrices and their subsequent eigensolutions.
The code is written in Fortran 2003 (or Fortran 2003-compliant Fortran 95), is parallelised using MPI and OpenMP and is designed to take
advantage of highly optimised numerical library routines. Hybrid MPI / OpenMP parallelisation has also been introduced into the code via
shared memory enabled numerical library kernels.
Accelerator-based implementations have been developed for EXDIG, using offloading (MKL or cuBLAS/cuSOLVER) for the standard (dense) eigensolver calculations that dominate the overall run time.
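The dominant kernel here is the eigensolution of dense real symmetric (sector Hamiltonian) matrices. EXDIG delegates this to optimised libraries such as MKL or cuSOLVER. For illustration only, the sketch below substitutes the textbook cyclic Jacobi rotation method, which computes all eigenvalues of a small real symmetric matrix in pure Python. It is neither the algorithm nor the performance profile of those libraries.

```python
import math
from typing import List

Matrix = List[List[float]]


def jacobi_eigenvalues(a: Matrix, tol: float = 1e-12, max_sweeps: int = 100) -> List[float]:
    """Eigenvalues (ascending) of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol ** 2:
            break  # matrix is diagonal to within tolerance
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle of the (p, q) plane rotation that zeroes a[p][q].
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                akp = [a[k][p] for k in range(n)]
                akq = [a[k][q] for k in range(n)]
                app, aqq, apq = a[p][p], a[q][q], a[p][q]
                for k in range(n):
                    if k != p and k != q:
                        a[k][p] = a[p][k] = c * akp[k] - s * akq[k]
                        a[k][q] = a[q][k] = s * akp[k] + c * akq[k]
                a[p][p] = c * c * app - 2.0 * s * c * apq + s * s * aqq
                a[q][q] = s * s * app + 2.0 * s * c * apq + c * c * aqq
                a[p][q] = a[q][p] = 0.0
    return sorted(a[i][i] for i in range(n))
```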
- Build & Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/pfarm/PFARM_Build_Run_README.txt
...
...
@@ -262,7 +454,7 @@ QUANTUM ESPRESSO is written mostly in Fortran90, and parallelised using MPI and
The Scalable HeterOgeneous Computing (SHOC) benchmark suite is a collection of benchmark programs testing the performance and stability of systems using computing devices with non-traditional architectures
for general purpose computing. It serves as a synthetic benchmark suite in the UEABS context. Its initial focus is on systems containing Graphics Processing Units (GPUs) and multi-core processors, featuring implementations using both CUDA and OpenCL. It can be used on clusters as well as individual hosts.
Also, SHOC includes an Offload branch for the benchmarks that can be used to evaluate the Intel Xeon Phi x100 family.
The SHOC benchmark suite currently contains benchmark programs categorized by complexity. Some measure low-level "feeds and speeds" behavior (Level 0), some measure the performance of a higher-level operation such as a Fast Fourier Transform (FFT) (Level 1), and the others measure real application kernels (Level 2).
...
...
@@ -275,16 +467,16 @@ The SHOC benchmark suite currently contains benchmark programs, categoried based
| [- Website](https://geodynamics.org/cig/software/specfem3d_globe/)<br>[- Source](https://github.com/geodynamics/specfem3d_globe.git)<br>[- Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/tree/r2.1-dev/specfem3d)<br>[- Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r2.1-dev/specfem3d/PRACE_UEABS_Specfem3D_summary.pdf) | Geodynamics | Fortran | yes | yes | Yes (CUDA) | 140000 | The software package SPECFEM3D simulates three-dimensional global and regional seismic wave propagation based upon the spectral-element method (SEM). |
| [- Website](https://geodynamics.org/cig/software/specfem3d_globe/)<br>[- Source](https://github.com/geodynamics/specfem3d_globe.git)<br>[- Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/tree/r2.1-dev/specfem3d)<br>[- Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r2.1-dev/specfem3d/PRACE_UEABS_Specfem3D_summary.pdf) | Geodynamics | Fortran & C | yes | yes | Yes (CUDA) | 100k Fortran & 20k C | The software package SPECFEM3D simulates three-dimensional global and regional seismic wave propagation based upon the spectral-element method (SEM). |
# TensorFlow <a name="tensorflow"></a>
TensorFlow (https://www.tensorflow.org) is a popular open-source library for symbolic math and linear algebra, with particular optimization for neural-network-based machine learning workflows. Maintained by Google, it is widely used for research and production in both academia and industry.
TensorFlow supports a wide variety of hardware platforms (CPUs, GPUs, TPUs), and can be scaled up to utilize multiple compute devices on a single or multiple compute nodes. The main objective of this benchmark is to profile the scaling behavior of TensorFlow on different hardware, and thereby provide a reference baseline of its performance for different sizes of applications.
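Scaling behaviour of this kind is usually summarised as parallel efficiency: measured throughput divided by the throughput that linear scaling from the smallest run would predict. The helper below is a hypothetical sketch. It assumes throughput (for example, training samples per second) has already been measured for each device count, and it is not part of the benchmark's own tooling.

```python
from typing import Dict


def scaling_efficiency(throughput: Dict[int, float]) -> Dict[int, float]:
    """Efficiency per device count relative to linear scaling from the smallest run.

    efficiency(N) = throughput(N) / (N * throughput(N_ref) / N_ref), N_ref = min(N)
    """
    n_ref = min(throughput)
    per_device = throughput[n_ref] / n_ref  # per-device throughput at the reference size
    return {n: t / (n * per_device) for n, t in sorted(throughput.items())}
```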
There are many open-source datasets available for benchmarking TensorFlow, such as `mnist`, `fashion_mnist`, `cifar`, `imagenet`, and so on. This benchmark suite, however, focuses on a scientific research use case. `DeepGalaxy` is a code built with TensorFlow that uses a deep neural network to classify galaxy mergers in the Universe, observed by the Hubble Space Telescope and the Sloan Digital Sky Survey.
* Changed the presentation, making it similar to the CORAL Benchmarks (cf. <a href="https://asc.llnl.gov/coral-benchmarks">CORAL Benchmarks</a> and <a href="https://asc.llnl.gov/coral-2-benchmarks">CORAL-2 Benchmarks</a>)
* Removed the SHOC benchmark suite
* Added the TensorFlow benchmark
* Alya ...
* ...
* ...
* TensorFlow ...
* Updated the benchmark suite to the status as used for the PRACE-6IP benchmarking deliverable D7.5 "Evaluation of Benchmark Performance" (November 30, 2021)
## Version 2.1 (PRACE-5IP, April 30, 2019)
* Updated the benchmark suite to the status as used for the PRACE-5IP benchmarking deliverable D7.5 "Evaluation of Accelerated and Non-accelerated Benchmarks" (April 18, 2019)
* Test Case A: https://repository.prace-ri.eu/ueabs/ALYA/2.1/TestCaseA.tar.gz
* Test Case A: https://gitlab.com/bsc-alya/benchmarks/sphere-16M
* Test Case B: https://repository.prace-ri.eu/ueabs/ALYA/2.1/TestCaseB.tar.gz
* Test Case B: https://gitlab.com/bsc-alya/benchmarks/sphere-132M
## Mechanics of Building Benchmark
Alya builds the makefile from the compilation options defined in config.in. In order to build ALYA (Alya.x), please follow these steps after unpacking the tar.gz:
You can compile Alya using CMake. It follows the classic CMake configuration, except for the compiler management, which has been customized by the developers.
### Creation of the build directory
In your alya directory, create a new build directory:
Go to the directory: Executables/unix
```
cd Executables/unix
mkdir build
cd build
```
Edit config.in (some default config.in files can be found in directory configure.in):
### Configuration
* Select your own MPI wrappers and paths
* Select size of integers. Default is 4 bytes; for 8 bytes, select -DI8
* Choose your metis version, metis-4.0 or metis-5.1.0_i8 for 8-bytes integers
To configure cmake using the command line, type the following:
Configure Alya:
cmake ..
./configure -x nastin parall
If you want to customize the build options, use -DOPTION=value. For example, to enable GPU support:
Compile metis:
cmake .. -DWITH_GPU=ON
make metis4
### Compilation
or
make metis5
Finally, compile Alya:
make -j 8
For more information: https://gitlab.com/bsc-alya/alya/-/wikis/Documentation/Installation
## Mechanics of Running Benchmark
...
...
@@ -59,8 +57,8 @@ The parameters used in the datasets try to represent at best typical industrial