# Unified European Applications Benchmark Suite
<img alt="PRACE" src="PRACE_logo.png" width="150px">
The Unified European Application Benchmark Suite (UEABS) is a set of currently 13 application codes taken from the pre-existing PRACE and DEISA application benchmark suites, and extended with the PRACE Accelerator Benchmark Suite. The objective is to provide a single benchmark suite of scalable, currently relevant and publicly available application codes and datasets, of a size that can realistically be run on large systems and maintained into the future.
The UEABS activity was started during the PRACE-PP project and was publicly released by the PRACE-2IP project.
The UEABS has been and will be actively updated and maintained by the subsequent PRACE projects.
Each application code has either one, or two input datasets. If there are two datasets, Test Case A is designed to run on Tier-1 sized systems (up to around 1,000 x86 cores, or equivalent) and Test Case B is designed to run on Tier-0 sized systems (up to around 10,000 x86 cores, or equivalent). If there is only one dataset (Test Case A), it is suitable for both sizes of system.
Contact: Walter Lioen <mailto:walter.lioen@surf.nl>
Current Release
---------------
The current release is Version 2.2 (December 31, 2021).
See also the [release notes and history](RELEASES.md).
Running the suite
-----------------
Instructions to run each test case of each code can be found in the subdirectories of this repository.
For more details of the codes and datasets, and sample results, please see the PRACE-6IP benchmarking deliverable [D7.5 "Evaluation of Benchmark Performance"](https://prace-ri.eu/wp-content/uploads/PRACE6IP-D7.4.pdf) (November 30, 2021).
The application codes that constitute the UEABS are:
---------------------------------------------------
- [ALYA](#alya)
- [Code_Saturne](#saturne)
- [CP2K](#cp2k)
- [GADGET](#gadget)
- [GPAW](#gpaw)
- [GROMACS](#gromacs)
- [NAMD](#namd)
- [NEMO](#nemo)
- [PFARM](#pfarm)
- [QCD](#qcd)
- [Quantum Espresso](#espresso)
- [SHOC](#shoc)
- [SPECFEM3D](#specfem3d)
- [TensorFlow](#tensorflow)
# ALYA <a name="alya"></a>
The Alya System is a Computational Mechanics code capable of solving different physics, each one with its own modelization characteristics, in a coupled way. Among the problems it solves are: convection-diffusion reactions, incompressible flows, compressible flows, turbulence, bi-phasic flows and free surface, excitable media, acoustics, thermal flow, quantum mechanics (DFT) and solid mechanics (large strain). ALYA is written in Fortran 90/95 and parallelized using MPI and OpenMP.
- Web site: https://www.bsc.es/computer-applications/alya-system
- Code download: https://repository.prace-ri.eu/ueabs/ALYA/2.1/Alya.tar.gz
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r2.1/alya/ALYA_Build_README.txt
- Test Case A: https://repository.prace-ri.eu/ueabs/ALYA/2.1/TestCaseA.tar.gz
- Test Case B: https://repository.prace-ri.eu/ueabs/ALYA/2.1/TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r2.1/alya/ALYA_Run_README.txt
# Code_Saturne <a name="saturne"></a>
Code_Saturne is open-source multi-purpose CFD software, primarily developed by EDF R&D and maintained by them. It relies on the Finite Volume method and a collocated arrangement of unknowns to solve the Navier-Stokes equations, for incompressible or compressible flows, laminar or turbulent flows and non-Newtonian and Newtonian fluids. A highly parallel coupling library (Parallel Locator Exchange - PLE) is also available in the distribution to account for other physics, such as conjugate heat transfer and structure mechanics. For the incompressible solver, the pressure is solved using an integrated Algebraic Multi-Grid algorithm and the scalars are computed by conjugate gradient methods or Gauss-Seidel/Jacobi.
The original version of the code is written in C for pre-/post-processing, I/O handling, parallelisation handling, linear solvers and gradient computation, and in Fortran 95 for most of the physics implementation. MPI is used on distributed-memory machines, and OpenMP pragmas have been added to the most costly parts of the code to exploit shared memory. The version used in this work (also freely available) also relies on CUDA to take advantage of GPU acceleration where available.
The equations are solved iteratively using time-marching algorithms, and most of the time spent during a time step is usually due to the computation of the velocity-pressure coupling, for simple physics. For this reason, the two test cases chosen for the benchmark suite have been designed to assess the velocity-pressure coupling computation, and rely on the same configuration, with a mesh 8 times larger for Test Case B than for Test Case A, the time step being halved to ensure a correct Courant number.
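A short worked sketch of that Courant-number argument (added here for clarity; it is not part of the original benchmark description):

```latex
% Courant number C = u \Delta t / \Delta x must stay roughly constant.
% Test Case B has 8x more cells in 3D, so \Delta x is halved, hence \Delta t is halved.
C \;=\; \frac{u\,\Delta t}{\Delta x}, \qquad
N_{\mathrm{cells}} \to 8\,N_{\mathrm{cells}}
\;\Rightarrow\; \Delta x \to \tfrac{1}{2}\Delta x
\;\Rightarrow\; \Delta t \to \tfrac{1}{2}\Delta t .
```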
- Web site: https://code-saturne.org
- Code download: https://repository.prace-ri.eu/ueabs/Code_Saturne/2.1/CS_5.3_PRACE_UEABS.tar.gz
- Disclaimer: please note that by downloading the code from this website, you agree to be bound by the terms of the GPL license.
- Build and Run instructions: [code_saturne/Code_Saturne_Build_Run_5.3_UEABS.pdf](code_saturne/Code_Saturne_Build_Run_5.3_UEABS.pdf)
- Test Case A: https://repository.prace-ri.eu/ueabs/Code_Saturne/2.1/CS_5.3_PRACE_UEABS_CAVITY_13M.tar.gz
- Test Case B: https://repository.prace-ri.eu/ueabs/Code_Saturne/2.1/CS_5.3_PRACE_UEABS_CAVITY_111M.tar.gz
# CP2K <a name="cp2k"></a>
CP2K is a freely available quantum chemistry and solid-state physics software package that can perform atomistic simulations of solid state, liquid, molecular, periodic, material, crystal, and biological systems. CP2K provides a general framework for different modelling methods such as DFT using the mixed Gaussian and plane waves approaches GPW and GAPW. Supported theory levels include DFTB, LDA, GGA, MP2, RPA, semi-empirical methods (AM1, PM3, PM6, RM1, MNDO, ...), and classical force fields (AMBER, CHARMM, ...). CP2K can do simulations of molecular dynamics, metadynamics, Monte Carlo, Ehrenfest dynamics, vibrational analysis, core level spectroscopy, energy minimisation, and transition state optimisation using NEB or dimer method.
CP2K is written in Fortran 2008 and can be run in parallel using a combination of multi-threading, MPI, and CUDA. All of CP2K is MPI parallelised, with some additional loops also being OpenMP parallelised. It is therefore most important to take advantage of MPI parallelisation; however, running one MPI rank per CPU core often leads to memory shortage. At this point OpenMP threads can be used to utilise all CPU cores without suffering an overly large memory footprint. The optimal ratio between MPI ranks and OpenMP threads depends on the type of simulation and the system in question. CP2K supports CUDA, allowing it to offload some linear algebra operations, including sparse matrix multiplications, to the GPU through its DBCSR acceleration layer. FFTs can optionally also be offloaded to the GPU. GPU offloading may yield improved performance, depending on the type of simulation and the system in question.
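For illustration, a minimal hybrid launch sketch. The executable name `cp2k.psmp` is the usual MPI+OpenMP build of CP2K and `-i`/`-o` are its standard input/output flags; the node count, ranks-per-node, thread count and scheduler (SLURM) are assumptions, not part of the benchmark definition:

```shell
#!/bin/bash
# Hypothetical SLURM job: 4 nodes x 32 MPI ranks x 4 OpenMP threads = 512 cores.
# Tune the ranks/threads ratio to balance memory footprint against MPI overhead.
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=4

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun cp2k.psmp -i H2O-512.inp -o H2O-512.out
```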
- Web site: https://www.cp2k.org/
- Code download: https://github.com/cp2k/cp2k/releases
- [Build & run instructions, details about benchmarks](./cp2k/README.md)
- Benchmarks:
- [Test Case A](./cp2k/benchmarks/TestCaseA_H2O-512)
- [Test Case B](./cp2k/benchmarks/TestCaseB_LiH-HFX)
- [Test Case C](./cp2k/benchmarks/TestCaseC_H2O-DFT-LS)
# GADGET <a name="gadget"></a>
GADGET is a freely available code for cosmological N-body/SPH simulations on massively parallel computers with distributed memory, written by Volker Springel, Max Planck Institute for Astrophysics, Garching, Germany. GADGET is written in C and uses an explicit communication model that is implemented with the standardized MPI communication interface. The code can be run on essentially all supercomputer systems presently in use, including clusters of workstations or individual PCs. GADGET computes gravitational forces with a hierarchical tree algorithm (optionally in combination with a particle-mesh scheme for long-range gravitational forces) and represents fluids by means of smoothed particle hydrodynamics (SPH). The code can be used for studies of isolated systems, or for simulations that include the cosmological expansion of space, either with or without periodic boundary conditions. In all these types of simulations, GADGET follows the evolution of a self-gravitating collisionless N-body system, and allows gas dynamics to be optionally included. Both the force computation and the time stepping of GADGET are fully adaptive, with a dynamic range that is, in principle, unlimited. GADGET can therefore be used to address a wide array of astrophysically interesting problems, ranging from colliding and merging galaxies, to the formation of large-scale structure in the Universe. With the inclusion of additional physical processes such as radiative cooling and heating, GADGET can also be used to study the dynamics of the gaseous intergalactic medium, or to address star formation and its regulation by feedback processes.
- Web site: http://www.mpa-garching.mpg.de/gadget/
- Code download: https://repository.prace-ri.eu/ueabs/GADGET/gadget3_Source.tar.gz
- Disclaimer: please note that by downloading the code from this website, you agree to be bound by the terms of the GPL license.
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.3/gadget/gadget3_Build_README.txt
- Test Case A: https://repository.prace-ri.eu/ueabs/GADGET/gadget3_TestCaseA.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.3/gadget/gadget3_Run_README.txt
# GPAW <a name="gpaw"></a>
GPAW is an efficient program package for electronic structure calculations based on the density functional theory (DFT) and the time-dependent density functional theory (TD-DFT). The density-functional theory allows studies of ground state properties such as energetics and equilibrium geometries, while the time-dependent density functional theory can be used for calculating excited state properties such as optical spectra. The program package includes two complementary implementations of time-dependent density functional theory: a linear response formalism and a time-propagation in real time.
The program uses the projector augmented wave (PAW) method that allows one to get rid of the core electrons and work with soft pseudo valence wave functions. The PAW method can be applied on the same footing to all elements, for example, it provides a reliable description of the transition metal elements and the first row elements with open p-shells that are often problematic for standard pseudopotentials. A further advantage of the PAW method is that it is an all-electron method (frozen core approximation) and there is a one to one transformation between the pseudo and all-electron quantities.
The equations of the (time-dependent) density functional theory within the PAW method are discretized using finite differences and uniform real-space grids. The real-space representation allows flexible boundary conditions, as the system can be finite or periodic in one, two or three dimensions (e.g. cluster, slab, bulk). The accuracy of the discretization is controlled basically by a single parameter, the grid spacing. The real-space representation also allows efficient parallelization with domain decomposition.
The program offers several parallelization levels. The most basic parallelization strategy is domain decomposition over the real-space grid. In magnetic systems it is possible to parallelize over spin, and in systems that have k-points (surfaces or bulk systems) parallelization over k-points is also possible. Furthermore, parallelization over electronic states is possible in DFT and in real-time TD-DFT calculations. GPAW is written in Python and C and parallelized with MPI.
- Web site: https://wiki.fysik.dtu.dk/gpaw/
- Code download: https://gitlab.com/gpaw/gpaw
- Build instructions: [gpaw/README.md#install](gpaw/README.md#install)
- Benchmarks:
- [Case S: Carbon nanotube](gpaw/benchmark/carbon-nanotube)
- [Case M: Copper filament](gpaw/benchmark/copper-filament)
- [Case L: Silicon cluster](gpaw/benchmark/silicon-cluster)
- Run instructions:
[gpaw/README.md#running-the-benchmarks](gpaw/README.md#running-the-benchmarks)
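As a minimal illustration (a sketch, assuming GPAW and its MPI-enabled Python launcher are already installed; the rank count is arbitrary and the input path follows the benchmark layout of this repository), a case is launched by running its `input.py` under MPI:

```shell
# Launch Case S (carbon nanotube) on 64 MPI ranks.
# Older GPAW versions ship a dedicated 'gpaw-python' interpreter instead of 'gpaw python'.
mpirun -np 64 gpaw python gpaw/benchmark/carbon-nanotube/input.py
```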
# GROMACS <a name="gromacs"></a>
GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.
GROMACS supports all the usual algorithms you expect from a modern molecular dynamics implementation, (check the online reference or manual for details), but there are also quite a few features that make it stand out from the competition:
- GROMACS provides extremely high performance compared to all other programs. A lot of algorithmic optimizations have been introduced in the code; we have for instance extracted the calculation of the virial from the innermost loops over pairwise interactions, and we use our own software routines to calculate the inverse square root. In GROMACS 4.6 and up, on almost all common computing platforms, the innermost loops are written in C using intrinsic functions that the compiler transforms to SIMD machine instructions, to utilize the available instruction-level parallelism. These kernels are available in both single and double precision, and support all the different kinds of SIMD instructions found in x86-family (and other) processors.
- Also since GROMACS 4.6, we have excellent CUDA-based GPU acceleration on GPUs that have Nvidia compute capability >= 2.0 (e.g. Fermi or later)
- GROMACS is user-friendly, with topologies and parameter files written in clear text format. There is a lot of consistency checking, and clear error messages are issued when something is wrong. Since a C preprocessor is used, you can have conditional parts in your topologies and include other files. You can even compress most files and GROMACS will automatically pipe them through gzip upon reading.
- There is no scripting language – all programs use a simple interface with command line options for input and output files. You can always get help on the options by using the -h option, or use the extensive manuals provided free of charge in electronic or paper format.
- As the simulation is proceeding, GROMACS will continuously tell you how far it has come, and what time and date it expects to be finished.
- Both run input files and trajectories are independent of hardware endian-ness, and can thus be read by any version of GROMACS, even if it was compiled using a different floating-point precision.
- GROMACS can write coordinates using lossy compression, which provides a very compact way of storing trajectory data. The accuracy can be selected by the user.
- GROMACS comes with a large selection of flexible tools for trajectory analysis – you won’t have to write any code to perform routine analyses. The output is further provided in the form of finished Xmgr/Grace graphs, with axis labels, legends, etc. already in place!
- A basic trajectory viewer that only requires standard X libraries is included, and several external visualization tools can read the GROMACS file formats.
- GROMACS can be run in parallel, using either the standard MPI communication protocol, or via our own “Thread MPI” library for single-node workstations.
- GROMACS contains several state-of-the-art algorithms that make it possible to extend the time steps in simulations significantly, and thereby further enhance performance without sacrificing accuracy or detail.
- The package includes a fully automated topology builder for proteins, even multimeric structures. Building blocks are available for the 20 standard amino acid residues as well as some modified ones, the 4 nucleotide and 4 deoxynucleotide residues, several sugars and lipids, and some special groups like hemes and several small molecules.
- There is ongoing development to extend GROMACS with interfaces both to Quantum Chemistry and Bioinformatics/databases.
- GROMACS is Free Software, available under the GNU Lesser General Public License (LGPL), version 2.1. You can redistribute it and/or modify it under the terms of the LGPL as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
Instructions:
- Web site: http://www.gromacs.org/
- Code download: http://www.gromacs.org/Downloads (the UEABS benchmark cases require the 5.1.x branch or newer; the latest 2016 version is suggested)
- Test Case A: https://repository.prace-ri.eu/ueabs/GROMACS/1.2/GROMACS_TestCaseA.tar.gz
- Test Case B: https://repository.prace-ri.eu/ueabs/GROMACS/1.2/GROMACS_TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.3/gromacs/GROMACS_Run_README.txt
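For orientation, a typical hybrid launch looks like the sketch below; the `.tpr` file name and the rank/thread counts are placeholders, and the exact benchmark invocation is given in the run instructions above:

```shell
# Hypothetical hybrid run: 8 MPI ranks with 4 OpenMP threads each.
# -maxh limits wall time; -resethway resets internal timers halfway through
# the run so the reported performance excludes start-up costs.
export OMP_NUM_THREADS=4
mpirun -np 8 gmx_mpi mdrun -s benchmark.tpr -ntomp ${OMP_NUM_THREADS} -maxh 0.5 -resethway
```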
# NAMD <a name="namd"></a>
NAMD is a widely used molecular dynamics application designed to simulate bio-molecular systems on a wide variety of compute platforms. NAMD is developed by the “Theoretical and Computational Biophysics Group” at the University of Illinois at Urbana-Champaign. In the design of NAMD particular emphasis has been placed on scalability when utilizing a large number of processors. The application can read a wide variety of different file formats, for example force fields and protein structures, which are commonly used in bio-molecular science.
A NAMD license can be applied for on the developer’s website free of charge. Once the license has been obtained, binaries for a number of platforms and the source can be downloaded from the website.
Deployment areas of NAMD include pharmaceutical research by academic and industrial users. NAMD is particularly suitable when the interaction between a number of proteins or between proteins and other chemical substances is of interest. Typical examples are vaccine research and transport processes through cell membrane proteins.
NAMD is written in C++ and parallelised using Charm++ parallel objects, which are implemented on top of MPI.
- Web site: http://www.ks.uiuc.edu/Research/namd/
- Code download: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.3/namd/NAMD_Download_README.txt
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.3/namd/NAMD_Build_README.txt
- Test Case A: https://repository.prace-ri.eu/ueabs/NAMD/1.2/NAMD_TestCaseA.tar.gz
- Test Case B: https://repository.prace-ri.eu/ueabs/NAMD/1.2/NAMD_TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.3/namd/NAMD_Run_README.txt
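A minimal run sketch, assuming an MPI build of NAMD; the configuration file name is a placeholder for the test case input and the rank count is arbitrary:

```shell
# Run NAMD on 128 MPI ranks; the .namd file defines the simulation (test case input).
mpirun -np 128 namd2 testcase.namd > testcase.log
```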
# NEMO <a name="nemo"></a>
NEMO (Nucleus for European Modelling of the Ocean) is a mathematical modelling framework for research activities and prediction services in ocean and climate sciences, developed by a European consortium. It is intended to be a tool for studying the ocean and its interaction with the other components of the earth climate system over a large number of space and time scales. It comprises the core engines OPA (ocean dynamics and thermodynamics), SI3 (sea ice dynamics and thermodynamics), TOP (oceanic tracers) and PISCES (biogeochemical processes).
Prognostic variables in NEMO are the three-dimensional velocity field, a linear or non-linear sea surface height, the temperature and the salinity.
In the horizontal direction, the model uses a curvilinear orthogonal grid and in the vertical direction, a full or partial step z-coordinate, or s-coordinate, or a mixture of the two. The distribution of variables is a three-dimensional Arakawa C-type grid for most of the cases.
The model is implemented in Fortran 90, with preprocessing (C preprocessor). It is optimized for vector computers and parallelized by domain decomposition with MPI. It supports modern C/C++ and Fortran compilers. All input and output is done through third-party software called XIOS, which depends on NetCDF (Network Common Data Format) and HDF5. NEMO is highly scalable and a well-suited application for measuring supercomputing performance in terms of compute capacity, memory subsystem, I/O and interconnect performance.
### Test Case Description
The GYRE configuration has been built to model the seasonal cycle of a double-gyre box model. It consists of an idealized domain over which seasonal forcing is applied. This allows for studying a large number of interactions and their combined contribution to the large-scale circulation.
The domain geometry is a rectangular basin bounded by vertical walls and a flat bottom. The configuration is meant to represent an idealized North Atlantic or North Pacific basin. The circulation is forced by analytical profiles of wind and buoyancy fluxes.
The wind stress is zonal and its curl changes sign at latitudes 22° and 36°. It forces a subpolar gyre in the north, a subtropical gyre in the wider part of the domain and a small recirculation gyre in the southern corner. The net heat flux takes the form of a restoring toward a zonal apparent air temperature profile.
A portion of the net heat flux which comes from the solar radiation is allowed to penetrate within the water column. The fresh water flux is also prescribed and varies zonally. It is determined such that, at each time step, the basin-integrated flux is zero.
The basin is initialized at rest, with vertical profiles of temperature and salinity applied uniformly to the whole domain. The GYRE configuration is set through the namelist_cfg file.
The horizontal resolution is determined by setting jp_cfg as follows:
`Jpiglo = 30 x jp_cfg + 2`
`Jpjglo = 20 x jp_cfg + 2`
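For example, the global grid dimensions implied by these formulas for the two benchmark resolutions can be checked with simple shell arithmetic:

```shell
# GYRE horizontal grid size for a given jp_cfg: Jpiglo = 30*jp_cfg + 2, Jpjglo = 20*jp_cfg + 2
for jp_cfg in 128 256; do
  echo "jp_cfg=${jp_cfg}: Jpiglo=$((30*jp_cfg + 2)) Jpjglo=$((20*jp_cfg + 2))"
done
# jp_cfg=128: Jpiglo=3842 Jpjglo=2562   (Test Case A)
# jp_cfg=256: Jpiglo=7682 Jpjglo=5122   (Test Case B)
```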
In this configuration, we use the default value of 30 ocean levels (jpk=31). The GYRE configuration is an ideal case for benchmarking as it is very simple to increase the resolution and perform both weak and strong scalability experiments using the same input files. We use two configurations as follows:
**Test Case A**:
* jp_cfg = 128 suitable up to 1000 cores
* Number of Days: 20
* Number of Time steps: 1440
* Time step size: 20 mins
* Number of seconds per time step: 1200
**Test Case B**
* jp_cfg = 256 suitable up to 20,000 cores.
* Number of Days (real): 80
* Number of time steps: 4320
* Time step size (real): 20 mins
* Number of seconds per time step: 1200
* Web site: <http://www.nemo-ocean.eu/>
* Download, Build and Run Instructions : <https://repository.prace-ri.eu/git/UEABS/ueabs/tree/master/nemo>
# PFARM <a name="pfarm"></a>
PFARM is part of a suite of programs based on the ‘R-matrix’ ab-initio approach to the variational solution of the many-electron Schrödinger
equation for electron-atom and electron-ion scattering. The package has been used to calculate electron collision data for astrophysical
applications (such as: the interstellar medium, planetary atmospheres) with, for example, various ions of Fe and Ni and neutral O, plus
other applications such as data for plasma modelling and fusion reactor impurities. The code has recently been adapted to form a compatible
interface with the UKRmol suite of codes for electron (positron) molecule collisions thus enabling large-scale parallel ‘outer-region’
calculations for molecular systems as well as atomic systems.
The PFARM outer-region application code EXDIG is dominated by the assembly of sector Hamiltonian matrices and their subsequent eigensolutions.
The code is written in Fortran 2003 (or Fortran 2003-compliant Fortran 95), is parallelised using MPI and OpenMP and is designed to take
advantage of highly optimised, numerical library routines. Hybrid MPI / OpenMP parallelisation has also been introduced into the code via
shared memory enabled numerical library kernels.
Accelerator-based versions of EXDIG have been implemented, using offloading (MKL or cuBLAS/cuSOLVER) for the standard (dense) eigensolver calculations that dominate the overall run-time.
- CPU Code download: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r2.1-dev/pfarm/RMX_MAGMA_CPU_mol.tar.gz
- GPU Code download: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r2.1-dev/pfarm/RMX_MAGMA_GPU_mol.tar.gz
- Build & Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r2.1-dev/pfarm/PFARM_Build_Run_README.txt
- Test Case A: https://repository.prace-ri.eu/UEABS/ueabs/blob/r2.1-dev/pfarm/PFARM_TestCaseA.tar.bz2
- Test Case B: https://repository.prace-ri.eu/UEABS/ueabs/blob/r2.1-dev/pfarm/PFARM_TestCaseB.tar.bz2
# QCD <a name="qcd"></a>
The QCD benchmark is, unlike the other benchmarks in the PRACE application benchmark suite,
not a full application but a set of 3 parts which are representative of some of the most compute-intensive parts of QCD calculations.
Part 1:
The QCD Accelerator Benchmark suite Part 1 is a direct port of "QCD kernel E" from the
CPU part, which is based on the MILC code suite
(http://www.physics.utah.edu/~detar/milc/). The performance-portable
targetDP model has been used to allow the benchmark to utilise NVIDIA
GPUs, Intel Xeon Phi manycore CPUs and traditional multi-core
CPUs. The use of MPI (in conjunction with targetDP) allows multiple
nodes to be used in parallel.
Part 2:
The QCD Accelerator Benchmark suite Part 2 consists of two kernels, based on the QUDA and QPhiX libraries. The QUDA library is based on CUDA and optimized for running on NVIDIA GPUs (https://lattice.github.io/quda/).
The QPhiX library consists of routines which are optimized to use Intel intrinsic functions for multiple vector lengths, including optimized routines
for KNC and KNL (http://jeffersonlab.github.io/qphix/).
The benchmark kernels use the Conjugate Gradient benchmark functions provided by these libraries.
Part CPU:
The CPU part of the QCD benchmark is not a full application but a set of 5 kernels which are
representative of some of the most compute-intensive parts of QCD calculations.
Each of the 5 kernels has one test case:
Kernel A is derived from BQCD (Berlin Quantum ChromoDynamics program), a hybrid Monte-Carlo code which simulates Quantum Chromodynamics with dynamical standard Wilson fermions. The computations take place on a four-dimensional regular grid with periodic boundary conditions. The kernel is a standard conjugate gradient solver with even/odd pre-conditioning. The default lattice size is 16x16x16x16 for the small test case and 32x32x64x64 for the medium test case.
Kernel B is derived from SU3_AHiggs, a lattice quantum chromodynamics (QCD) code intended for computing the conditions of the Early Universe. Instead of “full QCD”, the code applies an effective field theory, which is valid at high temperatures. In the effective theory, the lattice is 3D. The default lattice size is 64x64x64 for the small test case and 256x256x256 for the medium test case. Note that Kernel B can only be run in a weak scaling mode, where each CPU stores the same local lattice size, regardless of the number of CPUs. Ideal scaling for this kernel therefore corresponds to constant execution time, and performance is simply the reciprocal of the execution time.
Kernel C is based on the software package openQCD. Kernel C is built to run in a weak scaling mode, where each CPU stores the same local lattice size, regardless of the number of CPUs. Ideal scaling for this kernel therefore corresponds to constant execution time, and performance is simply the reciprocal of the execution time. The local lattice size is 8x8x8x8.
Kernel D consists of the core matrix-vector multiplication routine for standard Wilson fermions based on the software package tmLQCD.
The default lattice size is 16x16x16x16 for the small test case and 64x64x64x64 for the medium test case.
Kernel E consists of a full conjugate gradient solution using
Wilson fermions. The default lattice size is 16x16x16x16 for the
small test case and 64x64x64x32 for the medium test case.
- Code download: https://repository.prace-ri.eu/ueabs/QCD/1.3/QCD_Source_TestCaseA.tar.gz
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.3/qcd/QCD_Build_README.txt
- Test Case A: included with source download
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.3/qcd/QCD_Run_README.txt
# Quantum Espresso <a name="espresso"></a>
QUANTUM ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling, based on density-functional theory, plane waves, and pseudopotentials (norm-conserving, ultrasoft, and projector-augmented wave). QUANTUM ESPRESSO stands for opEn Source Package for Research in Electronic Structure, Simulation, and Optimization. It is freely available to researchers around the world under the terms of the GNU General Public License. QUANTUM ESPRESSO builds upon newly restructured electronic-structure codes that have been developed and tested by some of the original authors of novel electronic-structure algorithms and applied in the last twenty years by some of the leading materials modeling groups worldwide. Innovation and efficiency are still its main focus, with special attention paid to massively parallel architectures, and a great effort being devoted to user friendliness. QUANTUM ESPRESSO is evolving towards a distribution of independent and inter-operable codes in the spirit of an open-source project, where researchers active in the field of electronic-structure calculations are encouraged to participate in the project by contributing their own codes or by implementing their own ideas into existing codes.
QUANTUM ESPRESSO is written mostly in Fortran90, and parallelised using MPI and OpenMP.
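A typical launch of the PWscf program (`pw.x`) used in the benchmark looks like the sketch below; the rank and thread counts, the pool/diagonalisation settings, and the input file name are illustrative placeholders, not prescribed values:

```shell
# Hybrid MPI/OpenMP run of pw.x with 4 k-point pools and a 4x4 ScaLAPACK
# diagonalisation grid (-ndiag 16). Adjust -npool and -ndiag to the test case.
export OMP_NUM_THREADS=2
mpirun -np 64 pw.x -npool 4 -ndiag 16 -input benchmark.in > benchmark.out
```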
- Web site: http://www.quantum-espresso.org/
- Code download: http://www.quantum-espresso.org/download/
- Build instructions: http://www.quantum-espresso.org/wp-content/uploads/Doc/user_guide/
- Test Case A: https://repository.prace-ri.eu/ueabs/Quantum_Espresso/QuantumEspresso_TestCaseA.tar.gz
- Test Case B: https://repository.prace-ri.eu/ueabs/Quantum_Espresso/QuantumEspresso_TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.3/quantum_espresso/QE-guide.txt
# SHOC <a name="shoc"></a>
The Scalable HeterOgeneous Computing (SHOC) benchmark suite is a collection of benchmark programs testing the performance and stability of systems using computing devices with non-traditional architectures
for general purpose computing. It serves as a synthetic benchmark suite in the UEABS context. Its initial focus is on systems containing Graphics Processing Units (GPUs) and multi-core processors, featuring implementations using both CUDA and OpenCL. It can be used on clusters as well as individual hosts.
Also, SHOC includes an Offload branch for the benchmarks that can be used to evaluate the Intel Xeon Phi x100 family.
The SHOC benchmark suite currently contains benchmark programs categorized by complexity. Some measure low-level "feeds and speeds" behavior (Level 0), some measure the performance of a higher-level operation such as a Fast Fourier Transform (FFT) (Level 1), and the others measure real application kernels (Level 2).
- Web site: https://github.com/vetter/shoc
- Code download: https://github.com/vetter/shoc/archive/master.zip
- Build instructions: https://repository.prace-ri.eu/git/ueabs/ueabs/blob/r2.1-dev/shoc/README_ACC.md
- Run instructions: https://repository.prace-ri.eu/git/ueabs/ueabs/blob/r2.1-dev/shoc/README_ACC.md
# SPECFEM3D <a name="specfem3d"></a>
The software package SPECFEM3D simulates three-dimensional global and regional seismic wave propagation based upon the spectral-element method (SEM). All SPECFEM3D_GLOBE software is written in Fortran90 with full portability in mind, and conforms strictly to the Fortran95 standard. It uses no obsolete or obsolescent features of Fortran77. The package uses parallel programming based upon the Message Passing Interface (MPI).
The SEM was originally developed in computational fluid dynamics and has been successfully adapted to address problems in seismic wave propagation. It is a continuous Galerkin technique, which can easily be made discontinuous; it is then close to a particular case of the discontinuous Galerkin technique, with optimized efficiency because of its tensorized basis functions. In particular, it can accurately handle very distorted mesh elements. It has very good accuracy and convergence properties. The spectral element approach admits spectral rates of convergence and allows exploiting hp-convergence schemes. It is also very well suited to parallel implementation on very large supercomputers as well as on clusters of GPU accelerating graphics cards. Tensor products inside each element can be optimized to reach very high efficiency, and mesh point and element numbering can be optimized to reduce processor cache misses and improve cache reuse. The SEM can also handle triangular (in 2D) or tetrahedral (3D) elements as well as mixed meshes, although with increased cost and reduced accuracy in these elements, as in the discontinuous Galerkin method.
In many geological models in the context of seismic wave propagation studies (except, for instance, for fault dynamic rupture studies, in which very high frequencies of supershear rupture need to be modeled near the fault), a continuous formulation is sufficient because material property contrasts are not drastic and thus conforming mesh doubling bricks can efficiently handle mesh size variations. This is particularly true at the scale of the full Earth. Effects due to lateral variations in compressional-wave speed, shear-wave speed, density, a 3D crustal model, ellipticity, topography and bathymetry, the oceans, rotation, and self-gravitation are included. The package can accommodate full 21-parameter anisotropy as well as lateral variations in attenuation. Adjoint capabilities and finite-frequency kernel simulations are also included.
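In outline, running a SPECFEM3D_GLOBE test case is a two-step process: generate the mesh, then run the solver on the same MPI decomposition (the rank count must equal NPROC_XI x NPROC_ETA x the number of chunks set in DATA/Par_file). A hedged sketch, with an arbitrary 4 x 4 x 6 decomposition:

```shell
# Step 1: build the mesh (rank count must match the Par_file decomposition: 4*4*6 = 96).
mpirun -np 96 ./bin/xmeshfem3D
# Step 2: run the spectral-element solver on the same decomposition.
mpirun -np 96 ./bin/xspecfem3D
```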
- Web site: http://geodynamics.org/cig/software/specfem3d_globe/
- Code download: http://geodynamics.org/cig/software/specfem3d_globe/
- Build instructions: http://www.geodynamics.org/wsvn/cig/seismo/3D/SPECFEM3D_GLOBE/trunk/doc/USER_MANUAL/manual_SPECFEM3D_GLOBE.pdf?op=file&rev=0&sc=0
- Test Case A: https://repository.prace-ri.eu/git/UEABS/ueabs/tree/r2.1-dev/specfem3d/test_cases/SPECFEM3D_TestCaseA
- Test Case B: https://repository.prace-ri.eu/git/UEABS/ueabs/tree/r2.1-dev/specfem3d/test_cases/SPECFEM3D_TestCaseB
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r2.1-dev/specfem3d/README.md
# TensorFlow <a name="tensorflow"></a>
TensorFlow (https://www.tensorflow.org) is a popular open-source library for symbolic math and linear algebra, with particular optimization for neural-network-based machine learning workflows. Maintained by Google, it is widely used for research and production in both academia and industry.
TensorFlow supports a wide variety of hardware platforms (CPUs, GPUs, TPUs), and can be scaled up to utilize multiple compute devices on a single or multiple compute nodes. The main objective of this benchmark is to profile the scaling behavior of TensorFlow on different hardware, and thereby provide a reference baseline of its performance for different sizes of applications.
There are many open-source datasets available for benchmarking TensorFlow, such as `mnist`, `fashion_mnist`, `cifar`, `imagenet`, and so on. This benchmark suite, however, focuses on a scientific research use case. `DeepGalaxy` is a code built with TensorFlow, which uses a deep neural network to classify galaxy mergers in the Universe, observed by the Hubble Space Telescope and the Sloan Digital Sky Survey.
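As an illustration only (the script name, its options, and the Horovod-style one-process-per-GPU data parallelism are assumptions about the DeepGalaxy setup, not a verified recipe), a multi-process training run would be launched roughly as:

```shell
# Hypothetical data-parallel training launch: one process per GPU, 4 GPUs in total.
# The script name and its flags are placeholders; see the test case directories for the real ones.
mpirun -np 4 python train.py --epochs 10 --batch-size 32
```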
- Website: https://github.com/maxwelltsai/DeepGalaxy
- Code download: https://github.com/maxwelltsai/DeepGalaxy
- [Prerequisites installation](tensorflow/prerequisites-installation.md)
- [Test Case A](tensorflow/Testcase_A/)
- [Test Case B](tensorflow/Testcase_B/)
- [Test Case C](tensorflow/Testcase_C/)
<table>
<thead>
<tr>
<th rowspan="2">Application</th>
<th rowspan="2">Lines of<br/>Code</th>
<th colspan="3">Parallelism</th>
<th colspan="4">Language</th>
<th rowspan="2">Code Description/Notes</th>
</tr>
<tr>
<th>MPI</th>
<th>OpenMP/<br/>Pthreads</th>
<th>GPU</th>
<th>Fortran</th>
<th>Python</th>
<th>C</th>
<th>C++</th>
</tr>
</thead>
<tbody>
<tr>
<td>Alya
<ul>
<li><a href="https://www.bsc.es/computer-applications/alya-system">website</a></li>
<li><a href="https://gitlab.com/bsc-alya/open-alya">source</a></li>
<li><a href="alya/README.md">instructions</a></li>
<li><a href="https://gitlab.com/bsc-alya/benchmarks/sphere-16M">Test Case A</a></li>
<li><a href="https://gitlab.com/bsc-alya/benchmarks/sphere-132M">Test Case B</a></li>
</ul>
</td>
<td>600,000</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td>The Alya System is a Computational Mechanics code capable of solving different physics, each one with its own modelization characteristics, in a coupled way. Among the problems it solves are: convection-diffusion reactions, incompressible flows, compressible flows, turbulence, bi-phasic flows and free surface, excitable media, acoustics, thermal flow, quantum mechanics (DFT) and solid mechanics (large strain).</td>
</tr>
<tr>
<td>Code_Saturne
<ul>
<li><a href="https://www.code-saturne.org/cms/web">Code_Saturne website</a></li>
<li><a href="https://www.code-saturne.org/cms/sites/default/files/releases/code_saturne-7.0.0.tar.gz">Source code</a></li>
<li><a href="https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/code_saturne/README.md">Build instuctions</a></li>
<li><a href="https://repository.prace-ri.eu/ueabs/Code_Saturne/2.2/CS_7.0.0_PRACE_UEABS_CAVITY_13M.tar.gz">Testcase A</a></li>
<li><a href="https://repository.prace-ri.eu/ueabs/Code_Saturne/2.2/CS_7.0.0_PRACE_UEABS_CAVITY_XXXM.tar.gz">Testcase B</a></li>
</ul>
</td>
<td>~350,000</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>The code solves the Navier-Stokes equations for incompressible/compressible flows using a predictor-corrector technique. The Poisson pressure equation is solved by a Conjugate Gradient preconditioned by a multi-grid algorithm, and the transport equations by Conjugate Gradient-like methods. Advanced gradient reconstruction is also available to account for distorted meshes.</td>
</tr>
<tr>
<td>CP2K
<ul>
<li><a href="https://www.cp2k.org/">CP2K website</a></li>
<li><a href="https://github.com/cp2k/cp2k/releases">Source code</a></li>
<li><a href="./cp2k/README.md">Build instructions</a></li>
<li><a href="./cp2k/benchmarks/TestCaseA_H2O-512">Testcase A</a></li>
<li><a href="./cp2k/benchmarks/TestCaseB_LiH-HFX">Testcase B</a></li>
<li><a href="./cp2k/benchmarks/TestCaseC_H2O-DFT-LS">Testcase C</a></li>
</ul>
</td>
<td>~1,150,000</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td>CP2K is a freely available quantum chemistry and solid-state physics software package for performing atomistic simulations. It can be run with MPI, OpenMP and CUDA. All of CP2K is MPI parallelised, with some routines making use of OpenMP, which can be used to reduce the memory footprint. In addition some linear algebra operations may be offloaded to GPUs using CUDA.</td>
</tr>
<tr>
<td>GADGET
<ul>
<li><a href="https://wwwmpa.mpa-garching.mpg.de/gadget4">GADET Website</a></li>
<li><a href="https://gitlab.mpcdf.mpg.de/vrs/gadget4">GADET GitLab</a></li>
<li><a href="gadget/README.md#mechanics-of-building-benchmark">Build instructions</a>
<li><a href="gadget/README.md#mechanics-of-running-benchmark">Run instructions</a>
<li><a href="./gadget/gadget4-case-A.tar.gz">Test Case A</a></li>
<li><a href="./gadget/gadget4-case-B.tar.gz">Test Case B</a></li>
</ul>
</td>
<td>100,000+</td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td>X</td>
<td>X</td>
<td>GADGET-4 (GAlaxies with Dark matter and Gas intEracT), an evolved and improved version of GADGET-3, is a freely available code for cosmological N-body/SPH simulations on massively parallel computers with distributed memory. GADGET-4 supports collisionless simulations and smoothed particle hydrodynamics on massively parallel computers. All communication between concurrent execution processes is done either explicitly by means of the message passing interface (MPI), or implicitly through shared-memory accesses on processes on multi-core nodes. The code is mostly written in ISO C++ (assuming the C++11 standard), and should run on all parallel platforms that support at least MPI-3.</td>
</tr>
<tr>
<td>GPAW
<ul>
<li><a href="https://wiki.fysik.dtu.dk/gpaw/">website</a></li>
<li><a href="https://gitlab.com/gpaw/gpaw">GPAW GitLab</a></li>
<li><a href="https://gitlab.com/mlouhivu/gpaw/tree/cuda">GPAW GPU development (cuda branch)</a></li>
<li><a href="gpaw/README.md#mechanics-of-building-the-benchmark">Build instructions</a>
<li><a href="gpaw/README.md#mechanics-of-running-the-benchmark">Run instructions</a>
<li><a href="gpaw/benchmark/A_carbon-nanotube/input.py">Test Case A</a>
<li><a href="gpaw/benchmark/B_copper-filament/input.py">Test Case B</a>
<li><a href="gpaw/benchmark/C_silicon-cluster/input.py">Test Case C</a>
</ul>
</td>
<td>132,000</td>
<td>X</td>
<td></td>
<td>(X)</td>
<td></td>
<td>X</td>
<td>X</td>
<td></td>
<td>
GPAW is a density-functional theory (DFT)
program for ab initio electronic structure calculations using the projector
augmented wave method. It uses a uniform real-space grid representation of the
electronic wavefunctions that allows for excellent computational scalability
and systematic convergence properties.
The GPAW benchmark tests MPI parallelization and the quality of the provided mathematical
libraries, including BLAS, LAPACK, ScaLAPACK, and an FFTW-compatible library. There is
also an experimental CUDA-based implementation for GPU systems, but it is not covered
by this UEABS release.
</td>
</tr>
<tr>
<td>GROMACS
<ul>
<li><a href="http://www.gromacs.org">website</a></li>
<li><a href="http://www.gromacs.org/Downloads">Source code</a></li>
<li><a href="https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/gromacs">Build and Run Instructions</a></li>
<li><a href="https://repository.prace-ri.eu/ueabs/GROMACS/2.2/GROMACS_TestCaseA.tar.xz">Test Case A</a></li>
<li><a href="https://repository.prace-ri.eu/ueabs/GROMACS/2.2/GROMACS_TestCaseB.tar.xz">Test Case B</a></li>
<li><a href="https://repository.prace-ri.eu/ueabs/GROMACS/2.2/GROMACS_TestCaseC.tar.xz">Test Case C</a></li>
</ul>
</td>
<td>862,079</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td>X</td>
<td>X</td>
<td>GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.</td>
</tr>
<tr>
<td>NAMD
<ul>
<li><a href="http://www.ks.uiuc.edu/Research/namd/">website</a></li>
<li><a href="http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD">Source code</a></li>
<li><a href="https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/namd">Build and Run Instructions</a></li>
<li><a href="https://repository.prace-ri.eu/ueabs/NAMD/2.2/NAMD_TestCaseA.tar.gz">Test Case A</a></li>
<li><a href="https://repository.prace-ri.eu/ueabs/NAMD/2.2/NAMD_TestCaseB.tar.gz">Test Case B</a></li>
<li><a href="https://repository.prace-ri.eu/ueabs/NAMD/2.2/NAMD_TestCaseC.tar.gz">Test Case C</a></li>
</ul>
</td>
<td>887,547</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td>X</td>
<td>NAMD is a widely used molecular dynamics application designed to simulate bio-molecular systems on a wide variety of compute platforms.
</td>
</tr>
<tr>
<td>NEMO
<ul>
<li><a href="https://www.nemo-ocean.eu/">website</a></li>
<li><a href="https://forge.ipsl.jussieu.fr/nemo/chrome/site/doc/NEMO/guide/html/install.html#download-and-install-the-nemo-code">source</a></li>
<li><a href="nemo/README.md">instructions</a></li>
<li><a href="nemo/README.md#verification-of-results">Test Case A</a></li>
<li><a href="nemo/README.md#verification-of-results">Test Case B</a></li>
</ul>
</td>
<td>154,240</td>
<td>X</td>
<td></td>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td>X</td>
<td>NEMO (Nucleus for European Modelling of the Ocean) is a mathematical modelling framework for research activities and prediction services in ocean and climate sciences developed by a European consortium. It is intended to be a tool for studying the ocean and its interaction with the other components of the earth climate system over a large number of space and time scales. It comprises of the core engines namely OPA (ocean dynamics and thermodynamics), SI3 (sea ice dynamics and thermodynamics), TOP (oceanic tracers) and PISCES (biogeochemical process).</td>
</tr>
<tr>
<td>PFARM
<ul>
<li><a href="https://www.ccpq.ac.uk/node/4">website</a></li>
<li><a href="https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/pfarm">source</a></li>
<li><a href="pfarm/PFARM_Build_Run_README.txt">instructions</a></li>
<li><a href="https://repository.prace-ri.eu/ueabs/PFARM/2.2/test_case_1_atom.tar.gz">Test Case 1</a></li>
<li><a href="https://repository.prace-ri.eu/ueabs/PFARM/2.2/test_case_2_mol.tar.gz">Test Case 2</a></li>
</ul>
</td>
<td>21,434</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td>PFARM uses an R-matrix ab-initio approach to calculate electron-atom and electron-molecule collision data for a wide range of applications including astrophysics and nuclear fusion. It is written in modern Fortran/MPI/OpenMP and exploits highly-optimised dense linear algebra numerical library routines.</td>
</tr>
<tr>
<td>QCD
<ul>
<li><a href='qcd/README.md'>see for more details</a></li>
</ul>
</td>
<td>100,000+</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td>X</td>
<td>X</td>
<td>The QCD benchmark is, unlike the other benchmarks in the PRACE application benchmark suite, not a full application but a set of 3 parts which are representative of some of the most compute-intensive parts of QCD calculations. The major application of the different parts consists of a Conjugate Gradient solver involving the Wilson Dirac stencil in 4 dimensions. Keywords of the QCD benchmark kernels are: domain decomposition, memory bandwidth, strong scaling, MPI latency.</td>
</tr>
<tr>
<td>Quantum&nbsp;ESPRESSO
<ul>
<li><a href='https://www.quantum-espresso.org/'>Website</a></li>
<li><a href='https://www.quantum-espresso.org/download-page/'>Source</a></li>
<li><a href='quantum_espresso/README.md#installation-and-requirements'>Build and Run instructions</a></li>
<li><a href='https://repository.prace-ri.eu/ueabs/Quantum_Espresso/QuantumEspresso_TestCaseA.tar.gz'>Test Case A</a></li>
<li><a href='https://repository.prace-ri.eu/ueabs/Quantum_Espresso/QuantumEspresso_TestCaseB.tar.gz'>Test Case B</a></li>
</ul>
</td>
<td>92,996</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td>Quantum Espresso is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials. It is parallelised with MPI and OpenMP, with a CUDA Fortran version available for NVIDIA GPUs. In the benchmark suite we consider only the most used program, PWscf.</td>
</tr>
<tr>
<td>SPECFEM3D
<ul>
<li><a href="https://geodynamics.org/cig/software/specfem3d_globe/">Website</a></li>
<li><a href="https://github.com/geodynamics/specfem3d_globe.git">Source</a></li>
<li><a href="specfem3D/README.md">Run and build instructions</a></li>
<li><a href="specfem3D/test_case/SPECFEM3D_TestCase_A">Test Case A</a></li>
<li><a href="specfem3D/test_case/SPECFEM3D_TestCase_B">Test Case B</a></li>
<li><a href="https://github.com/geodynamics/specfem3d_globe/tree/master/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth">Test Case C</a></li>
</ul>
</td>
<td> ~120,000 (100k Fortran & 20k C)</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td>X</td>
<td></td>
<td>The software package SPECFEM3D simulates three-dimensional global and regional seismic wave propagation based upon the spectral-element method (SEM).</td>
</tr>
<tr>
<td>TensorFlow
<ul>
<li><a href="https://www.tensorflow.org/">website</a></li>
<li><a href="https://github.com/maxwelltsai/DeepGalaxy">source</a></li>
<li><a href="tensorflow/README.md">instructions</a></li>
<li><a href="tensorflow/Testcase_A">Test Case A</a></li>
<li><a href="tensorflow/Testcase_B">Test Case B</a></li>
<li><a href="tensorflow/Testcase_C">Test Case C</a></li>
</ul>
</td>
<td>~3,000,000</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
<td>X</td>
<td>X</td>
<td>X</td>
<td>TensorFlow is a popular open-source library for symbolic math and linear algebra, with particular optimisation for neural-network-based machine learning workflows. Maintained by Google, it is widely used for research and production in both academia and industry.</td>
</tr>
</tbody>
</table>
License
-------
All UEABS application codes are covered by their own respective licenses. Code modifications required by the UEABS might inherit the originating application's license.
<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br />Unless stated otherwise, all work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.
Acknowledgements
----------------
<img alt="Co-Funded by the European Union" src="EN_Co-Funded_by_the_EU_POS.png" width="300px">
This project has received funding from the European Union’s
Seventh Framework Programme (FP7/2007-2013) under grant agreements
n° 211528 (PRACE-PP), n° 261557 (PRACE-1IP), n° 283493 (PRACE-2IP), n° 312763 (PRACE-3IP);
and from the
Horizon 2020 research and innovation programme under grant agreements
No 653838 (PRACE-4IP), No 730913 (PRACE-5IP), No 823767 (PRACE-6IP).
# UEABS Releases
## Version 2.2 (PRACE-6IP, December 31, 2021)
* Changed the presentation, making it similar to the CORAL Benchmarks (cf. <a href="https://asc.llnl.gov/coral-benchmarks">CORAL Benchmarks</a> and <a href="https://asc.llnl.gov/coral-2-benchmarks">CORAL-2 Benchmarks</a>)
* Removed the SHOC benchmark suite.
* Added the TensorFlow benchmark.
* Alya: Updated to open-alya version. Updated build instructions.
* Code_Saturne: Updated to version 7.0, updated build instructions, and added larger test cases.
* CP2K: Updated to CP2K version 8.1 and updated build instructions.
* GPAW: Updated the medium and large benchmark cases to work with GPAW 20.1.0/20.10.0
and revised the build and run instructions as they have changed for these versions.
* NEMO: Updated build instructions for the NEMO v4.0 and XIOS v2.5. Added required architecture files for PRACE Tier-0 systems.
* Quantum Espresso: Updated download and build instructions. Note that now (free) registration is required to download the source code.
* Updated the benchmark suite to the status as used for the PRACE-6IP benchmarking deliverable D7.5 "Evaluation of Benchmark Performance" (November 30, 2021)
## Version 2.1 (PRACE-5IP, April 30, 2019)
* Updated the benchmark suite to the status as used for the PRACE-5IP benchmarking deliverable D7.5 "Evaluation of Accelerated and Non-accelerated Benchmarks" (April 18, 2019)
Alya builds the makefile from the compilation options defined in config.in. In order to build ALYA (Alya.x), follow these steps (a consolidated command sketch is given after the list):
- Go to the directory: Executables/unix
- Edit config.in (some default config.in files can be found in the directory configure.in):
- Select your own MPI wrappers and paths
- Select the size of integers. The default is 4 bytes; for 8 bytes, select -DI8
- Choose your metis version: metis-4.0, or metis-5.1.0_i8 for 8-byte integers
- Configure Alya: ./configure -x nastin parall
- Compile metis: make metis4 or make metis5
- Compile Alya: make
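The steps above boil down to roughly the following command sequence (a sketch; the config template to copy and the metis target depend on your compiler and integer size):

```shell
cd Executables/unix
cp configure.in/config_ifort.in config.in   # pick and edit a template matching your compiler/MPI
./configure -x nastin parall
make metis4                                 # or: make metis5
make
```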
Data sets
---------
The parameters used in the datasets aim to represent typical industrial runs as closely as possible, in order to obtain representative speedups. For example, the iterative solvers are never converged to machine accuracy, but only to a percentage of the initial residual.
The different datasets are:
SPHERE_16.7M ... 16.7M sphere mesh
SPHERE_132M .... 132M sphere mesh
How to execute Alya with a given dataset
----------------------------------------
In order to run ALYA, you need at least the following input files per execution:
X.dom.dat
X.ker.dat
X.nsi.dat
X.dat
In our case X=sphere
To execute a simulation, you must be inside the input directory and you should submit a job like:
mpirun Alya.x sphere
How to measure the speedup
--------------------------
There are many ways to compute the scalability of the Nastin module.
1. For the complete cycle including: element assembly + boundary assembly + subgrid scale assembly + solvers, etc.
2. For single kernels: element assembly, boundary assembly, subgrid scale assembly, solvers
3. Using overall times
1. In *.nsi.cvg file, column "30. Elapsed CPU time"
2. Single kernels. Here, average and maximum times are indicated in *.nsi.cvg at each iteration of each time step:
Element assembly: 19. Ass. ave cpu time 20. Ass. max cpu time
Boundary assembly: 33. Bou. ave cpu time 34. Bou. max cpu time
Subgrid scale assembly: 31. SGS ave cpu time 32. SGS max cpu time
Iterative solvers: 21. Sol. ave cpu time 22. Sol. max cpu time
Note that in the case of using Runge-Kutta time integration (the case of the sphere), the element and boundary assembly times are those of the last assembly of the current time step (out of three for third order).
3. At the end of *.log file, total timings are shown for all modules. In this case we use the first value of the NASTIN MODULE.
Contact
-------
If you have any question regarding the runs, please feel free to contact Guillaume Houzeaux: guillaume.houzeaux@bsc.es
# ALYA
## Summary Version
1.0
## Purpose of Benchmark
The Alya System is a Computational Mechanics code capable of solving different physics, each one with its own modelization characteristics, in a coupled way. Among the problems it solves are: convection-diffusion reactions, incompressible flows, compressible flows, turbulence, bi-phasic flows and free surface, excitable media, acoustics, thermal flow, quantum mechanics (DFT) and solid mechanics (large strain). ALYA is written in Fortran 90/95 and parallelized using MPI and OpenMP.
* Web site: https://www.bsc.es/computer-applications/alya-system
* Code download: https://gitlab.com/bsc-alya/open-alya
* Test Case A: https://gitlab.com/bsc-alya/benchmarks/sphere-16M
* Test Case B: https://gitlab.com/bsc-alya/benchmarks/sphere-132M
## Mechanics of Building Benchmark
You can compile Alya using CMake. It follows the classic CMake configuration, except for the compiler management, which has been customized by the developers.
### Creation of the build directory
In your alya directory, create a new build directory:
```
mkdir build
cd build
```
### Configuration
To configure cmake using the command line, type `cmake ..`
If you want to customize the build options, use `-DOPTION=value`. For example, to enable GPU support: `cmake .. -DWITH_GPU=ON`
### Compilation
Compile with `make -j 8`
For more information: https://gitlab.com/bsc-alya/alya/-/wikis/Documentation/Installation
## Mechanics of Running Benchmark
### Datasets
The parameters used in the datasets aim to represent typical industrial runs as closely as possible, in order to obtain representative speedups. For example, the iterative solvers are never converged to machine accuracy, but only to a percentage of the initial residual.
The different datasets are:
Test Case A: SPHERE_16.7M ... 16.7M sphere mesh
Test Case B: SPHERE_132M .... 132M sphere mesh
### How to execute Alya with a given dataset
In order to run ALYA, you need at least the following input files per execution:
X.dom.dat
X.ker.dat
X.nsi.dat
X.dat
In our case X=sphere
To execute a simulation, you must be inside the input directory and you should submit a job like:
mpirun Alya.x sphere
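For example, a minimal batch script could look like the following sketch (the scheduler, node and rank counts are assumptions; only the final mpirun line is prescribed by the benchmark):

```shell
#!/bin/bash
#SBATCH --job-name=alya-sphere
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=48

# Run from inside the directory that contains the sphere.* input files.
mpirun Alya.x sphere
```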
How to measure the performance
--------------------------
There are many ways to compute the scalability of the Nastin module.
1. **For the complete cycle including: element assembly + boundary assembly + subgrid scale assembly + solvers, etc.**
> In the *.nsi.cvg file, column "30. Elapsed CPU time"
2. **For single kernels: element assembly, boundary assembly, subgrid scale assembly, solvers**. Average and maximum times are indicated in the *.nsi.cvg file at each iteration of each time step:
> Element assembly: 19. Ass. ave cpu time 20. Ass. max cpu time
>
> Boundary assembly: 33. Bou. ave cpu time 34. Bou. max cpu time
>
> Subgrid scale assembly: 31. SGS ave cpu time 32. SGS max cpu time
>
> Iterative solvers: 21. Sol. ave cpu time 22. Sol. max cpu time
>
> Note that in the case of using Runge-Kutta time integration (the case
> of the sphere), the element and boundary assembly times are those of
> the last assembly of the current time step (out of three for third order).
3. **Using overall times**.
> At the end of the *.log file, total timings are shown for all modules. In this case we use the first value of the NASTIN MODULE.
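For instance, the per-step figures can be pulled out of the convergence file with a short script. This is only a sketch: it assumes the columns of `sphere.nsi.cvg` are whitespace-separated and that header lines do not start with a number.
```shell
# Average the NASTIN solver CPU time (column 21) and report the overall
# elapsed CPU time (column 30) from the last record in sphere.nsi.cvg.
awk '$1+0==$1 { sol += $21; n++; elapsed = $30 }
     END { if (n) printf "mean solver time: %g s over %d records; elapsed CPU time: %g s\n", sol/n, n, elapsed }' sphere.nsi.cvg
```
The speedup on N cores is then the elapsed time of the reference run divided by the elapsed time measured on N cores.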
Contact
-------
If you have any question regarding the runs, please feel free to contact Guillaume Houzeaux: guillaume.houzeaux@bsc.es
# Alya - Large Scale Computational Mechanics
Alya is a simulation code for high performance computational mechanics. Alya solves coupled multiphysics problems using high performance computing techniques for distributed and shared memory supercomputers, together with vectorization and optimization at the node level.
Homepage: https://www.bsc.es/research-development/research-areas/engineering-simulations/alya-high-performance-computational
Alya is available to collaborating projects and a specific version is being distributed as part of the PRACE Unified European Applications Benchmark Suite (http://www.prace-ri.eu/ueabs/#ALYA).
## Building Alya for GPU accelerators
The GPU solver library (NINJA) currently supports four solvers: GMRES, Deflated Conjugate Gradient, Conjugate Gradient, and Pipelined Conjugate Gradient.
The only preconditioner supported at the moment is 'diagonal'.
Keywords to use the solvers:
```shell
NINJA GMRES : GGMR
NINJA Deflated CG : GDECG
NINJA CG : GCG
NINJA Pipelined CG : GPCG
PRECONDITIONER : DIAGONAL
```
Other options are the same as for the CPU-based solvers.
### GPGPU Building
This version was tested with the Intel Compilers 2017.1, bullxmpi-1.2.9.1 and NVIDIA CUDA 7.5. Ensure that the wrappers `mpif90` and `mpicc` point to the correct binaries and that `$CUDA_HOME` is set.
Alya can be used with just MPI or hybrid MPI-OpenMP parallelism. Standard execution mode is to rely on MPI only.
- Uncompress the source, then configure the Metis library dependency and the Alya build options:
```shell
tar xvf alya-prace-acc.tar.bz2
```
- Edit the file `Alya/Thirdparties/metis-4.0/Makefile.in` to select the compiler and target platform. Uncomment the specific lines and add optimization parameters, e.g.
```shell
OPTFLAGS = -O3 -xCORE-AVX2
```
- Then build Metis4
```shell
$ cd Alya/Executables/unix
$ make metis4
```
- For Alya there are several example configurations, copy one, e.g. for Intel Compilers:
```shell
$ cp configure.in/config_ifort.in config.in
```
- Edit the config.in:
Add the corresponding platform optimization flags to `FCFLAGS`, e.g.
```shell
FCFLAGS = -module $O -c -xCORE-AVX2
```
- MPI: No changes in the configure file are necessary. By default, Metis4 and 4-byte integers are used.
- MPI-hybrid (with OpenMP): Uncomment the following lines for the OpenMP version:
```shell
CSALYA := $(CSALYA) -qopenmp (-fopenmp for GCC Compilers)
EXTRALIB := $(EXTRALIB) -qopenmp (-fopenmp for gcc Compilers)
```
- Configure and build Alya (use `-x` for the Release version or `-g` for the Debug version, the latter together with uncommenting the debug and checking flags in config.in):
```shell
./configure -x nastin parall
make NINJA=1 -j num_processors
```
### GPGPU Usage
Each problem needs a `GPUconfig.dat`. A sample is available at `Alya/Thirdparties/ninja` and needs to be copied to the work directory. A README file in the same location provides further information.
- Extract the small one node test case and configure to use GPU solvers:
```shell
$ tar xvf cavity1_hexa_med.tar.bz2 && cd cavity1_hexa_med
$ cp ../Alya/Thirdparties/ninja/GPUconfig.dat .
```
- To use the GPU, you have to replace `GMRES` with `GGMR` and `DEFLATED_CG` with `GDECG`, both in `cavity1_hexa.nsi.dat` (a scripted way to do this is sketched after this list).
- Edit the job script to submit the calculation to the batch system.
```shell
job.sh: Modify the path where you have your Alya.x (compiled with MPI options)
sbatch job.sh
```
Alternatively execute directly:
```shell
OMP_NUM_THREADS=4 mpirun -np 16 Alya.x cavity1_hexa
```
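For convenience, the solver substitution and the batch submission can be scripted. The following is only a sketch: SLURM is assumed, and the GPU count, walltime and the `/path/to/Alya.x` location are placeholders for your system.
```shell
# Switch the solvers to their GPU (NINJA) counterparts, as described above
sed -i -e 's/GMRES/GGMR/' -e 's/DEFLATED_CG/GDECG/' cavity1_hexa.nsi.dat

# Minimal SLURM job script; submit with `sbatch job.sh`
cat > job.sh << 'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --gres=gpu:2             # placeholder: depends on the node's GPU layout
#SBATCH --time=00:30:00
export OMP_NUM_THREADS=4
mpirun -np 16 /path/to/Alya.x cavity1_hexa
EOF
```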
<!-- Runtime on 16-core Xeon E5-2630 v3 @ 2.40GHz with 2 NVIDIA K80: ~1:30 min -->
<!-- Runtime on 16-core Xeon E5-2630 v3 @ 2.40GHz no GPU: ~2:00 min -->
## Building Alya for Intel Xeon Phi Knights Landing (KNL)
The Xeon Phi processor version of Alya currently relies on compiler-assisted optimization for AVX-512. Porting of performance-critical kernels to the new assembly instructions is underway. There will not be a version for the first-generation Xeon Phi Knights Corner coprocessors.
### KNL Building
This version was tested with the Intel Compilers 2017.1, Intel MPI 2017.1. Ensure that the wrappers `mpif90` and `mpicc` point to the correct binaries.
Alya can be used with just MPI or hybrid MPI-OpenMP parallelism. Standard execution mode is to rely on MPI only.
- Uncompress the source, then configure the Metis library dependency and the Alya build options:
```shell
tar xvf alya-prace-acc.tar.bz2
```
- Edit the file `Alya/Thirdparties/metis-4.0/Makefile.in` to select the compiler and target platform. Uncomment the specific lines and add optimization parameters, e.g.
```shell
OPTFLAGS = -O3 -xMIC-AVX512
```
- Then build Metis4
```shell
$ cd Alya/Executables/unix
$ make metis4
```
- For Alya there are several example configurations, copy one, e.g. for Intel Compilers:
```shell
$ cp configure.in/config_ifort.in config.in
```
- Edit the config.in:
Add the corresponding platform optimization flags to `FCFLAGS`, e.g.
```shell
FCFLAGS = -module $O -c -xMIC-AVX512
```
- MPI: No changes in the configure file are necessary. By default, Metis4 and 4-byte integers are used.
- MPI-hybrid (with OpenMP): Uncomment the following lines for the OpenMP version:
```shell
CSALYA := $(CSALYA) -qopenmp (-fopenmp for GCC Compilers)
EXTRALIB := $(EXTRALIB) -qopenmp (-fopenmp for gcc Compilers)
```
- Configure and build Alya (use `-x` for the Release version or `-g` for the Debug version, the latter together with uncommenting the debug and checking flags in config.in):
```shell
./configure -x nastin parall
make -j num_processors
```
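Once built, the hybrid binary is launched with one MPI rank per group of cores and a few OpenMP threads per rank. The exact launcher flags are system-specific, so the following is only a sketch (SLURM assumed, one 64-core KNL node split into 16 ranks of 4 threads):
```shell
export OMP_NUM_THREADS=4
export OMP_PLACES=cores
export OMP_PROC_BIND=close
# 16 MPI ranks with 4 cores each on one KNL node (adjust to your node layout)
srun --nodes=1 --ntasks=16 --cpus-per-task=4 ./Alya.x sphere
```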
## Remarks
If the number of elements is too low for a scalability analysis, Alya includes a mesh multiplication technique. This tool is activated by an input option in the ker.dat file: the number of mesh multiplication levels to apply (0 meaning no mesh multiplication). At each multiplication level the number of elements is multiplied by 8, so a huge mesh can be obtained automatically in order to study the scalability of the code on different architectures; for example, two levels turn the 16.7M-element Test Case A mesh into roughly a 1.07-billion-element mesh (16.7M × 8 × 8). Note that the mesh multiplication is carried out in parallel and thus should not impact the duration of the simulation process.
#!/bin/bash
#
# Read timer_stats.csv and compute the average time per time step:
# skip the 4 header lines, drop the last entry, extract the second
# column, strip the commas, and average the remaining values.
#
export FILE_LENGTH=`wc -l < timer_stats.csv`
#
## echo "Number of lines $FILE_LENGTH"
#
export TAIL_LINE_NUMBER="$(($FILE_LENGTH-4))"
#
## echo $TAIL_LINE_NUMBER
#
tail -$TAIL_LINE_NUMBER timer_stats.csv > timer_1st.tmp
#
##more timer_1st.tmp
#
awk '{print $2}' timer_1st.tmp > timer_2nd.tmp
#
sed 's/,//g' timer_2nd.tmp > timer_1st.tmp
#
export FILE_LENGTH=`wc -l < timer_1st.tmp`
#
## echo "Number of lines $FILE_LENGTH"
#
export FILE_LENGTH=$(($FILE_LENGTH-1))
#
export HEAD_LINE_NUMBER="-$FILE_LENGTH"
#
head $HEAD_LINE_NUMBER timer_1st.tmp > timer_2nd.tmp
#
export sum_of_lines=`awk '{s+=$1}END{print s}' timer_2nd.tmp`
## echo "Sum of the lines of the file: $sum_of_lines"
#
##more timer_2nd.tmp
#
export average_timing=`echo "$sum_of_lines / $FILE_LENGTH" | bc -l`
echo "Averaged timing for the $FILE_LENGTH entries: $average_timing"
#
rm -rf *.tmp
#!/bin/sh
#################################
## Which version of the code ? ##
#################################
CODE_VERSION=7.0.0
KER_VERSION=${CODE_VERSION}
KERNAME=code_saturne-${KER_VERSION}
################################################
## Installation PATH in the current directory ##
################################################
INSTALLPATH=`pwd`
echo $INSTALLPATH
#####################################
## Environment variables and PATHS ##
#####################################
NOM_ARCH=`uname -s`
CS_HOME=${INSTALLPATH}/${KERNAME}
export PATH=$CS_HOME/bin:$PATH
##############
## Cleaning ##
##############
rm -rf $CS_HOME/arch/*
rm -rf $INSTALLPATH/$KERNAME.build
#########################
## Kernel Installation ##
#########################
KERSRC=$INSTALLPATH/$KERNAME
KERBUILD=$INSTALLPATH/$KERNAME.build/arch/$NOM_ARCH
KEROPT=$INSTALLPATH/$KERNAME/arch/$NOM_ARCH
export KEROPT
mkdir -p $KERBUILD
cd $KERBUILD
$KERSRC/configure \
--disable-shared \
--disable-nls \
--without-modules \
--disable-gui \
--enable-long-gnum \
--disable-mei \
--enable-debug \
--prefix=$KEROPT \
CC="mpicc" CFLAGS="-O3" FC="mpif90" FCFLAGS="-O3" CXX="mpicxx" CXXFLAGS="-O3"
make -j 8
make install
cd $INSTALLPATH
# Code_Saturne
Code_Saturne is open-source multi-purpose CFD software, primarily developed by EDF R&D and maintained by them. It relies on the Finite Volume method and a collocated arrangement of unknowns to solve the Navier-Stokes equations, for incompressible or compressible flows, laminar or turbulent flows and non-Newtonian and Newtonian fluids. A highly parallel coupling library (Parallel Locator Exchange - PLE) is also available in the distribution to account for other physics, such as conjugate heat transfer and structure mechanics. For the incompressible solver, the pressure is solved using an integrated Algebraic Multi-Grid algorithm and the scalars are computed by conjugate gradient methods or Gauss-Seidel/Jacobi.
[Code_Saturne](https://www.code-saturne.org/cms/) is an open-source multi-purpose CFD software, primarily developed by EDF R&D and maintained by them. It relies on the Finite Volume method and a collocated arrangement of unknowns to solve the Navier-Stokes equations, for incompressible or compressible flows, laminar or turbulent flows and non-Newtonian and Newtonian fluids. A new discretisation based on the Compatible Discrete Operator (CDO) approach can be used for some physics. A highly parallel coupling library (Parallel Locator Exchange - PLE) is also available in the distribution to couple other software with different physics, such as for conjugate heat transfer and structural mechanics. For the incompressible solver, the pressure is solved using an integrated Algebraic Multi-Grid algorithm and the velocity components/scalars are computed by conjugate gradient methods or Gauss-Seidel/Jacobi.
The original version of the code is written in C for pre-postprocessing, IO handling, parallelisation handling, linear solvers and gradient computation, and Fortran 95 for most of the physics implementation. MPI is used on distributed memory machines and OpenMP pragmas have been added to the most costly parts of the code to handle potential shared memory. The version used in this work (also freely available) relies on CUDA to take advantage of potential GPU acceleration.
The original version of the code is written in C for pre-/post-processing, IO handling, parallelisation handling, linear solvers and gradient computation, and Fortran 95 for some of the physics-related implementation. Python is used to manage the simulations. MPI is used on distributed memory machines and OpenMP pragmas have been added to the most costly parts of the code to be used on shared memory architectures. The version used in this work relies on external libraries (AMGx - PETSc) to take advantage of potential GPU acceleration.
The equations are solved iteratively using time-marching algorithms, and most of the time spent during a time step is usually due to the computation of the velocity-pressure coupling, for simple physics. For this reason, the two test cases ([CS_5.3_PRACE_UEABS_CAVITY_13M.tar.gz](https://repository.prace-ri.eu/ueabs/Code_Saturne/2.1/CS_5.3_PRACE_UEABS_CAVITY_13M.tar.gz) and [CS_5.3_PRACE_UEABS_CAVITY_111M.tar.gz](https://repository.prace-ri.eu/ueabs/Code_Saturne/2.1/CS_5.3_PRACE_UEABS_CAVITY_111M.tar.gz)) chosen for the benchmark suite have been designed to assess the velocity-pressure coupling computation, and rely on the same configuration, with a mesh 8 times larger for CAVITY_111M than for CAVITY_13M, the time step being halved to ensure a correct Courant number.
The equations are solved iteratively using time-marching algorithms, and most of the time spent during a time step is due to the computation of the velocity-pressure coupling, for simple physics. For this reason, the test cases chosen for the benchmark suite have been designed to assess the velocity-pressure coupling computation, and rely on the same configuration, the 3-D lid-driven cavity, using tetrahedral cell meshes. The first case mesh contains over 13 million cells. The larger test cases are modular in the sense that mesh multiplication is used on-the-fly to increase their mesh size, using several levels of refinement.
## Building and running the code
Building and running version 5.3 of the code is described in the file [Code_Saturne_Build_Run_5.3_UEABS.pdf](Code_Saturne_Build_Run_5.3_UEABS.pdf).
## Building Code_Saturne v7.0.0
The version 7.0.0 of Code_Saturne is to be found [here](https://www.code-saturne.org/cms/sites/default/files/releases/code_saturne-7.0.0.tar.gz).
## The test cases are to be found under:
https://repository.prace-ri.eu/ueabs/Code_Saturne/2.1/CS_5.3_PRACE_UEABS_CAVITY_111M.tar.gz
https://repository.prace-ri.eu/ueabs/Code_Saturne/2.1/CS_5.3_PRACE_UEABS_CAVITY_13M.tar.gz
A simple installer [_InstallHPC.sh_](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/code_saturne/InstallHPC.sh) is made available for this version.
## The distribution is to be found under:
https://repository.prace-ri.eu/ueabs/Code_Saturne/2.1/CS_5.3_PRACE_UEABS.tar.gz
An example of the last lines of the installer (meant for the GNU compiler & MPI-OpenMP in this example) reads:
```shell
$KERSRC/configure \
  --disable-shared \
  --disable-nls \
  --without-modules \
  --disable-gui \
  --enable-long-gnum \
  --disable-mei \
  --enable-debug \
  --prefix=$KEROPT \
  CC="mpicc" CFLAGS="-O3" FC="mpif90" FCFLAGS="-O3" CXX="mpicxx" CXXFLAGS="-O3"
#
make -j 8
make install
```
CC, FC, CFLAGS, FCFLAGS, LDFLAGS and LIBS might have to be tailored for your machine, compilers, MPI installation, etc.
More information concerning the options can be found by typing: ./configure --help
Assuming that CS_7.0.0_PRACE_UEABS is the current directory, the tarball is untarred in there as:
```shell
tar zxvf code_saturne-7.0.0.tar.gz
```
and the code is then installed as:
```shell
cd CS_7.0.0_PRACE_UEABS
./InstallHPC.sh
```
If the installation is successful, typing:
```shell
YOUR_PATH/CS_7.0.0_PRACE_UEABS/code_saturne-7.0.0/arch/Linux/bin/code_saturne
```
should return the **code_saturne** usage message:
```
Usage: ./code_saturne <topic>

Topics:
  help
  studymanager
  smgr
  bdiff
  bdump
  compile
  config
  cplgui
  create
  gui
  parametric
  studymanagergui
  smgrgui
  trackcvg
  update
  up
  info
  run
  submit
  symbol2line

Options:
  -h, --help  show this help message and exit
```
## Preparing a simulation
Two archives are used, namely [**CS_7.0.0_PRACE_UEABS_CAVITY_13M.tar.gz**](https://repository.prace-ri.eu/ueabs/Code_Saturne/2.2/CS_7.0.0_PRACE_UEABS_CAVITY_13M.tar.gz) and [**CS_7.0.0_PRACE_UEABS_CAVITY_XXXM.tar.gz**](https://repository.prace-ri.eu/ueabs/Code_Saturne/2.2/CS_7.0.0_PRACE_UEABS_CAVITY_XXXM.tar.gz), which contain the information required to run both test cases: the mesh_input.csm file (for the mesh) and the user subroutines in _src_.
Taking the example of CAVITY_13M, from the working directory WORKDIR (different from CS_7.0.0_PRACE_UEABS), a ‘study’ has to be created (CAVITY_13M, for instance) as well as a ‘case’ (MACHINE, for instance) as:
```shell
YOUR_PATH/CS_7.0.0_PRACE_UEABS/code_saturne-7.0.0/arch/Linux/bin/code_saturne create --study CAVITY_13M --case MACHINE --copy-ref
```
The directory **CAVITY_13M** contains 3 directories, MACHINE, MESH and POST.
The directory **MACHINE** contains 3 directories, DATA, RESU and SRC.
The file mesh_input.csm should be copied into the MESH directory.
The user subroutines (cs_user* files) contained in _src_ should be copied into SRC.
The file _cs_user_scripts.py_ is used to manage the simulation. It has to be copied to DATA as:
```shell
cd DATA
cp REFERENCE/cs_user_scripts.py .
```
At Line 89 of this file, you need to change the mesh entry from None to the local path of the mesh, i.e. "../MESH/mesh_input.csm".
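One way to apply this change non-interactively, from the DATA directory (a sketch only: the line number comes from the distributed cs_user_scripts.py and may shift in other versions):
```shell
# Replace 'None' on line 89 of cs_user_scripts.py with the mesh path
sed -i '89s|None|"../MESH/mesh_input.csm"|' cs_user_scripts.py
```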
To finalise the preparation, go to the folder MACHINE and type:
```shell
YOUR_PATH/CS_7.0.0_PRACE_UEABS/code_saturne-7.0.0/arch/Linux/bin/code_saturne run --initialize
```
This should create a folder RESU/YYYYMMDD-HHMM, which should contain the following files:
- compile.log
- cs_solver
- cs_user_scripts.py
- listing
- mesh_input.csm
- run.cfg
- run_solver
- setup.xml
- src
- summary
## Running Code_Saturne v7.0.0
The name of the executable is `cs_solver` and the code should be run as `mpirun/mpiexec/poe/aprun/srun ./cs_solver`.
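As an illustration, a batch script for a SLURM system could look like this (a sketch only: node counts, walltime and the launcher are placeholders; it is submitted from the RESU/YYYYMMDD-HHMM directory created above):
```shell
#!/bin/bash
#SBATCH --job-name=cs_cavity13M
#SBATCH --nodes=2                # 2 nodes were used for the timing example below
#SBATCH --ntasks-per-node=128
#SBATCH --time=02:00:00

cd "$SLURM_SUBMIT_DIR"           # the RESU/YYYYMMDD-HHMM directory containing cs_solver
srun ./cs_solver
```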
## Example of timing
A script is used to compute the average time per time step, e.g. [_CS_collect_timing.sh_](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/code_saturne/CS_collect_timing.sh), which returns:
Averaged timing for the 97 entries: 2.82014432989690721649
for the CAVITY_13M case, run on 2 nodes of a Cray AMD (Rome) system.
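As a usage sketch, assuming timer_stats.csv ends up in the RESU/YYYYMMDD-HHMM directory of the completed run, the script is simply executed there:
```shell
cd RESU/YYYYMMDD-HHMM            # directory of the completed run
bash /path/to/CS_collect_timing.sh
```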
## Larger cases
The same steps are carried out for the larger cases using the CS_7.0.0_PRACE_UEABS_CAVITY_XXXM.tar.gz file.
These cases are built by mesh multiplication (also called global refinement) of the mesh used for CAVITY_13M.
If 1 (resp. 2 or 3) level(s) of refinement is/are used, the mesh is over 111M (resp. 889M or 7112M) cells large. The
third mesh (level 3) is definitely suitable to run using over 100,000 MPI tasks.
To make sure that the simulations are stable, the time step is adjusted depending on the refinement level used.
The number of levels of refinement is set at Line 152 of the _cs_user_mesh.c_ file, by choosing tot_nb_mm as 1, 2 or 3.\
The time step is set at Line 248 of the _cs_user_parameter.f90_ file, by choosing 0.01d0 / 3.d0 (level 1), 0.01d0 / 9.d0 (level 2) or 0.01d0 / 27.d0 (level 3).
The table below recalls the correct settings.
| | At Line 152 of _cs_user_mesh.c_ | At Line 248 of _cs_user_parameter.f90_ |
| ------ | ------ | ------ |
| Level 1 | tot_nb_mm = 1 | dtref = 0.01d0 / 3.d0 |
| Level 2 | tot_nb_mm = 2 | dtref = 0.01d0 / 9.d0 |
| Level 3 | tot_nb_mm = 3 | dtref = 0.01d0 / 27.d0 |
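# CP2K arch file (GCC + MKL + ELPA, MPI + OpenMP; the HLRS paths and -march=znver2 flags below are site-specific)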
CC = mpicc -fopenmp
FC = mpif90 -fopenmp
LD = mpif90 -fopenmp
AR = ar -r
DATA_DIR = /zhome/academic/HLRS/pri/iprhjud/CP2K/cp2k-8.1/data
CP2K_ROOT = /zhome/academic/HLRS/pri/iprhjud/CP2K
MKL_LIB = ${MKLROOT}/lib/intel64
# Options
DFLAGS = -D__FFTW3 -D__LIBXC -D__MKL \
-D__LIBINT -D__MAX_CONTR=4 -D__ELPA=202005 \
-D__parallel -D__SCALAPACK -D__MPI_VERSION=3 \
-D__STATM_RESIDENT
CFLAGS = -O3 -mavx -funroll-loops -ftree-vectorize \
-ffree-form -march=znver2 -mtune=znver2 -fno-math-errno
FCFLAGS = $(DFLAGS) $(CFLAGS) \
-I$(CP2K_ROOT)/libs/libint/include \
-I$(CP2K_ROOT)/libs/libxc/include \
-I$(MKLROOT)/include \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/modules \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/elpa
LDFLAGS = $(FCFLAGS)
LIBS = -L$(CP2K_ROOT)/libs/libint/lib -lint2 \
-L$(CP2K_ROOT)/libs/libxc/lib -lxcf90 -lxcf03 -lxc \
-L$(CP2K_ROOT)/libs/elpa-openmp/lib -lelpa_openmp \
-lfftw3 -lfftw3_threads -lz \
$(MKL_LIB)/libmkl_scalapack_lp64.a -Wl,--start-group \
$(MKL_LIB)/libmkl_gf_lp64.a $(MKL_LIB)/libmkl_sequential.a \
$(MKL_LIB)/libmkl_core.a \
$(MKL_LIB)/libmkl_blacs_sgimpt_lp64.a -Wl,--end-group \
-ldl -lpthread -lm -lstdc++
# Irene ARCH file
# module load feature/openmpi/mpi_compiler/gcc
# module load flavor/openmpi
# module load gnu/8.3.0
# module load mkl
CC = mpicc
FC = mpif90 -fopenmp
LD = mpif90 -fopenmp
AR = ar -r
DATA_DIR = /ccc/work/cont005/pa5489/judgehol/CP2K/cp2k-8.1/data
CP2K_ROOT = /ccc/work/cont005/pa5489/judgehol/CP2K
MKL_LIB = ${MKLROOT}/lib/intel64
# Options
DFLAGS = -D__FFTW3 -D__MKL -D__LIBXSMM \
-D__LIBINT -D__MAX_CONTR=4 -D__LIBXC \
-D__parallel -D__SCALAPACK -D__MPI_VERSION=3 \
-D__STATM_RESIDENT
CFLAGS = -O2 -g -funroll-loops -ftree-vectorize -std=f2008 \
-ffree-form -mtune=native -fno-math-errno -ffree-line-length-none
FCFLAGS = $(DFLAGS) $(CFLAGS) \
-I$(CP2K_ROOT)/libs/libint/include \
-I$(MKLROOT)/include -m64 \
-I$(CP2K_ROOT)/libs/libxsmm/include \
-I$(CP2K_ROOT)/libs/libxc/include \
-I$(CP2K_ROOT)/libs/fftw/include
LDFLAGS = $(FCFLAGS)
LIBS = -L$(CP2K_ROOT)/libs/libint/lib -lint2 \
-L$(CP2K_ROOT)/libs/libxsmm/lib -lxsmmf -lxsmm -lxsmmext \
-L$(CP2K_ROOT)/libs/fftw/lib -lfftw3 -lfftw3_threads -lz -ldl -lstdc++ \
$(MKL_LIB)/libmkl_scalapack_lp64.a -Wl,--start-group \
$(MKL_LIB)/libmkl_gf_lp64.a ${MKL_LIB}/libmkl_sequential.a \
$(MKL_LIB)/libmkl_core.a \
${MKL_LIB}/libmkl_blacs_openmpi_lp64.a -Wl,--end-group \
-lpthread -lm
# CP2K arch file for Juwels psmp
# module load GCC, ParastationMPI/5.2.2-1 FFTW/3.3.8 imkl/2019.5.281
CC = mpicc
FC = mpif90 -fopenmp
LD = mpif90 -fopenmp
AR = ar -r
DATA_DIR = /p/project/prpb92/CP2K/cp2k-8.1/data
CP2K_ROOT = /p/project/prpb92/CP2K
MKL_LIB = ${MKLROOT}/lib/intel64
DFLAGS = -D__FFTW3 -D__MKL -D__ELPA=202005 \
-D__LIBINT -D__MAX_CONTR=4 -D__LIBXC -D__LIBXSMM \
-D__parallel -D__SCALAPACK -D__MPI_VERSION=3 \
-D__STATM_RESIDENT
CFLAGS = -O3 -mavx -funroll-loops -ftree-vectorize \
-ffree-form -mtune=native -fno-math-errno
FCFLAGS = $(DFLAGS) $(CFLAGS) \
-I$(CP2K_ROOT)/libs/libint/include \
-I$(MKLROOT)/include -m64 \
-I$(CP2K_ROOT)/libs/libxsmm/include \
-I$(CP2K_ROOT)/libs/libxc/include \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/modules \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/elpa
LDFLAGS = $(FCFLAGS)
LIBS = -L$(CP2K_ROOT)/libs/libint/lib -lint2 \
-L$(CP2K_ROOT)/libs/libxsmm/lib -lxsmmf -lxsmm -lxsmmext \
-L$(CP2K_ROOT)/libs/libxc/lib -lxcf90 -lxcf03 -lxc \
-L$(CP2K_ROOT)/libs/elpa-openmp/lib -lelpa_openmp \
$(PLUMED_DEPENDENCIES) -lfftw3 -lfftw3_threads -lz -ldl -lstdc++ \
$(MKL_LIB)/libmkl_scalapack_lp64.a -Wl,--start-group \
$(MKL_LIB)/libmkl_gf_lp64.a ${MKL_LIB}/libmkl_sequential.a \
$(MKL_LIB)/libmkl_core.a \
${MKL_LIB}/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group \
-lpthread -lm
NVCC = ${CUDA_PATH}/bin/nvcc
CC = gcc
CXX = g++
FC = mpif90
LD = mpif90
AR = ar -r
GPUVER = V100
CUDAPATH = /cineca/prod/opt/compilers/cuda/11.0/none
CXXFLAGS = -O3 -I$(CUDAPATH)/include -std=c++11 -fopenmp
DATA_DIR = /m100_work/Ppp4x_5489/CP2K/cp2k-8.1/data
CP2K_ROOT = /m100_work/Ppp4x_5489/CP2K
LIBINT_INC = $(CP2K_ROOT)/libs/libint/include
LIBINT_LIB = $(CP2K_ROOT)/libs/libint/lib
LIBXC_INC = $(CP2K_ROOT)/libs/libxc/include
LIBXC_LIB = $(CP2K_ROOT)/libs/libxc/lib
DFLAGS = -D__FFTW3 -D__ACC -D__DBCSR_ACC -D__SCALAPACK -D__PW_CUDA -D__parallel -D__LIBINT -D__MPI_VERSION=3 -D__LIBXC -D__GFORTRAN
FCFLAGS = -fopenmp -std=f2008 -fimplicit-none -ffree-form -fno-omit-frame-pointer -O3 -ftree-vectorize $(DFLAGS) $(WFLAGS)
FCFLAGS += -I$(LIBINT_INC) -I$(LIBXC_INC)
LDFLAGS = -L$(CUDAPATH)/lib64 $(FCFLAGS)
NVFLAGS = $(DFLAGS) -O3 -arch sm_70 -Xcompiler='-fopenmp' --std=c++11
CFLAGS = $(DFLAGS) -I$(LAPACK_INC) -I${FFTW_INC} -fno-omit-frame-pointer -g -O3 -fopenmp
LIBS = -L${LAPACK_LIB} -L${BLAS_LIB} -L${FFTW_LIB} -L${CUDA_LIB} -L${SCALAPACK_LIB} -lscalapack -llapack -lblas -lstdc++ -lfftw3 -lfftw3_omp -lcuda -lcudart -lnvrtc -lcufft -lcublas -lrt
LIBS += $(LIBINT_LIB)/libint2.a
LIBS += $(LIBXC_LIB)/libxcf03.a $(LIBXC_LIB)/libxc.a
# CP2K arch file for Marenostrum psmp
# module unload intel impi
# module load gnu/8.4.0
# module load openmpi/4.0.2
# module load mkl/2018.4
CC = mpicc -fopenmp
FC = mpif90 -fopenmp
LD = mpif90 -fopenmp
AR = ar -r
DATA_DIR = /gpfs/scratch/pr1emd00/pr1emd01/CP2K/cp2k-8.1/data
CP2K_ROOT = /gpfs/scratch/pr1emd00/pr1emd01/CP2K
MKL_LIB = ${MKLROOT}/lib/intel64
FFTW_LIB = /gpfs/scratch/pr1emd00/pr1emd01/CP2K/libs/fftw
# Options
DFLAGS = -D__FFTW3 -D__LIBXC -D__MKL \
-D__LIBINT -D__MAX_CONTR=4 -D__ELPA=202005 \
-D__parallel -D__SCALAPACK -D__MPI_VERSION=3 \
-D__STATM_RESIDENT
CFLAGS = -O3 -mavx -funroll-loops -ftree-vectorize \
-ffree-form -march=skylake-avx512 -fno-math-errno
FCFLAGS = $(DFLAGS) $(CFLAGS) \
-I$(CP2K_ROOT)/libs/libint/include \
-I$(CP2K_ROOT)/libs/libxc/include \
-I$(MKLROOT)/include \
-I$(FFTW_LIB)/include \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/modules \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/elpa
LDFLAGS = $(FCFLAGS)
LIBS = -L$(CP2K_ROOT)/libs/libint/lib -lint2 \
-L$(CP2K_ROOT)/libs/libxc/lib -lxcf90 -lxcf03 -lxc \
-L$(CP2K_ROOT)/libs/elpa-openmp/lib -lelpa_openmp \
$(FFTW_LIB)/lib/libfftw3.a $(FFTW_LIB)/lib/libfftw3_threads.a -lz \
$(MKL_LIB)/libmkl_scalapack_lp64.a -Wl,--start-group \
$(MKL_LIB)/libmkl_gf_lp64.a $(MKL_LIB)/libmkl_sequential.a \
$(MKL_LIB)/libmkl_core.a \
$(MKL_LIB)/libmkl_blacs_openmpi_lp64.a -Wl,--end-group \
-ldl -lpthread -lm -lstdc++
# modules: CrayGNU cray-fftw cray-python
CC = cc
CPP =
FC = ftn
LD = ftn
AR = ar -r
CP2K_ROOT=/scratch/snx3000/hjudge/CP2K/build-cpu
DFLAGS = -D__FFTW3 -D__parallel -D__SCALAPACK -D__LIBINT -D__GFORTRAN -D__ELPA -D__LIBXC
CFLAGS = $(DFLAGS) -g -O3 -mavx -fopenmp -march=native -mtune=native
CXXFLAGS = $(CFLAGS)
FCFLAGS = $(DFLAGS) -O3 -mavx -fopenmp -funroll-loops -ftree-vectorize -ffree-form -ffree-line-length-512 -march=native -mtune=native
FCFLAGS += -I$(CP2K_ROOT)/libs/libint/include
FCFLAGS += -I$(CP2K_ROOT)/libs/libxc/include
FCFLAGS += -I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/modules -I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/elpa
LDFLAGS = $(FCFLAGS)
LIBS = -lfftw3 -lfftw3_threads
LIBS += -L$(CP2K_ROOT)/libs/libint/lib -lint2 -lstdc++
LIBS += -L$(CP2K_ROOT)/libs/libxc/lib -lxcf90 -lxcf03 -lxc
LIBS += -L$(CP2K_ROOT)/libs/elpa-openmp/lib -lelpa_openmp
# modules: CrayGNU cray-fftw cray-python cudatoolkit
GPUVER = P100
NVCC = nvcc
CC = cc
CPP =
FC = ftn
LD = ftn
AR = ar -r
CP2K_ROOT=/scratch/snx3000/hjudge/CP2K/build
DFLAGS = -D__FFTW3 -D__parallel -D__SCALAPACK -D__ACC -D__DBCSR_ACC -D__LIBINT -D__GFORTRAN -D__HAS_smm_dnn -D__LIBXC -D__ELPA
CFLAGS = $(DFLAGS) -I$(CRAY_CUDATOOLKIT_DIR)/include -g -O3 -mavx -fopenmp
CXXFLAGS = $(CFLAGS)
FCFLAGS = $(DFLAGS) -O3 -mavx -fopenmp -funroll-loops -ftree-vectorize -ffree-form -ffree-line-length-512
FCFLAGS += -I$(CP2K_ROOT)/libs/libint/include
FCFLAGS += -I$(CP2K_ROOT)/libs/libxc/include -I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/modules -I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/elpa
LDFLAGS = $(FCFLAGS)
NVFLAGS = $(DFLAGS) -O3 -arch sm_60
LIBS = -lfftw3 -lfftw3_threads -lcudart -lcublas -lcufft -lrt -lnvrtc
LIBS += -L$(CP2K_ROOT)/libs/elpa-openmp/lib -lelpa_openmp
LIBS += -L$(CP2K_ROOT)/libs/libint/lib -lint2 -lstdc++
LIBS += -L$(CP2K_ROOT)/libs/libxc/lib -lxcf03 -lxc
LIBS += /apps/common/UES/easybuild/sources/c/CP2K/libsmm_dnn_cray.gnu.a
# SuperMUC-NG arch file
# module swap devEnv/Intel/2019 devEnv/GCC/8-IntelMPI
# module swap mpi.intel openmpi/4.0.2
# module load mkl/2019_gcc
CC = mpicc -fopenmp
FC = mpif90 -fopenmp
LD = mpif90 -fopenmp
AR = ar -r
DATA_DIR = /hppfs/work/pn68ho/di67kis/CP2K/cp2k-8.1/data
CP2K_ROOT = /hppfs/work/pn68ho/di67kis/CP2K
MKL_LIB = ${MKLROOT}/lib/intel64
# Options
DFLAGS = -D__FFTW3 -D__MKL -D__LIBXC \
-D__LIBINT -D__LIBXSMM -D__ELPA=202005 -D__MAX_CONTR=4 \
-D__parallel -D__SCALAPACK -D__MPI_VERSION=3 \
-D__STATM_RESIDENT
CFLAGS = -O3 -mavx -funroll-loops -ftree-vectorize \
-ffree-form -march=native -fno-math-errno \
-I$(CP2K_ROOT)/libs/libxsmm/include
FCFLAGS = $(DFLAGS) $(CFLAGS) \
-I$(CP2K_ROOT)/libs/libint/include \
-I$(CP2K_ROOT)/libs/libxc/include \
-I$(MKLROOT)/include -m64 \
-I$(CP2K_ROOT)/libs/libxsmm/include \
-I$(CP2K_ROOT)/libs/fftw/include \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/modules \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/elpa
LDFLAGS = $(FCFLAGS)
LIBS = -L$(CP2K_ROOT)/libs/libint/lib -lint2 \
-L$(CP2K_ROOT)/libs/libxc/lib -lxcf90 -lxcf03 -lxc \
-L$(CP2K_ROOT)/libs/libxsmm/lib -lxsmmf -lxsmm -lxsmmext \
-L$(CP2K_ROOT)/libs/elpa-openmp/lib -lelpa_openmp \
-L$(CP2K_ROOT)/libs/fftw/lib -lfftw3 -lfftw3_threads -lz -ldl -lstdc++ \
$(MKL_LIB)/libmkl_scalapack_lp64.a -Wl,--start-group \
$(MKL_LIB)/libmkl_gf_lp64.a $(MKL_LIB)/libmkl_sequential.a \
$(MKL_LIB)/libmkl_core.a \
$(MKL_LIB)/libmkl_blacs_openmpi_lp64.a -Wl,--end-group \
-lpthread -lm
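The arch files above all target CP2K 8.1 built with the GNU toolchain. As a reminder of how such a file is used (a minimal sketch: the file name `local.psmp` and the `-j` value are arbitrary choices, not part of the suite):
```shell
# Copy the chosen arch file into the CP2K source tree and build the
# MPI + OpenMP (psmp) executable with it
cp my_machine.psmp cp2k-8.1/arch/local.psmp
cd cp2k-8.1
make -j 16 ARCH=local VERSION=psmp
# The resulting binary is placed under exe/local/cp2k.psmp
```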