# UEABS Releases
## Version 2.2 (PRACE-6IP, December 31, 2021)
* Changed the presentation, making it similar to the CORAL Benchmarks (cf. <a href="https://asc.llnl.gov/coral-benchmarks">CORAL Benchmarks</a> and <a href="https://asc.llnl.gov/coral-2-benchmarks">CORAL-2 Benchmarks</a>)
* Removed the SHOC benchmark suite.
* Added the TensorFlow benchmark.
* Alya: Updated to open-alya version. Updated build instructions.
* Code_Saturne: Updated to version 7.0, updated build instructions, and added larger test cases.
* CP2K: Updated to CP2K version 8.1 and updated build instructions.
* GPAW: Updated the medium and large benchmark cases to work with GPAW 20.1.0/20.10.0
and revised the build and run instructions as they have changed for these versions.
* NEMO: Updated build instructions for NEMO v4.0 and XIOS v2.5. Added the required architecture files for PRACE Tier-0 systems.
* Quantum Espresso: Updated download and build instructions. Note that now (free) registration is required to download the source code.
* Updated the benchmark suite to the status as used for the PRACE-6IP benchmarking deliverable D7.5 "Evaluation of Benchmark Performance" (November 30, 2021)
## Version 2.1 (PRACE-5IP, April 30, 2019)
* Updated the benchmark suite to the status as used for the PRACE-5IP benchmarking deliverable D7.5 "Evaluation of Accelerated and Non-accelerated Benchmarks" (April 18, 2019)
Alya builds its makefile from the compilation options defined in config.in. In order to build ALYA (Alya.x), please follow these steps:
- Go to the directory: Executables/unix
- Edit config.in (some default config.in files can be found in the directory configure.in):
  - Select your own MPI wrappers and paths
  - Select the size of integers. The default is 4 bytes; for 8 bytes, select -DI8
  - Choose your METIS version: metis-4.0, or metis-5.1.0_i8 for 8-byte integers
- Configure Alya: ./configure -x nastin parall
- Compile METIS: make metis4 or make metis5
- Compile Alya: make
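
For example, a complete MPI-only build with the default options might look as follows (this is only a sketch; the chosen config file, here the Intel-compiler example, and the compiler wrappers are site-specific assumptions):

    cd Executables/unix
    cp configure.in/config_ifort.in config.in   # pick and edit a config matching your compilers
    ./configure -x nastin parall
    make metis4                                 # or: make metis5 (with -DI8 for 8-byte integers)
    make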
Data sets
---------
The parameters used in the datasets are chosen to be as representative as possible of typical industrial runs, in order to obtain meaningful speedups. For example, the iterative solvers are never converged to machine accuracy, but only to a given fraction of the initial residual.
The different datasets are:
SPHERE_16.7M ... 16.7M sphere mesh
SPHERE_132M .... 132M sphere mesh
How to execute Alya with a given dataset
----------------------------------------
In order to run ALYA, you need at least the following input files per execution:
X.dom.dat
X.ker.dat
X.nsi.dat
X.dat
In our case X=sphere
To execute a simulation, you must be inside the input directory and you should submit a job like:
mpirun Alya.x sphere
How to measure the speedup
--------------------------
There are several ways to measure the scalability of the Nastin module:
1. For the complete cycle, including element assembly, boundary assembly, subgrid scale assembly, solvers, etc.
   In the *.nsi.cvg file, use column "30. Elapsed CPU time".
2. For single kernels: element assembly, boundary assembly, subgrid scale assembly, solvers.
   Average and maximum times are written to *.nsi.cvg at each iteration of each time step:
      Element assembly:       19. Ass. ave cpu time   20. Ass. max cpu time
      Boundary assembly:      33. Bou. ave cpu time   34. Bou. max cpu time
      Subgrid scale assembly: 31. SGS ave cpu time    32. SGS max cpu time
      Iterative solvers:      21. Sol. ave cpu time   22. Sol. max cpu time
   Note that when Runge-Kutta time integration is used (as for the sphere), the element and boundary assembly times are those of the last assembly of the current time step (out of three for third order).
3. Using overall times: at the end of the *.log file, total timings are shown for all modules. In this case we use the first value of the NASTIN MODULE.
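For illustration, if the NASTIN elapsed CPU time were 400 s on 256 cores and 110 s on 1024 cores, the speedup relative to the 256-core run would be 400/110 ~ 3.6, against an ideal value of 4 (these timings are made-up numbers, shown only to illustrate the calculation).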
Contact
-------
If you have any questions regarding the runs, please feel free to contact Guillaume Houzeaux: guillaume.houzeaux@bsc.es
# ALYA
## Summary Version
1.0
## Purpose of Benchmark
The Alya System is a Computational Mechanics code capable of solving different physics, each one with its own modelization characteristics, in a coupled way. Among the problems it solves are: convection-diffusion reactions, incompressible flows, compressible flows, turbulence, bi-phasic flows and free surface, excitable media, acoustics, thermal flow, quantum mechanics (DFT) and solid mechanics (large strain). ALYA is written in Fortran 90/95 and parallelized using MPI and OpenMP.
* Web site: https://www.bsc.es/computer-applications/alya-system
* Code download: https://gitlab.com/bsc-alya/open-alya
* Test Case A: https://gitlab.com/bsc-alya/benchmarks/sphere-16M
* Test Case B: https://gitlab.com/bsc-alya/benchmarks/sphere-132M
## Mechanics of Building Benchmark
Alya can be compiled using CMake. It follows the classic CMake workflow, except for the compiler management, which has been customized by the developers.
### Creation of the build directory
In your alya directory, create a new build directory:
```
mkdir build
cd build
```
### Configuration
To configure the build from the command line, type:
```
cmake ..
```
If you want to customize the build options, use -DOPTION=value. For example, to enable GPU support:
```
cmake .. -DWITH_GPU=ON
```
### Compilation
```
make -j 8
```
For more information: https://gitlab.com/bsc-alya/alya/-/wikis/Documentation/Installation
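If you need to point CMake at specific compiler wrappers or set a build type, the standard CMake cache variables can be combined with the Alya-specific options shown above. A minimal sketch (the wrapper names are assumptions; adapt them to your environment):

```
cmake .. -DCMAKE_Fortran_COMPILER=mpif90 -DCMAKE_C_COMPILER=mpicc -DCMAKE_BUILD_TYPE=Release
make -j 8
```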
## Mechanics of Running Benchmark
### Datasets
The parameters used in the datasets are chosen to be as representative as possible of typical industrial runs, in order to obtain meaningful speedups. For example, the iterative solvers are never converged to machine accuracy, but only to a given fraction of the initial residual.
The different datasets are:
Test Case A: SPHERE_16.7M ... 16.7M sphere mesh
Test Case B: SPHERE_132M .... 132M sphere mesh
### How to execute Alya with a given dataset
In order to run ALYA, you need at least the following input files per execution:
X.dom.dat
X.ker.dat
X.nsi.dat
X.dat
In our case X=sphere
To execute a simulation, you must be inside the input directory and submit a job such as:
```
mpirun Alya.x sphere
```
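On a cluster with a batch system, the command above is typically wrapped in a submission script. A minimal SLURM sketch (node count, tasks per node, wall time and launcher are placeholders to adapt to your machine):

```
#!/bin/bash
#SBATCH --job-name=alya-sphere
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=48
#SBATCH --time=01:00:00

# Run Alya on the sphere case from inside the input directory
srun Alya.x sphere          # or: mpirun -np $SLURM_NTASKS Alya.x sphere
```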
### How to measure the performance
There are several ways to measure the performance of the Nastin module:
1. **For the complete cycle including: element assembly + boundary assembly + subgrid scale assembly + solvers, etc.**
> In *.nsi.cvg file, column "30. Elapsed CPU time"
2. **For single kernels: element assembly, boundary assembly, subgrid scale assembly, solvers.** Average and maximum times are given in *.nsi.cvg at each iteration of each time step:
> Element assembly: 19. Ass. ave cpu time 20. Ass. max cpu time
>
> Boundary assembly: 33. Bou. ave cpu time 34. Bou. max cpu time
>
> Subgrid scale assembly: 31. SGS ave cpu time 32. SGS max cpu time
>
> Iterative solvers: 21. Sol. ave cpu time 22. Sol. max cpu time
>
> Note that when Runge-Kutta time integration is used (as for the
> sphere), the element and boundary assembly times are those of the
> last assembly of the current time step (out of three for third order).
3. **Using overall times**.
> At the end of *.log file, total timings are shown for all modules. In
> this case we use the first value of the NASTIN MODULE.
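As an illustration, the per-iteration timings listed above can be extracted and averaged directly from the convergence file with awk. This is only a sketch: it assumes whitespace-separated columns and that header/comment lines start with `$` (check the layout of your own *.nsi.cvg file first):

```
# Average of column 30 ("Elapsed CPU time") over all recorded iterations
awk '!/^\$/ && NF>=30 {sum+=$30; n++} END {if (n) print "mean elapsed CPU time:", sum/n}' sphere.nsi.cvg
```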
## Contact
If you have any questions regarding the runs, please feel free to contact Guillaume Houzeaux: guillaume.houzeaux@bsc.es
# Alya - Large Scale Computational Mechanics
Alya is a simulation code for high performance computational mechanics. Alya solves coupled multiphysics problems using high performance computing techniques for distributed and shared memory supercomputers, together with vectorization and optimization at the node level.
Homepage: https://www.bsc.es/research-development/research-areas/engineering-simulations/alya-high-performance-computational
Alya is available to collaborating projects, and a specific version is being distributed as part of the PRACE Unified European Applications Benchmark Suite (http://www.prace-ri.eu/ueabs/#ALYA)
## Building Alya for GPU accelerators
The GPU solver library (NINJA) currently supports four solvers: GMRES, Deflated Conjugate Gradient, Conjugate Gradient, and Pipelined Conjugate Gradient.
The only preconditioner supported at the moment is 'diagonal'.
Keywords to use the solvers:
```shell
NINJA GMRES : GGMR
NINJA Deflated CG : GDECG
NINJA CG : GCG
NINJA Pipelined CG : GPCG
PRECONDITIONER : DIAGONAL
```
All other options are the same as for the CPU-based solvers.
### GPGPU Building
This version was tested with the Intel Compilers 2017.1, bullxmpi-1.2.9.1 and NVIDIA CUDA 7.5. Ensure that the wrappers `mpif90` and `mpicc` point to the correct binaries and that `$CUDA_HOME` is set.
Alya can be used with just MPI or hybrid MPI-OpenMP parallelism. Standard execution mode is to rely on MPI only.
- Uncompress the source and configure the METIS dependency and the Alya build options:
```shell
tar xvf alya-prace-acc.tar.bz2
```
- Edit the file `Alya/Thirdparties/metis-4.0/Makefile.in` to select the compiler and target platform. Uncomment the specific lines and add optimization parameters, e.g.
```shell
OPTFLAGS = -O3 -xCORE-AVX2
```
- Then build Metis4
```shell
$ cd Alya/Executables/unix
$ make metis4
```
- For Alya there are several example configurations, copy one, e.g. for Intel Compilers:
```shell
$ cp configure.in/config_ifort.in config.in
```
- Edit the config.in:
Add the corresponding platform optimization flags to `FCFLAGS`, e.g.
```shell
FCFLAGS = -module $O -c -xCORE-AVX2
```
- MPI: No changes to the configure file are necessary. By default, metis4 and 4-byte integers are used.
- MPI-hybrid (with OpenMP) : Uncomment the following lines for OpenMP version:
```shell
CSALYA := $(CSALYA) -qopenmp      # use -fopenmp for GCC compilers
EXTRALIB := $(EXTRALIB) -qopenmp  # use -fopenmp for GCC compilers
```
- Configure and build Alya (-x for the Release version; -g for the Debug version, which also requires uncommenting the debug and checking flags in config.in):
```shell
./configure -x nastin parall
make NINJA=1 -j num_processors
```
### GPGPU Usage
Each problem needs a `GPUconfig.dat`. A sample is available at `Alya/Thirdparties/ninja` and needs to be copied to the work directory. A README file in the same location provides further information.
- Extract the small one node test case and configure to use GPU solvers:
```shell
$ tar xvf cavity1_hexa_med.tar.bz2 && cd cavity1_hexa_med
$ cp ../Alya/Thirdparties/ninja/GPUconfig.dat .
```
- To use the GPU, replace `GMRES` with `GGMR` and `DEFLATED_CG` with `GDECG`, both in `cavity1_hexa.nsi.dat` (a sed sketch is given after this list)
- Edit the job script `job.sh` to point to your `Alya.x` (compiled with MPI options) and submit the calculation to the batch system:
```shell
sbatch job.sh
```
Alternatively execute directly:
```shell
OMP_NUM_THREADS=4 mpirun -np 16 Alya.x cavity1_hexa
```
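The solver substitution mentioned above can also be scripted. A minimal sketch using GNU sed (it is a plain textual substitution, so keep the `.bak` backup and check that only the intended solver keywords were changed):

```shell
# Switch the Nastin solvers to their GPU (NINJA) equivalents in the input file
sed -i.bak -e 's/GMRES/GGMR/' -e 's/DEFLATED_CG/GDECG/' cavity1_hexa.nsi.dat
```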
<!-- Runtime on 16-core Xeon E5-2630 v3 @ 2.40GHz with 2 NVIDIA K80: ~1:30 min -->
<!-- Runtime on 16-core Xeon E5-2630 v3 @ 2.40GHz no GPU: ~2:00 min -->
## Building Alya for Intel Xeon Phi Knights Landing (KNL)
The Xeon Phi processor version of Alya currently relies on compiler-assisted optimization for AVX-512. Porting of performance-critical kernels to the new assembly instructions is underway. There will not be a version for first-generation Xeon Phi Knights Corner coprocessors.
### KNL Building
This version was tested with the Intel Compilers 2017.1, Intel MPI 2017.1. Ensure that the wrappers `mpif90` and `mpicc` point to the correct binaries.
Alya can be used with just MPI or hybrid MPI-OpenMP parallelism. Standard execution mode is to rely on MPI only.
- Uncompress the source and configure the METIS dependency and the Alya build options:
```shell
tar xvf alya-prace-acc.tar.bz2
```
- Edit the file `Alya/Thirdparties/metis-4.0/Makefile.in` to select the compiler and target platform. Uncomment the specific lines and add optimization parameters, e.g.
```shell
OPTFLAGS = -O3 -xMIC-AVX512
```
- Then build Metis4
```shell
$ cd Alya/Executables/unix
$ make metis4
```
- For Alya there are several example configurations, copy one, e.g. for Intel Compilers:
```shell
$ cp configure.in/config_ifort.in config.in
```
- Edit the config.in:
Add the corresponding platform optimization flags to `FCFLAGS`, e.g.
```shell
FCFLAGS = -module $O -c -xMIC-AVX512
```
- MPI: No changes to the configure file are necessary. By default, metis4 and 4-byte integers are used.
- MPI-hybrid (with OpenMP) : Uncomment the following lines for OpenMP version:
```shell
CSALYA := $(CSALYA) -qopenmp      # use -fopenmp for GCC compilers
EXTRALIB := $(EXTRALIB) -qopenmp  # use -fopenmp for GCC compilers
```
- Configure and build Alya (-x for the Release version; -g for the Debug version, which also requires uncommenting the debug and checking flags in config.in):
```shell
./configure -x nastin parall
make -j num_processors
```
## Remarks
If the number of elements is too low for a scalability analysis, Alya includes a mesh multiplication technique. This tool can be used by selecting an input option in the ker.dat file. This option is the number of mesh multiplication levels one wants to apply (0 meaning no mesh multiplication). At each multiplication level, the number of elements is multiplied by 8, so one can obtain a huge mesh automatically in order to study the scalability of the code on different architectures. Note that the mesh multiplication is carried out in parallel and thus should not impact the duration of the simulation process.
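As a quick check of the mesh sizes this produces, each multiplication level multiplies the element count by 8, so starting from the 16.7M-element sphere mesh one gets roughly 133.6M, 1.07B and 8.55B elements at levels 1, 2 and 3. A small sketch of the arithmetic:

```shell
# Element count after each mesh multiplication level (factor 8 per level)
for level in 0 1 2 3; do
  echo "level $level: $(echo "16700000 * 8^$level" | bc) elements"
done
```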
#!/bin/bash
#
# Compute the average time per time step from a Code_Saturne timer_stats.csv file:
# drop the first four (header) lines and the last line, take the second column,
# strip the commas, and average the remaining values.
#
export FILE_LENGTH=`wc -l < timer_stats.csv`
#
## echo "Number of lines $FILE_LENGTH"
#
export TAIL_LINE_NUMBER="$(($FILE_LENGTH-4))"
#
## echo $TAIL_LINE_NUMBER
#
tail -$TAIL_LINE_NUMBER timer_stats.csv > timer_1st.tmp
#
##more timer_1st.tmp
#
awk '{print $2}' timer_1st.tmp > timer_2nd.tmp
#
sed 's/,//g' timer_2nd.tmp > timer_1st.tmp
#
export FILE_LENGTH=`wc -l < timer_1st.tmp`
#
## echo "Number of lines $FILE_LENGTH"
#
export FILE_LENGTH=$(($FILE_LENGTH-1))
#
export HEAD_LINE_NUMBER="-$FILE_LENGTH"
#
head $HEAD_LINE_NUMBER timer_1st.tmp > timer_2nd.tmp
#
export sum_of_lines=`awk '{s+=$1}END{print s}' timer_2nd.tmp`
## echo "Sum of the lines of the file: $sum_of_lines"
#
##more timer_2nd.tmp
#
export average_timing=`echo "$sum_of_lines / $FILE_LENGTH" | bc -l`
echo "Averaged timing for the $FILE_LENGTH entries: $average_timing"
#
rm -rf *.tmp
#!/bin/sh
#################################
## Which version of the code ? ##
#################################
CODE_VERSION=7.0.0
KER_VERSION=${CODE_VERSION}
KERNAME=code_saturne-${KER_VERSION}
################################################
## Installation PATH in the current directory ##
################################################
INSTALLPATH=`pwd`
echo $INSTALLPATH
#####################################
## Environment variables and PATHS ##
#####################################
NOM_ARCH=`uname -s`
CS_HOME=${INSTALLPATH}/${KERNAME}
export PATH=$CS_HOME/bin:$PATH
##############
## Cleaning ##
##############
rm -rf $CS_HOME/arch/*
rm -rf $INSTALLPATH/$KERNAME.build
#########################
## Kernel Installation ##
#########################
KERSRC=$INSTALLPATH/$KERNAME
KERBUILD=$INSTALLPATH/$KERNAME.build/arch/$NOM_ARCH
KEROPT=$INSTALLPATH/$KERNAME/arch/$NOM_ARCH
export KEROPT
mkdir -p $KERBUILD
cd $KERBUILD
$KERSRC/configure \
--disable-shared \
--disable-nls \
--without-modules \
--disable-gui \
--enable-long-gnum \
--disable-mei \
--enable-debug \
--prefix=$KEROPT \
CC="mpicc" CFLAGS="-O3" FC="mpif90" FCFLAGS="-O3" CXX="mpicxx" CXXFLAGS="-O3"
make -j 8
make install
cd $INSTALLPATH
# Code_Saturne
[Code_Saturne](https://www.code-saturne.org/cms/) is an open-source, multi-purpose CFD software package, primarily developed and maintained by EDF R&D. It relies on the Finite Volume method and a collocated arrangement of unknowns to solve the Navier-Stokes equations, for incompressible or compressible, laminar or turbulent, and Newtonian or non-Newtonian flows. A new discretisation based on the Compatible Discrete Operator (CDO) approach can be used for some physics. A highly parallel coupling library (Parallel Locator Exchange - PLE) is also available in the distribution to couple other software with different physics, such as conjugate heat transfer and structural mechanics. For the incompressible solver, the pressure is solved using an integrated Algebraic Multi-Grid algorithm, and the velocity components/scalars are computed by conjugate gradient methods or Gauss-Seidel/Jacobi.
The original version of the code is written in C for pre-/post-processing, IO handling, parallelisation handling, linear solvers and gradient computation, and Fortran 95 for some of the physics-related implementation. Python is used to manage the simulations. MPI is used on distributed memory machines and OpenMP pragmas have been added to the most costly parts of the code to be used on shared memory architectures. The version used in this work relies on external libraries (AMGx - PETSc) to take advantage of potential GPU acceleration.
The equations are solved iteratively using time-marching algorithms, and, for simple physics, most of the time spent in a time step is due to the computation of the velocity-pressure coupling. For this reason, the test cases chosen for the benchmark suite have been designed to assess the velocity-pressure coupling computation. They rely on the same configuration, the 3-D lid-driven cavity, using tetrahedral-cell meshes. The first case mesh contains over 13 million cells. The larger test cases are modular, in the sense that mesh multiplication is used on-the-fly to increase the mesh size, using several levels of refinement.
## Building Code_Saturne v7.0.0
The version 7.0.0 of Code_Saturne is to be found [here](https://www.code-saturne.org/cms/sites/default/files/releases/code_saturne-7.0.0.tar.gz).
A simple installer [_InstallHPC.sh_](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/code_saturne/InstallHPC.sh) is made available for this version.
An example of the last lines of the installer (meant for the GNU compilers and MPI-OpenMP in this example) reads:
```shell
$KERSRC/configure \
  --disable-shared \
  --disable-nls \
  --without-modules \
  --disable-gui \
  --enable-long-gnum \
  --disable-mei \
  --enable-debug \
  --prefix=$KEROPT \
  CC="mpicc" CFLAGS="-O3" FC="mpif90" FCFLAGS="-O3" CXX="mpicxx" CXXFLAGS="-O3"

make -j 8
make install
```
CC, FC, CFLAGS, FCFLAGS, LDFLAGS and LIBS might have to be tailored for your machine, compilers, MPI installation, etc.
More information concerning the options can be found by typing `./configure --help`.
Assuming that CS_7.0.0_PRACE_UEABS is the current directory, the tarball is untarred in there as:
```shell
tar zxvf code_saturne-7.0.0.tar.gz
```
and the code is then installed as:
```shell
cd CS_7.0.0_PRACE_UEABS
./InstallHPC.sh
```
If the installation is successful, typing:
```shell
YOUR_PATH/CS_7.0.0_PRACE_UEABS/code_saturne-7.0.0/arch/Linux/bin/code_saturne
```
should return the **code_saturne** usage message:
```
Usage: ./code_saturne <topic>

Topics:
  help
  studymanager
  smgr
  bdiff
  bdump
  compile
  config
  cplgui
  create
  gui
  parametric
  studymanagergui
  smgrgui
  trackcvg
  update
  up
  info
  run
  submit
  symbol2line

Options:
  -h, --help  show this help message and exit
```
## Preparing a simulation
Two archives are used, namely [**CS_7.0.0_PRACE_UEABS_CAVITY_13M.tar.gz**](https://repository.prace-ri.eu/ueabs/Code_Saturne/2.2/CS_7.0.0_PRACE_UEABS_CAVITY_13M.tar.gz) and [**CS_7.0.0_PRACE_UEABS_CAVITY_XXXM.tar.gz**](https://repository.prace-ri.eu/ueabs/Code_Saturne/2.2/CS_7.0.0_PRACE_UEABS_CAVITY_XXXM.tar.gz), which contain the information required to run both test cases, with the mesh_input.csm file (for the mesh) and the user subroutines in _src_.
Taking the example of CAVITY_13M, from the working directory WORKDIR (different from CS_7.0.0_PRACE_UEABS), a ‘study’ has to be created (CAVITY_13M, for instance) as well as a ‘case’ (MACHINE, for instance) as:
YOUR_PATH/CS_7.0.0_PRACE_UEABS/code_saturne-7.0.0/arch/Linux/bin/code_saturne create --study CAVITY_13M --case MACHINE --copy-ref
The directory **CAVITY_13M** contains 3 directories, MACHINE, MESH and POST.
The directory **MACHINE** contains 3 directories, DATA, RESU and SRC.
The file mesh_input.csm should be copied into the MESH directory.
The user subroutines (cs_user* files) contained in _src_ should be copied into SRC.
The file _cs_user_scripts.py_ is used to manage the simulation. It has to be copied into DATA:
```shell
cd DATA
cp REFERENCE/cs_user_scripts.py .
```
At Line 89 of this file, you need to change the mesh entry from None to the local path of the mesh, i.e. "../MESH/mesh_input.csm".
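This one-line edit can also be done with sed; a minimal sketch, assuming Line 89 is indeed the line holding the None placeholder in your copy of the file:

```shell
# Replace None on line 89 with the relative path to the mesh
sed -i '89s|None|"../MESH/mesh_input.csm"|' cs_user_scripts.py
```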
To finalise the preparation go to the folder MACHINE and type: \
YOUR_PATH/CS_7.0.0_PRACE_UEABS/code_saturne-7.0.0/arch/Linux/bin/code_saturne run --initialize
This should create a folder RESU/YYYYMMDD-HHMM, which should contain the following files:
- compile.log
- cs_solver
- cs_user_scripts.py
- listing
- mesh_input.csm
- run.cfg
- run_solver
- setup.xml
- src
- summary
## Running Code_Saturne v7.0.0
The name of the executable is ./cs_solver, and the code should be run as mpirun/mpiexec/poe/aprun/srun ./cs_solver
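For example, from the RESU/YYYYMMDD-HHMM directory created above (launcher and task count are placeholders for your own machine):

```shell
srun ./cs_solver                 # on a SLURM-based machine
# or, with a generic MPI launcher:
mpirun -np <ntasks> ./cs_solver
```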
## Example of timing
A script is used to compute the average time per time step, e.g. [_CS_collect_timing.sh_](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/code_saturne/CS_collect_timing.sh), which returns:
Averaged timing for the 97 entries: 2.82014432989690721649
for the case of the CAVITY_13M, run on 2 nodes of a Cray - AMD (Rome).
## Larger cases
The same steps are carried out for the larger cases using the CS_7.0.0_PRACE_UEABS_CAVITY_XXXM.tar.gz file.
These cases are built by mesh multiplication (also called global refinement) of the mesh used for CAVITY_13M.
If 1 (resp. 2 or 3) level(s) of refinement is/are used, the mesh is over 111M (resp. 889M or 7112M) cells large. The
third mesh (level 3) is suitable for runs using over 100,000 MPI tasks.
To make sure that the simulations are stable, the time step is adjusted depending on the refinement level used.
The number of levels of refinement is set at Line 152 of the _cs_user_mesh.c_ file, by choosing tot_nb_mm as
1, 2 or 3.\
The time step is set at Line 248 of the _cs_user_parameter.f90_ file, by choosing 0.01d0 / 3.d0 (level 1), 0.01d0 / 9.d0
(level 2) or 0.01d0 / 27.d0 (level 3). \
The table below recalls the correct settings.
| | At Line 152 of _cs_user_mesh.c_ | At Line 248 of _cs_user_parameter.f90_ |
| ------ | ------ | ------ |
| Level 1 | tot_nb_mm = 1 | dtref = 0.01d0 / 3.d0 |
| Level 2 | tot_nb_mm = 2 | dtref = 0.01d0 / 9.d0 |
| Level 3 | tot_nb_mm = 3 | dtref = 0.01d0 / 27.d0 |
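For instance, the level-2 (889M-cell) configuration can be prepared by editing the two user files in SRC accordingly. A sketch with sed, assuming the line numbers quoted above match your copies of the files (check them first):

```shell
sed -i '152s/tot_nb_mm = [0-9]*/tot_nb_mm = 2/' cs_user_mesh.c
sed -i '248s|dtref = .*|dtref = 0.01d0 / 9.d0|' cs_user_parameter.f90
```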
# CP2K arch file for an AMD Zen 2 (znver2) system at HLRS (GCC + MPI; paths are site-specific)
CC = mpicc -fopenmp
FC = mpif90 -fopenmp
LD = mpif90 -fopenmp
AR = ar -r
DATA_DIR = /zhome/academic/HLRS/pri/iprhjud/CP2K/cp2k-8.1/data
CP2K_ROOT = /zhome/academic/HLRS/pri/iprhjud/CP2K
MKL_LIB = ${MKLROOT}/lib/intel64
# Options
DFLAGS = -D__FFTW3 -D__LIBXC -D__MKL \
-D__LIBINT -D__MAX_CONTR=4 -D__ELPA=202005 \
-D__parallel -D__SCALAPACK -D__MPI_VERSION=3 \
-D__STATM_RESIDENT
CFLAGS = -O3 -mavx -funroll-loops -ftree-vectorize \
-ffree-form -march=znver2 -mtune=znver2 -fno-math-errno
FCFLAGS = $(DFLAGS) $(CFLAGS) \
-I$(CP2K_ROOT)/libs/libint/include \
-I$(CP2K_ROOT)/libs/libxc/include \
-I$(MKLROOT)/include \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/modules \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/elpa
LDFLAGS = $(FCFLAGS)
LIBS = -L$(CP2K_ROOT)/libs/libint/lib -lint2 \
-L$(CP2K_ROOT)/libs/libxc/lib -lxcf90 -lxcf03 -lxc \
-L$(CP2K_ROOT)/libs/elpa-openmp/lib -lelpa_openmp \
-lfftw3 -lfftw3_threads -lz \
$(MKL_LIB)/libmkl_scalapack_lp64.a -Wl,--start-group \
$(MKL_LIB)/libmkl_gf_lp64.a $(MKL_LIB)/libmkl_sequential.a \
$(MKL_LIB)/libmkl_core.a \
$(MKL_LIB)/libmkl_blacs_sgimpt_lp64.a -Wl,--end-group \
-ldl -lpthread -lm -lstdc++
# Irene ARCH file
# module load feature/openmpi/mpi_compiler/gcc
# module load flavor/openmpi
# module load gnu/8.3.0
# module load mkl
CC = mpicc
FC = mpif90 -fopenmp
LD = mpif90 -fopenmp
AR = ar -r
DATA_DIR = /ccc/work/cont005/pa5489/judgehol/CP2K/cp2k-8.1/data
CP2K_ROOT = /ccc/work/cont005/pa5489/judgehol/CP2K
MKL_LIB = ${MKLROOT}/lib/intel64
# Options
DFLAGS = -D__FFTW3 -D__MKL -D__LIBXSMM \
-D__LIBINT -D__MAX_CONTR=4 -D__LIBXC \
-D__parallel -D__SCALAPACK -D__MPI_VERSION=3 \
-D__STATM_RESIDENT
CFLAGS = -O2 -g -funroll-loops -ftree-vectorize -std=f2008 \
-ffree-form -mtune=native -fno-math-errno -ffree-line-length-none
FCFLAGS = $(DFLAGS) $(CFLAGS) \
-I$(CP2K_ROOT)/libs/libint/include \
-I$(MKLROOT)/include -m64 \
-I$(CP2K_ROOT)/libs/libxsmm/include \
-I$(CP2K_ROOT)/libs/libxc/include \
-I$(CP2K_ROOT)/libs/fftw/include
LDFLAGS = $(FCFLAGS)
LIBS = -L$(CP2K_ROOT)/libs/libint/lib -lint2 \
-L$(CP2K_ROOT)/libs/libxsmm/lib -lxsmmf -lxsmm -lxsmmext \
-L$(CP2K_ROOT)/libs/fftw/lib -lfftw3 -lfftw3_threads -lz -ldl -lstdc++ \
$(MKL_LIB)/libmkl_scalapack_lp64.a -Wl,--start-group \
$(MKL_LIB)/libmkl_gf_lp64.a ${MKL_LIB}/libmkl_sequential.a \
$(MKL_LIB)/libmkl_core.a \
${MKL_LIB}/libmkl_blacs_openmpi_lp64.a -Wl,--end-group \
-lpthread -lm
# CP2K arch file for Juwels psmp
# module load GCC, ParastationMPI/5.2.2-1 FFTW/3.3.8 imkl/2019.5.281
CC = mpicc
FC = mpif90 -fopenmp
LD = mpif90 -fopenmp
AR = ar -r
DATA_DIR = /p/project/prpb92/CP2K/cp2k-8.1/data
CP2K_ROOT = /p/project/prpb92/CP2K
MKL_LIB = ${MKLROOT}/lib/intel64
DFLAGS = -D__FFTW3 -D__MKL -D__ELPA=202005 \
-D__LIBINT -D__MAX_CONTR=4 -D__LIBXC -D__LIBXSMM \
-D__parallel -D__SCALAPACK -D__MPI_VERSION=3 \
-D__STATM_RESIDENT
CFLAGS = -O3 -mavx -funroll-loops -ftree-vectorize \
-ffree-form -mtune=native -fno-math-errno
FCFLAGS = $(DFLAGS) $(CFLAGS) \
-I$(CP2K_ROOT)/libs/libint/include \
-I$(MKLROOT)/include -m64 \
-I$(CP2K_ROOT)/libs/libxsmm/include \
-I$(CP2K_ROOT)/libs/libxc/include \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/modules \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/elpa
LDFLAGS = $(FCFLAGS)
LIBS = -L$(CP2K_ROOT)/libs/libint/lib -lint2 \
-L$(CP2K_ROOT)/libs/libxsmm/lib -lxsmmf -lxsmm -lxsmmext \
-L$(CP2K_ROOT)/libs/libxc/lib -lxcf90 -lxcf03 -lxc \
-L$(CP2K_ROOT)/libs/elpa-openmp/lib -lelpa_openmp \
$(PLUMED_DEPENDENCIES) -lfftw3 -lfftw3_threads -lz -ldl -lstdc++ \
$(MKL_LIB)/libmkl_scalapack_lp64.a -Wl,--start-group \
$(MKL_LIB)/libmkl_gf_lp64.a ${MKL_LIB}/libmkl_sequential.a \
$(MKL_LIB)/libmkl_core.a \
${MKL_LIB}/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group \
-lpthread -lm
# CP2K arch file for a CINECA system with NVIDIA V100 GPUs (GCC + CUDA; paths are site-specific)
NVCC = ${CUDA_PATH}/bin/nvcc
CC = gcc
CXX = g++
FC = mpif90
LD = mpif90
AR = ar -r
GPUVER = V100
CUDAPATH = /cineca/prod/opt/compilers/cuda/11.0/none
CXXFLAGS = -O3 -I$(CUDAPATH)/include -std=c++11 -fopenmp
DATA_DIR = /m100_work/Ppp4x_5489/CP2K/cp2k-8.1/data
CP2K_ROOT = /m100_work/Ppp4x_5489/CP2K
LIBINT_INC = $(CP2K_ROOT)/libs/libint/include
LIBINT_LIB = $(CP2K_ROOT)/libs/libint/lib
LIBXC_INC = $(CP2K_ROOT)/libs/libxc/include
LIBXC_LIB = $(CP2K_ROOT)/libs/libxc/lib
DFLAGS = -D__FFTW3 -D__ACC -D__DBCSR_ACC -D__SCALAPACK -D__PW_CUDA -D__parallel -D__LIBINT -D__MPI_VERSION=3 -D__LIBXC -D__GFORTRAN
FCFLAGS = -fopenmp -std=f2008 -fimplicit-none -ffree-form -fno-omit-frame-pointer -O3 -ftree-vectorize $(DFLAGS) $(WFLAGS)
FCFLAGS += -I$(LIBINT_INC) -I$(LIBXC_INC)
LDFLAGS = -L$(CUDAPATH)/lib64 $(FCFLAGS)
NVFLAGS = $(DFLAGS) -O3 -arch sm_70 -Xcompiler='-fopenmp' --std=c++11
CFLAGS = $(DFLAGS) -I$(LAPACK_INC) -I${FFTW_INC} -fno-omit-frame-pointer -g -O3 -fopenmp
LIBS = -L${LAPACK_LIB} -L${BLAS_LIB} -L${FFTW_LIB} -L${CUDA_LIB} -L${SCALAPACK_LIB} -lscalapack -llapack -lblas -lstdc++ -lfftw3 -lfftw3_omp -lcuda -lcudart -lnvrtc -lcufft -lcublas -lrt
LIBS += $(LIBINT_LIB)/libint2.a
LIBS += $(LIBXC_LIB)/libxcf03.a $(LIBXC_LIB)/libxc.a
# CP2K arch file for Marenostrum psmp
# module unload intel impi
# module load gnu/8.4.0
# module load openmpi/4.0.2
# module load mkl/2018.4
CC = mpicc -fopenmp
FC = mpif90 -fopenmp
LD = mpif90 -fopenmp
AR = ar -r
DATA_DIR = /gpfs/scratch/pr1emd00/pr1emd01/CP2K/cp2k-8.1/data
CP2K_ROOT = /gpfs/scratch/pr1emd00/pr1emd01/CP2K
MKL_LIB = ${MKLROOT}/lib/intel64
FFTW_LIB = /gpfs/scratch/pr1emd00/pr1emd01/CP2K/libs/fftw
# Options
DFLAGS = -D__FFTW3 -D__LIBXC -D__MKL \
-D__LIBINT -D__MAX_CONTR=4 -D__ELPA=202005 \
-D__parallel -D__SCALAPACK -D__MPI_VERSION=3 \
-D__STATM_RESIDENT
CFLAGS = -O3 -mavx -funroll-loops -ftree-vectorize \
-ffree-form -march=skylake-avx512 -fno-math-errno
FCFLAGS = $(DFLAGS) $(CFLAGS) \
-I$(CP2K_ROOT)/libs/libint/include \
-I$(CP2K_ROOT)/libs/libxc/include \
-I$(MKLROOT)/include \
-I$(FFTW_LIB)/include \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/modules \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/elpa
LDFLAGS = $(FCFLAGS)
LIBS = -L$(CP2K_ROOT)/libs/libint/lib -lint2 \
-L$(CP2K_ROOT)/libs/libxc/lib -lxcf90 -lxcf03 -lxc \
-L$(CP2K_ROOT)/libs/elpa-openmp/lib -lelpa_openmp \
$(FFTW_LIB)/lib/libfftw3.a $(FFTW_LIB)/lib/libfftw3_threads.a -lz \
$(MKL_LIB)/libmkl_scalapack_lp64.a -Wl,--start-group \
$(MKL_LIB)/libmkl_gf_lp64.a $(MKL_LIB)/libmkl_sequential.a \
$(MKL_LIB)/libmkl_core.a \
$(MKL_LIB)/libmkl_blacs_openmpi_lp64.a -Wl,--end-group \
-ldl -lpthread -lm -lstdc++
# modules: CrayGNU cray-fftw cray-python
CC = cc
CPP =
FC = ftn
LD = ftn
AR = ar -r
CP2K_ROOT=/scratch/snx3000/hjudge/CP2K/build-cpu
DFLAGS = -D__FFTW3 -D__parallel -D__SCALAPACK -D__LIBINT -D__GFORTRAN -D__ELPA -D__LIBXC
CFLAGS = $(DFLAGS) -g -O3 -mavx -fopenmp -march=native -mtune=native
CXXFLAGS = $(CFLAGS)
FCFLAGS = $(DFLAGS) -O3 -mavx -fopenmp -funroll-loops -ftree-vectorize -ffree-form -ffree-line-length-512 -march=native -mtune=native
FCFLAGS += -I$(CP2K_ROOT)/libs/libint/include
FCFLAGS += -I$(CP2K_ROOT)/libs/libxc/include
FCFLAGS += -I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/modules -I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/elpa
LDFLAGS = $(FCFLAGS)
LIBS = -lfftw3 -lfftw3_threads
LIBS += -L$(CP2K_ROOT)/libs/libint/lib -lint2 -lstdc++
LIBS += -L$(CP2K_ROOT)/libs/libxc/lib -lxcf90 -lxcf03 -lxc
LIBS += -L$(CP2K_ROOT)/libs/elpa-openmp/lib -lelpa_openmp
# modules: CrayGNU cray-fftw cray-python cudatoolkit
GPUVER = P100
NVCC = nvcc
CC = cc
CPP =
FC = ftn
LD = ftn
AR = ar -r
CP2K_ROOT=/scratch/snx3000/hjudge/CP2K/build
DFLAGS = -D__FFTW3 -D__parallel -D__SCALAPACK -D__ACC -D__DBCSR_ACC -D__LIBINT -D__GFORTRAN -D__HAS_smm_dnn -D__LIBXC -D__ELPA
CFLAGS = $(DFLAGS) -I$(CRAY_CUDATOOLKIT_DIR)/include -g -O3 -mavx -fopenmp
CXXFLAGS = $(CFLAGS)
FCFLAGS = $(DFLAGS) -O3 -mavx -fopenmp -funroll-loops -ftree-vectorize -ffree-form -ffree-line-length-512
FCFLAGS += -I$(CP2K_ROOT)/libs/libint/include
FCFLAGS += -I$(CP2K_ROOT)/libs/libxc/include -I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/modules -I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/elpa
LDFLAGS = $(FCFLAGS)
NVFLAGS = $(DFLAGS) -O3 -arch sm_60
LIBS = -lfftw3 -lfftw3_threads -lcudart -lcublas -lcufft -lrt -lnvrtc
LIBS += -L$(CP2K_ROOT)/libs/elpa-openmp/lib -lelpa_openmp
LIBS += -L$(CP2K_ROOT)/libs/libint/lib -lint2 -lstdc++
LIBS += -L$(CP2K_ROOT)/libs/libxc/lib -lxcf03 -lxc
LIBS += /apps/common/UES/easybuild/sources/c/CP2K/libsmm_dnn_cray.gnu.a
# SuperMUC-NG arch file
# module swap devEnv/Intel/2019 devEnv/GCC/8-IntelMPI
# module swap mpi.intel openmpi/4.0.2
# module load mkl/2019_gcc
CC = mpicc -fopenmp
FC = mpif90 -fopenmp
LD = mpif90 -fopenmp
AR = ar -r
DATA_DIR = /hppfs/work/pn68ho/di67kis/CP2K/cp2k-8.1/data
CP2K_ROOT = /hppfs/work/pn68ho/di67kis/CP2K
MKL_LIB = ${MKLROOT}/lib/intel64
# Options
DFLAGS = -D__FFTW3 -D__MKL -D__LIBXC \
-D__LIBINT -D__LIBXSMM -D__ELPA=202005 -D__MAX_CONTR=4 \
-D__parallel -D__SCALAPACK -D__MPI_VERSION=3 \
-D__STATM_RESIDENT
CFLAGS = -O3 -mavx -funroll-loops -ftree-vectorize \
-ffree-form -march=native -fno-math-errno \
-I$(CP2K_ROOT)/libs/libxsmm/include
FCFLAGS = $(DFLAGS) $(CFLAGS) \
-I$(CP2K_ROOT)/libs/libint/include \
-I$(CP2K_ROOT)/libs/libxc/include \
-I$(MKLROOT)/include -m64 \
-I$(CP2K_ROOT)/libs/libxsmm/include \
-I$(CP2K_ROOT)/libs/fftw/include \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/modules \
-I$(CP2K_ROOT)/libs/elpa-openmp/include/elpa_openmp-2020.05.001/elpa
LDFLAGS = $(FCFLAGS)
LIBS = -L$(CP2K_ROOT)/libs/libint/lib -lint2 \
-L$(CP2K_ROOT)/libs/libxc/lib -lxcf90 -lxcf03 -lxc \
-L$(CP2K_ROOT)/libs/libxsmm/lib -lxsmmf -lxsmm -lxsmmext \
-L$(CP2K_ROOT)/libs/elpa-openmp/lib -lelpa_openmp \
-L$(CP2K_ROOT)/libs/fftw/lib -lfftw3 -lfftw3_threads -lz -ldl -lstdc++ \
$(MKL_LIB)/libmkl_scalapack_lp64.a -Wl,--start-group \
$(MKL_LIB)/libmkl_gf_lp64.a $(MKL_LIB)/libmkl_sequential.a \
$(MKL_LIB)/libmkl_core.a \
$(MKL_LIB)/libmkl_blacs_openmpi_lp64.a -Wl,--end-group \
-lpthread -lm