# GADGET


## Summary Version
4.0 (2021)

## Purpose of Benchmark
Provide the astrophysical community with information on the performance and scalability (weak and strong scaling) of the GADGET-4 code for three test cases on PRACE Tier-0 supercomputers (JUWELS, MareNostrum4, IRENE-SKL, and IRENE-KNL).

## Characteristics of Benchmark

GADGET-4 is written in C++ and was compiled with optimisation level O3, MPI (e.g., OpenMPI, Intel MPI), and the libraries HDF5, GSL, and FFTW3. The tests were carried out in two modes: Intel-compiled and GCC-compiled MPI and libraries. In order to have a proper scalability analysis, the tests were carried out with one MPI task per core and 16 tasks per socket, hence a total of 32 tasks (cores) dedicated to the calculations per compute node. An extra core per compute node was used to handle the MPI communications.

## Mechanics of Building Benchmark

Building the GADGET-4 code requires a compiler with full C++11 support, MPI (e.g., MPICH, OpenMPI, Intel MPI), HDF5, GSL, and FFTW3. Hence, the corresponding environment modules must be loaded, e.g.,
```
module load OpenMPI/4.0.3 HDF5/1.10.6 FFTW/3.3.8 GSL/2.6
```
### Source Code and Initial Conditions
##### Source Code Release
The latest release of the code can be downloaded from [https://gitlab.mpcdf.mpg.de/vrs/gadget4](https://gitlab.mpcdf.mpg.de/vrs/gadget4)

or clone the repository with
```
git clone http://gitlab.mpcdf.mpg.de/vrs/gadget4
```
##### In this UEABS repository you can find:
- A cloned version of the code (version of June 28, 2021): [gadget4.tar.gz](./gadget/4.0/gadget4.tar.gz). This tarball includes the `src`, `examples`, `buildsystem`, and `documentation` folders, as well as the **Makefile** and **Makefile.systype** (or a template) files.

- The code used in the benchmarks (version of June 22, 2021): [gadget4-benchmarks.tar.gz](./gadget/4.0/gadget4-benchmarks.tar.gz)

- Example initial conditions: [example_ics.tar.gz](./gadget/4.0/example_ics.tar.gz). It includes initial conditions for each of the examples; when untarred it creates a folder named `ExampleICs`.
### Build the Executable

#### General Building of the Executable
1. Two files are needed from the repository: [gadget4.tar.gz](./gadget/4.0/gadget4.tar.gz) and [example_ics.tar.gz](./gadget/4.0/example_ics.tar.gz)
2. After decompressing gadget4.tar.gz go to the master folder named `gadget4`. There are two files that need modification: **Makefile.systype** and **Makefile**.
a) In the **Makefile.systype** select one of the system types by uncommenting the corresponding line or add a line with your system, e.g.,
```
#SYSTYPE="XXX-BBB"
```
where XXX = system name and BBB = whatever you may want to include here, e.g., impi, openmpi, etc.

b) In case you uncommented a line corresponding to your system in the **Makefile.systype** then there is nothing to do in the **Makefile**.
c) In case you added a line, say `SYSTYPE="XXX-BBB"` (uncommented, so that it takes effect), to the **Makefile.systype**, then you must modify the **Makefile** by adding the following lines in the 'define available Systems' section:
```
ifeq ($(SYSTYPE),"XXX-BBB")
include buildsystem/Makefile.comp.XXX-BBB
include buildsystem/Makefile.path.XXX-BBB
endif
```
3. In the folder `buildsystem` make sure the files **Makefile.comp.XXX** and **Makefile.path.XXX** (XXX = cluster name) are set with the proper compilation options and paths, respectively. Either choose the existing files or create new ones that reflect your system paths and compiler.
4. The folder `examples` has several subfolders of test cases. From one of these subfolders, e.g., `CollidingGalaxiesSFR`, copy **Config.sh** to the master folder.
5. In the master folder compile the code:
```
make CONFIG=Config.sh EXEC=gadget4-exe
```
where EXEC is the name of the executable.
6. Create a folder named `Run_CollidingGalaxies`. Copy **gadget4-exe** and the files **param.txt** and **TREECOOL** from the subfolder `CollidingGalaxiesSFR` to `Run_CollidingGalaxies`.
7. In the folder `Run_CollidingGalaxies`, modify **param.txt** to include the correct path to the initial conditions file **ics_collision_g4.dat** located in the folder `ExampleICs`, and adjust the memory per core to that of the system you are using.
8. Run the code using mpirun or submit a SLURM script.
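The eight steps above can be condensed into a shell sketch (module versions and the mpirun invocation are examples taken from this README; the SYSTYPE edits from steps 2-3 must still be done by hand, and the parameter names referenced in the comments are those of the Gadget-4 parameter file):

```shell
# Sketch of steps 1-8; assumes both tarballs are in the current directory.
module load OpenMPI/4.0.3 HDF5/1.10.6 FFTW/3.3.8 GSL/2.6   # step 1 prerequisites
tar xzf gadget4.tar.gz
tar xzf example_ics.tar.gz                                  # creates ExampleICs/

cd gadget4
# Steps 2-3: edit Makefile.systype (and, if needed, Makefile plus
# buildsystem/Makefile.comp.XXX / Makefile.path.XXX) for your system.
cp examples/CollidingGalaxiesSFR/Config.sh .                # step 4
make CONFIG=Config.sh EXEC=gadget4-exe                      # step 5

mkdir ../Run_CollidingGalaxies                              # step 6
cp gadget4-exe ../Run_CollidingGalaxies
cp examples/CollidingGalaxiesSFR/param.txt ../Run_CollidingGalaxies
cp examples/CollidingGalaxiesSFR/TREECOOL ../Run_CollidingGalaxies

cd ../Run_CollidingGalaxies
# Step 7: edit param.txt so the initial-conditions path points at
# ../ExampleICs/ics_collision_g4.dat and the memory-per-core setting
# matches your machine.
mpirun -np 33 ./gadget4-exe param.txt                       # step 8 (or sbatch)
```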
#### Building a Test Case Executable | Case A
1. Download and untar a test case tarball, e.g., [gadget4-case-A.tar.gz](./gadget/4.0/gadget4-case-A.tar.gz) (see below) and the source code used in the benchmarks named [gadget4-benchmarks.tar.gz](./gadget/4.0/gadget4-benchmarks.tar.gz). The folder `gadget4-case-A` has the **Config.sh**, **ics_collision_g4.dat**, **param.txt**, **TREECOOL**, **slurm_script.sh**, and **README** files. 

The **param.txt** file has the path for the initial conditions and was adapted for a system with 2.0 GB of RAM per core (1.8 GB usable in effect).

The **README** file describes how to set up and run the code on a supercomputer. Use it as an example of what to expect on other machines.
2. Change to the folder named `gadget4-benchmarks` and adapt the files **Makefile.systype** and **Makefile** to your needs. Follow instructions 2a), 2b), or 2c) in Section "General Building of the Executable".
3. Compile the code using the **Config.sh** file in `gadget4-case-A`:
```
make CONFIG=../gadget4-case-A/Config.sh EXEC=../gadget4-case-A/gadget4-exe
```
4. Change to folder `gadget4-case-A` and make sure that the file **param.txt** has the correct memory size per core for the system you are using. 
5. Run the code directly with mpirun or submit a SLURM script.
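The Case A walkthrough reduces to a few commands (a sketch assuming both tarballs are in the current directory and SYSTYPE has already been set as in the general build instructions):

```shell
# Sketch of steps 1-5 for test Case A.
tar xzf gadget4-case-A.tar.gz
tar xzf gadget4-benchmarks.tar.gz

cd gadget4-benchmarks
# Adapt Makefile.systype / Makefile first (steps 2a-2c), then:
make CONFIG=../gadget4-case-A/Config.sh EXEC=../gadget4-case-A/gadget4-exe

cd ../gadget4-case-A
# Check the memory-per-core setting in param.txt, then submit:
sbatch slurm_script.sh
```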
## Mechanics of Running Benchmark
The general way to run the benchmarks, assuming SLURM Resource/Batch Manager is:

1. Set the environment modules (see the "Build the Executable" section)
2. In the folder of the test case, e.g., `gadget4-case-A`, adapt the SLURM script and submit it:
```
sbatch slurm_script.sh
```
where **slurm_script.sh** has the form (for a run with 1024 cores):
```
#!/bin/bash -l 
#SBATCH --time=01:00:00
#SBATCH --account=ACCOUNT
#SBATCH --job-name=collgal-01024
#SBATCH --output=g_collgal_%j.out
#SBATCH --error=g_collgal_%j.error
#SBATCH --nodes=32
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-socket=17
#SBATCH --ntasks-per-node=33
#SBATCH --exclusive
#SBATCH --partition=batch

echo
echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running on $SLURM_NPROCS processors."
echo "Current working directory is `pwd`"
echo

srun ./gadget4-exe param.txt
```
Where:
* `gadget4-exe` is the executable.
* `param.txt` is the input parameter file.

##### NOTE 

Gadget-4 uses one core per compute node to handle communications. Hence, when allocating compute nodes we must take an extra core per node into account. So, if we want to run the code with 16 MPI tasks per socket, we must allocate 33 cores per compute node. For a run with 1024 compute cores on 32 nodes, we allocate 1056 cores.
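The allocation arithmetic can be checked with a small shell helper (the 32-compute-cores-per-node and one-communication-core-per-node figures come from the note above; `tasks` is whatever multiple of 32 you plan to run with):

```shell
#!/bin/sh
# Given a desired number of compute MPI tasks (a multiple of 32),
# print how many nodes and total cores to request, reserving one
# extra communication core per node as Gadget-4 requires.
tasks=1024
nodes=$(( tasks / 32 ))    # 32 compute tasks per node
cores=$(( nodes * 33 ))    # 32 compute + 1 communication core per node
echo "${nodes} nodes, ${cores} cores"
```

For `tasks=1024` this reproduces the 32 nodes and 1056 cores quoted above.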

##### OUTPUT of a run with 1024 cores

```
Running on hosts: jwc03n[082-097,169-184]
Running on 32 nodes.
Running on 1056 processors.
Current working directory is Test-Case-A

Shared memory islands host a minimum of 33 and a maximum of 33 MPI ranks.
We shall use 32 MPI ranks in total for assisting one-sided communication (1 per shared memory node).

  ___    __    ____    ___  ____  ____       __
 / __)  /__\  (  _ \  / __)( ___)(_  _)___  /. |
( (_-. /(__)\  )(_) )( (_-. )__)   )( (___)(_  _)
 \___/(__)(__)(____/  \___/(____) (__)       (_)

This is Gadget, version 4.0.
Git commit 8ee7f358cf43a37955018f64404db191798a32a3, Tue Jun 15 15:10:36 2021 +0200

Code was compiled with the following compiler and flags:
...

Code was compiled with the following settings:
    COOLING
    DOUBLEPRECISION=1
    GADGET2_HEADER
    MULTIPOLE_ORDER=3
    NSOFTCLASSES=2
    NTYPES=6
    POSITIONS_IN_64BIT
    SELFGRAVITY
    STARFORMATION
    TREE_NUM_BEFORE_NODESPLIT=4


Running on 1024 MPI tasks.
```
## UEABS Benchmarks

**A) `Colliding galaxies with star formation`**

This simulation, with setup in the folder `CollidingGalaxiesSFR`, considers the collision of two compound galaxies, each made up of a dark matter halo, a stellar disk and bulge, and cold gas in the disk that undergoes star formation. Radiative cooling due to helium and hydrogen is included. Star formation and feedback are modelled with a simple subgrid treatment.

[Download test Case A](./gadget/4.0/gadget4-caseA.tar.gz)


**B) `Cosmological DM-only simulation with IC creation`**

The setup in `DM-L50-N128` simulates a small box of comoving side-length 50 Mpc/h using 128^3 dark matter particles. The initial conditions are created on the fly at start-up of the code, using second-order Lagrangian perturbation theory with a starting redshift of z=63. The LEAN option and 32-bit arithmetic are enabled to minimize memory consumption of the code.

Gravity is computed with the TreePM algorithm at expansion order p=3. Three output times are defined, for which FOF group finding is enabled, and a power spectrum is computed for each of the snapshots that are produced.

[Download test Case B](./gadget/4.0/gadget4-caseB.tar.gz)

**C) `Adiabatic collapse of a gas sphere`**

This simulation in `G2-gassphere` considers the gravitational collapse of a self-gravitating sphere of gas which initially has a 1/r density profile and a very low temperature. The gas falls under its own weight towards the centre, where it bounces back and a strong outward-moving shock wave develops. The simulation uses Newtonian physics in a natural system of units (G=1).

[Download test Case C](./gadget/4.0/gadget4-caseC.tar.gz)

   
## Performance
GADGET-4 reports both execution time and performance in its log file.

* **Performance** in `ns/day` units: `grep Performance logfile | awk -F ' ' '{print $2}'`
* **Execution Time** in `seconds`: `grep Time: logfile | awk -F ' ' '{print $3}'`
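As a quick sanity check, the two one-liners can be exercised against a toy log. The log lines below are illustrative placeholders shaped only to match the awk field positions the one-liners assume; they are not real Gadget-4 output:

```shell
#!/bin/sh
# Toy log whose lines match the field positions the one-liners expect:
# field 2 of the "Performance" line, field 3 of the "Time:" line.
cat > logfile <<'EOF'
Performance 123.4 (illustrative value)
Total Time: 567.8 seconds
EOF

perf=$(grep Performance logfile | awk -F ' ' '{print $2}')
secs=$(grep Time: logfile | awk -F ' ' '{print $3}')
echo "Performance=${perf} Time=${secs}s"
```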