This page provides the astrophysical community with information on the performance and scalability (weak and strong scaling) of the GADGET-4 code for three test cases on PRACE Tier-0 supercomputers (JUWELS, MareNostrum4, IRENE-SKL, and IRENE-KNL).
## Characteristics of Benchmark
GADGET-4 is written in C++ and was compiled with the optimisation level O3, an MPI library (e.g., OpenMPI, Intel MPI), and the libraries HDF5, GSL, and FFTW3. The tests were carried out in two modes: with the MPI API and libraries compiled with the Intel compilers, and with GCC.
In order to study the scalability of the software two approaches were considered:
1. A core-based performance analysis: 1 MPI task per core, with 16 cores used per socket (i.e., 16 MPI tasks per socket) and 1 extra core per compute node dedicated to handling MPI communications when multiple compute nodes were used. For runs on a single node (i.e., with the number of cores varying between 1 and 32) no extra core was used.
2. A node-based performance analysis: 1 MPI task per core, with all cores of each socket used (i.e., 24 MPI tasks per socket), including an extra core for MPI communications when multiple nodes are used. For runs on a single node there is no need for an extra communications core.
In both setups the compute nodes were used exclusively (no other jobs shared the nodes). These two approaches allow us to identify which setup provides the better performance for the GADGET-4 code.
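As an illustration of how the strong-scaling figures of merit can be derived from the measured runtimes, the following helper (not part of GADGET-4; the timings used below are hypothetical) computes speedup and parallel efficiency relative to the smallest run:

```python
def strong_scaling(cores, times):
    """Compute speedup and parallel efficiency for a strong-scaling series.

    cores -- list of core counts, smallest first
    times -- measured wall-clock times (s), one per core count
    Speedup and efficiency are taken relative to the smallest core count.
    """
    base_cores, base_time = cores[0], times[0]
    speedup = [base_time / t for t in times]
    efficiency = [s * base_cores / c for s, c in zip(speedup, cores)]
    return speedup, efficiency

# Hypothetical timings on 32, 64, and 128 cores:
speedup, efficiency = strong_scaling([32, 64, 128], [100.0, 55.0, 32.0])
# speedup  -> [1.0, ~1.82, 3.125]
# efficiency -> [1.0, ~0.91, ~0.78]
```

Ideal strong scaling corresponds to an efficiency of 1.0 at every core count; for weak scaling the analogous metric is the ratio of the base runtime to the runtime at constant work per core.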
## Mechanics of Building Benchmark
Building the GADGET-4 code requires a compiler with full C++11 support, an MPI library (e.g., MPICH, OpenMPI, Intel MPI), HDF5, GSL, and FFTW3. Hence, the corresponding environment modules must be loaded before compiling.
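For example, on a system using environment modules, the preparation and build might look like the following (module names, versions, and build details are illustrative and vary by machine; consult the GADGET-4 documentation for the exact configuration steps):

```shell
# Illustrative module names -- the actual names/versions vary by system.
module load gcc openmpi hdf5 gsl fftw

# GADGET-4 is configured through a Config.sh file of compile-time options;
# the build then proceeds with make.
cd gadget4
cp Template-Config.sh Config.sh   # enable the compile-time options needed
make -j 8
```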
**A) `Colliding galaxies with star formation`**

This simulation, with setup in the folder CollidingGalaxiesSFR, considers the collision of two compound galaxies, each made up of a dark matter halo, a stellar disk and bulge, and cold gas in the disk that undergoes star formation. Radiative cooling due to hydrogen and helium is included. Star formation and feedback are modelled with a simple subgrid treatment.
[Download test Case A](./gadget/4.0/gadget4-caseA.tar.gz)
**B) `Cosmological DM-only simulation with IC creation`**

This test case is a three-dimensional simulation of structure formation in the universe in a small cubic box of comoving side-length 50 Mpc/h (pc denotes a parsec = 3.086×10^{16} m; Mpc = 10^6 pc; h is the dimensionless Hubble parameter) using 512^3 dark matter particles. The initial conditions are created on the fly upon start-up of the code, using second-order Lagrangian perturbation theory with a starting redshift of z = 63; the simulation then evolves until redshift z = 50. The LEAN option and 32-bit arithmetic are enabled to minimise the memory consumption of the code.
Gravity is computed with the TreePM algorithm at multipole expansion order p = 3. Three output times are defined, for which FOF (friends-of-friends) group finding is enabled; a snapshot and its power spectrum are produced at each of these output times.
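In GADGET's comoving integration mode, time is parameterised by the cosmological scale factor a = 1/(1 + z), so the start and end redshifts above correspond to the following values (a minimal conversion, for orientation):

```python
def z_to_a(z):
    """Convert redshift z to the cosmological scale factor a = 1/(1 + z)."""
    return 1.0 / (1.0 + z)

a_begin = z_to_a(63)  # start of the run: a = 1/64 = 0.015625
a_end = z_to_a(50)    # end of the benchmark run: a = 1/51 ~ 0.0196
```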
[Download test Case B](./gadget/4.0/gadget4-caseB.tar.gz)
**C) `Blob Test`**

The blob test simulates a spherical cloud of gas (the blob) placed in a wind tunnel, in pressure equilibrium with the surrounding medium. The cloud is 10 times denser and 10 times colder than its surroundings. This setup allows hydrodynamical instabilities, e.g. Kelvin-Helmholtz and Rayleigh-Taylor, to develop at the cloud surface, leading to the break-up of the cloud over time. The cloud is set up with 1 million smoothed particle hydrodynamics (SPH) particles; a larger version of the test uses 10 million particles.
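The stated density and temperature contrasts of the blob are exactly what keep it in pressure equilibrium: for an ideal gas P is proportional to ρT, so a cloud 10 times denser and 10 times colder has the same pressure as the ambient medium. A quick numerical check (arbitrary code units):

```python
# Pressure-equilibrium check for the blob setup (ideal gas: P ~ rho * T,
# in arbitrary code units).
rho_ambient, T_ambient = 1.0, 1.0
rho_cloud = 10.0 * rho_ambient   # cloud is 10x denser...
T_cloud = T_ambient / 10.0       # ...and 10x colder
P_ambient = rho_ambient * T_ambient
P_cloud = rho_cloud * T_cloud
assert P_cloud == P_ambient      # cloud and medium are in equilibrium
```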
[Download test Case C](./gadget/4.0/gadget4-caseC.tar.gz)