Compilation and job submission instructions for Gadget-4 on, e.g., the JUWELS supercomputer


1. After logging in to JUWELS, load the following modules:

module load Intel/2020.2.254-GCC-9.3.0 IntelMPI/2019.8.254
module load FFTW/3.3.8 HDF5/1.10.6 GSL/2.6

2. In the gadget4-20210622 folder, compile the code using

make CONFIG=../Test-Case-A/Config.h EXEC=../Test-Case-A/gadget4-exe

3. cd ../Test-Case-A and do the following:

3.1. In param.txt, adjust the maximum memory size. On the JUWELS batch partition the RAM per core is 2 GB, so MaxMemSize (given in MByte per MPI task) is set slightly below that limit.

%----- Memory allocation
MaxMemSize        1800

3.2. Set the maximum evolution time to TimeMax=1.0.
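
The two parameter changes in step 3 can also be scripted. The following is a minimal sketch, assuming param.txt uses the usual "Name value" layout with one parameter per line. The helper set_param is hypothetical and not part of Gadget-4.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gadget-4): set_param FILE NAME VALUE
# Rewrites the line whose first field is NAME; all other lines are kept as-is.
set_param() {
    file=$1
    name=$2
    value=$3
    tmp="${file}.tmp.$$"
    awk -v k="$name" -v v="$value" '
        $1 == k { printf "%-18s%s\n", k, v; next }   # replace the matching parameter line
        { print }                                     # copy every other line unchanged
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage with the values from steps 3.1 and 3.2 (run inside Test-Case-A):
# set_param param.txt MaxMemSize 1800
# set_param param.txt TimeMax 1.0
```

Writing to a temporary file and moving it into place avoids relying on in-place editing options that differ between sed implementations.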

4. For a run with 1024 cores, set the following in the Slurm script (here named slurm_juwels.sh):

#SBATCH --nodes=32
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-socket=17
#SBATCH --ntasks-per-node=33

This allocates 32 x 33 = 1056 cores in total.

Note that Gadget-4 uses one core per compute node to handle communications. Hence, when allocating compute
nodes we must take an extra core per node into account: to run the code with 16 MPI tasks per socket (32 per
node) we must allocate 33 cores per compute node. Since 33 tasks do not split evenly over two sockets,
--ntasks-per-socket is set to 17.

A run with 1024 compute cores on 32 nodes therefore allocates 1056 cores.
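
The allocation arithmetic can be checked with a few lines of shell. This is only a sketch: the variable names are illustrative, the values are those of the 32-node example, and two sockets per node is the assumption implied by "16 tasks per socket, 33 per node".

```shell
#!/bin/sh
# Illustrative variable names; values taken from the 32-node example.
nodes=32
compute_tasks_per_socket=16
sockets_per_node=2        # assumption: dual-socket compute nodes
comm_tasks_per_node=1     # Gadget-4 uses one rank per node for communication

tasks_per_node=$(( compute_tasks_per_socket * sockets_per_node + comm_tasks_per_node ))
total_tasks=$(( nodes * tasks_per_node ))
compute_tasks=$(( total_tasks - nodes * comm_tasks_per_node ))

echo "--ntasks-per-node=${tasks_per_node}"
echo "allocated cores: ${total_tasks}, compute ranks: ${compute_tasks}"
```

For other job sizes, changing nodes (and, if needed, compute_tasks_per_socket) gives the corresponding --ntasks-per-node value and total allocation.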

The code output looks like the following (the last line shows the 1024 MPI tasks used for the computation):


Running on hosts: jwc03n[082-097,169-184]
Running on 32 nodes.
Running on 1056 processors.
Current working directory is /p/project/prpb84/Test-Case-A/00512_cores

Shared memory islands host a minimum of 33 and a maximum of 33 MPI ranks.
We shall use 32 MPI ranks in total for assisting one-sided communication (1 per shared memory node).

  ___    __    ____    ___  ____  ____       __
 / __)  /__\  (  _ \  / __)( ___)(_  _)___  /. |
( (_-. /(__)\  )(_) )( (_-. )__)   )( (___)(_  _)
 \___/(__)(__)(____/  \___/(____) (__)       (_)

This is Gadget, version 4.0.
Git commit 8ee7f358cf43a37955018f64404db191798a32a3, Tue Jun 15 15:10:36 2021 +0200

Code was compiled with the following compiler and flags:
...

Code was compiled with the following settings:
    COOLING
    DOUBLEPRECISION=1
    GADGET2_HEADER
    MULTIPOLE_ORDER=3
    NSOFTCLASSES=2
    NTYPES=6
    POSITIONS_IN_64BIT
    SELFGRAVITY
    STARFORMATION
    TREE_NUM_BEFORE_NODESPLIT=4


Running on 1024 MPI tasks.
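
To confirm the task count without reading the whole log, the last line shown above can be extracted with standard tools. A minimal sketch, assuming the job's standard output was saved to a file; the function name and the file name in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print N from the first "Running on N MPI tasks." line of a log file.
compute_ranks() {
    sed -n 's/^Running on \([0-9][0-9]*\) MPI tasks.*/\1/p' "$1" | head -n 1
}
```

For example, `compute_ranks job_output.txt` can be compared against the expected number of compute ranks (1024 in the run above).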

