# NEMO


## Summary Version

1.1

## Purpose of Benchmark

NEMO (Nucleus for European Modelling of the Ocean) is a mathematical modelling framework for research activities and prediction services in ocean and climate sciences, developed by a European consortium. It is intended as a tool for studying the ocean and its interaction with the other components of the Earth's climate system over a wide range of space and time scales. It comprises the core engines OPA (ocean dynamics and thermodynamics), SI3 (sea ice dynamics and thermodynamics), TOP (oceanic tracers) and PISCES (biogeochemical processes).
The prognostic variables in NEMO are the three-dimensional velocity field, a linear or non-linear sea surface height, the temperature and the salinity. In the horizontal direction the model uses a curvilinear orthogonal grid; in the vertical direction it uses a full or partial step z-coordinate, an s-coordinate, or a mixture of the two. In most cases, variables are distributed on a three-dimensional Arakawa C-type grid.

## Characteristics of Benchmark

The model is implemented in Fortran 90 with C pre-processor directives. It is optimised for vector computers and parallelised by domain decomposition with MPI. It supports modern C/C++ and Fortran compilers. All input and output is handled by a third-party library, XIOS, which depends on NetCDF (Network Common Data Form) and HDF5. NEMO is highly scalable and well suited to measuring supercomputer performance in terms of compute capacity, memory subsystem, I/O and interconnect.

## Mechanics of Building Benchmark

### Building XIOS
1.	Download the XIOS source code:
    ```
    svn co https://forge.ipsl.jussieu.fr/ioserver/svn/XIOS/branchs/xios-2.5
    ```
2.	List the known architectures with the following command:
    ```
    ./make_xios --avail
    ```
    
    If the target architecture is a known one, XIOS can be built with the following command:
    ```
    ./make_xios --arch X64_CURIE
    ```
    Otherwise, provide `arch-local.env`, `arch-local.fcm` and `arch-local.path` files suited to the target architecture, then build with:
    ```
    ./make_xios --arch local
    ```
    Files for the PRACE Tier-0 systems are available in the [architecture_files](architecture_files) folder. Use them as a starting point; they may need updating after system upgrades.

Note that XIOS requires `NetCDF4`. Please load the appropriate `HDF5` and `NetCDF4` modules. If the paths to these modules are not set by loading them, you might have to change the paths in the configuration file.
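
A quick environment check before building can save a failed compile. The sketch below assumes that the loaded modules put the `nc-config` (NetCDF-C) and `nf-config` (NetCDF-Fortran) helper scripts on `PATH`; module names differ between systems and are deliberately not shown. `check_netcdf_env` is a hypothetical helper, not part of XIOS or NEMO.

```shell
#!/bin/sh
# Report whether the NetCDF helper scripts are visible in the current environment.
# Assumption: the loaded modules expose nc-config and nf-config on PATH.
check_netcdf_env() {
    missing=0
    for tool in nc-config nf-config; do
        if command -v "$tool" >/dev/null 2>&1; then
            # --prefix prints the installation root, useful for the arch-*.path file
            echo "$tool found, prefix: $("$tool" --prefix)"
        else
            echo "$tool not found on PATH"
            missing=1
        fi
    done
    return "$missing"
}

check_netcdf_env || echo "Load the NetCDF4/HDF5 modules or set the paths by hand."
```

The printed prefixes are the values that typically need to go into the architecture files when the modules do not set them automatically.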

### Building NEMO
1.	Download the NEMO source code:
	```
    svn co https://forge.ipsl.jussieu.fr/nemo/svn/NEMO/releases/release-4.0
    ```
2.	Copy and set up the appropriate architecture file in the `arch` folder. Files for the PRACE Tier-0 systems are available in the [architecture_files](architecture_files) folder. Use them as a starting point; they may need updating after system upgrades. The following changes are recommended for the GNU compilers:
    ```
    a.	add the `-lnetcdff` and `-lstdc++` flags to the NetCDF flags
    b.	use `mpif90`, which is an MPI binding of `gfortran-4.9`
    c.	add `-cpp` and `-ffree-line-length-none` to the Fortran flags
    ```
3.	To measure the step time, apply the patch described at the following address:
    ```
    https://software.intel.com/en-us/articles/building-and-running-nemo-on-xeon-processors
    ```
    Alternatively, replace `src/OCE/nemogcm.F90` with the provided [nemogcm.F90](nemogcm.F90).
    
4.  Add the line `GYRE_testing OCE TOP` to the `refs_cfg.txt` file in the `cfgs` folder.
    Then go to the `cfgs` folder and run:
	```
	mkdir GYRE_testing
	rsync -arv GYRE_PISCES/* GYRE_testing/
	mv GYRE_testing/cpp_GYRE_PISCES.fcm GYRE_testing/cpp_GYRE_testing.fcm
	sed -i 's/key_top/key_nosignedzero/g' GYRE_testing/cpp_GYRE_testing.fcm
	```
        
5.	Then build the executable with the following command:
	```
    ./makenemo -m MY_CONFIG -r GYRE_testing
    ```
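
Step 4 asks for a line to be added to `refs_cfg.txt` by hand. As a minimal sketch, the addition can be scripted so that re-running the setup does not create duplicate entries. `register_cfg` is a hypothetical helper, not part of NEMO.

```shell
#!/bin/sh
# Append a configuration entry to refs_cfg.txt only if it is not already present.
# register_cfg is a hypothetical helper, not part of NEMO.
register_cfg() {
    cfg_file=$1   # e.g. cfgs/refs_cfg.txt
    entry=$2      # e.g. "GYRE_testing OCE TOP"
    # -x: match the whole line, -F: treat the entry as a fixed string
    if ! grep -qxF "$entry" "$cfg_file" 2>/dev/null; then
        echo "$entry" >> "$cfg_file"
    fi
}

# Usage, run from the NEMO root directory:
#   register_cfg cfgs/refs_cfg.txt "GYRE_testing OCE TOP"
```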

## Mechanics of Running Benchmark

### Prepare input files
	cd GYRE_testing/EXP00
	sed -i '/using_server/s/false/true/' iodef.xml
	sed -i '/ln_bench/s/false/true/' namelist_cfg
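
Both `sed` commands only act on lines that mention the given key, so a missing key leaves the file silently unchanged. A small check such as the following, run in `EXP00`, can confirm that both switches were actually flipped. `is_enabled` is a hypothetical helper introduced here for illustration.

```shell
#!/bin/sh
# Return success if FILE has a line that mentions KEY and contains "true".
# is_enabled is a hypothetical helper, not part of NEMO or XIOS.
is_enabled() {
    key=$1
    file=$2
    grep "$key" "$file" | grep -q 'true'
}

# Usage, from GYRE_testing/EXP00:
#   is_enabled using_server iodef.xml && is_enabled ln_bench namelist_cfg \
#       && echo "input files are set up for the benchmark"
```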

### Run the experiment interactively
	mpirun -n 4 nemo : -n 2 $PATH_TO_XIOS/bin/xios_server.exe
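
The launch line above starts two programs in one MPI job: 4 NEMO ranks and 2 XIOS server ranks. The same pattern carries over to other core counts. The sketch below only prints the command so it can be inspected before use; `nemo_launch_cmd` is a hypothetical helper, and the fallback path is a placeholder that must be replaced by your XIOS build directory.

```shell
#!/bin/sh
# Print an MPMD mpirun command for N_COMPUTE NEMO ranks and N_IO XIOS server ranks.
# nemo_launch_cmd is a hypothetical helper; it does not start anything itself.
nemo_launch_cmd() {
    n_compute=$1
    n_io=$2
    xios_root=${PATH_TO_XIOS:-/path/to/xios}   # placeholder default
    echo "mpirun -n $n_compute nemo : -n $n_io $xios_root/bin/xios_server.exe"
}

nemo_launch_cmd 4 2
```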

### GYRE configuration with higher resolution
Modify the configuration (for example, for Test Case A):
```
    rm -f time.step solver.stat output.namelist.dyn ocean.output  slurm-*  GYRE_*
    sed -i -r \
        -e 's/^( *nn_itend *=).*/\1 101/' \
        -e 's/^( *nn_write *=).*/\1 4320/' \
        -e 's/^( *nn_GYRE *=).*/\1 48/' \
        -e 's/^( *rn_rdt *=).*/\1 1200/' \
        namelist_cfg
```
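
Each substitution replaces everything after the `=` sign on the matching line, including any trailing comment. To review the result, the four parameters can be listed with a short helper like the one below. `show_bench_params` is a hypothetical name chosen for this sketch.

```shell
#!/bin/sh
# Print the benchmark-relevant parameters currently set in a namelist file.
# show_bench_params is a hypothetical helper, not part of NEMO.
show_bench_params() {
    grep -E '^ *(nn_itend|nn_write|nn_GYRE|rn_rdt) *=' "$1"
}

# Usage, from the experiment directory:
#   show_bench_params namelist_cfg
```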

## Verification of Results
The GYRE configuration is set through the `namelist_cfg` file. The horizontal resolution is determined by setting `nn_GYRE` as follows:
    
   ```
   jpiglo = 30 × nn_GYRE + 2
   jpjglo = 20 × nn_GYRE + 2
   ```
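
As a worked example of the two formulas, the following sketch evaluates them for the `nn_GYRE` values of the two test cases. `gyre_grid` is a hypothetical helper name.

```shell
#!/bin/sh
# Horizontal grid dimensions of the GYRE configuration for a given nn_GYRE,
# printed as "<first dimension> <second dimension>".
gyre_grid() {
    nn_gyre=$1
    dim_i=$((30 * nn_gyre + 2))
    dim_j=$((20 * nn_gyre + 2))
    echo "$dim_i $dim_j"
}

gyre_grid 48    # Test Case A, prints: 1442 962
gyre_grid 192   # Test Case B, prints: 5762 3842
```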

In this configuration, we use the default of 30 ocean levels, which corresponds to `jpkglo=31`. The GYRE configuration is well suited to benchmark tests because the resolution is easy to increase, and both weak and strong scalability experiments can be performed with the same input files. We use two configurations as follows:

Test Case A:
```
	nn_GYRE = 48, suitable for up to 1000 cores
	Number of time steps: 101
	Time step size: 20 mins
	Number of seconds per time step: 1200
```
Test Case B:
```
	nn_GYRE = 192, suitable for up to 20,000 cores
	Number of time steps: 101
	Time step size: 20 mins
	Number of seconds per time step: 1200
```


We report performance as total time to solution and, whenever possible, total energy consumed to solution. 
This allows systems to be compared in a standard manner across system architectures. 

NEMO supports both attached and detached modes of the IO server. In attached mode all cores perform both computation and IO, 
whereas in detached mode each core performs either computation or IO. 
NEMO is reported to perform better in detached mode, especially for large numbers of cores. 
We therefore ran benchmarks in both attached and detached modes. 
For detached mode we use a 15:1 ratio of compute cores to IO cores. That is, we divide 1024 cores into 960 compute cores and 64 IO cores for Test Case A, 
and 10240 cores into 9600 compute cores and 640 IO cores for Test Case B.
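
The split for other core counts follows the same arithmetic. The sketch below is one way to compute it; `split_cores` is a hypothetical helper, and it assumes the total is divisible by the ratio plus one (otherwise the remainder goes to the compute cores).

```shell
#!/bin/sh
# Split TOTAL cores into compute and IO cores for a RATIO:1 compute-to-IO ratio.
# Prints "<compute> <io>". RATIO defaults to 15.
split_cores() {
    total=$1
    ratio=${2:-15}
    io=$((total / (ratio + 1)))
    compute=$((total - io))
    echo "$compute $io"
}

split_cores 1024    # Test Case A, prints: 960 64
split_cores 10240   # Test Case B, prints: 9600 640
```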

A performance comparison between Test Cases A and B, run on 1024 and 10240 processors respectively, 
can be considered as something between weak and strong scaling. 
That is, going from Test Case A to B the number of processors increases ten times, 
whereas the mesh size increases approximately 16 times.
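
The factor of roughly 16 follows from the horizontal grid formulas, since the number of vertical levels is the same in both cases. A minimal sketch of the arithmetic, using integer shell arithmetic with the ratio truncated to one decimal (`gyre_points` is a hypothetical helper):

```shell
#!/bin/sh
# Number of horizontal grid points for a given nn_GYRE.
gyre_points() {
    echo $(( (30 * $1 + 2) * (20 * $1 + 2) ))
}

points_a=$(gyre_points 48)     # Test Case A
points_b=$(gyre_points 192)    # Test Case B
# Ratio scaled by 10 so that one decimal survives integer division
ratio_x10=$(( points_b * 10 / points_a ))
echo "A: $points_a points, B: $points_b points, ratio: $((ratio_x10 / 10)).$((ratio_x10 % 10))"
# prints: A: 1387204 points, B: 22137604 points, ratio: 15.9
```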

We use the total time reported by the XIOS server. 
To also measure the step time, we inserted a patch into the [nemogcm.F90](nemogcm.F90) file that calls `MPI_Wtime()` 
at each step and accumulates the step time up to the second-to-last step. 
We then divide the cumulative time by the number of time steps to average out any overhead.
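
The averaging itself is a plain mean. The sketch below illustrates it on a text file with one step time in seconds per line. That file layout is an assumption made for illustration; the actual output format of the patched `nemogcm.F90` is not described here, and `mean_step_time` is a hypothetical helper.

```shell
#!/bin/sh
# Average per-step wall-clock times given one value (in seconds) per line.
# Assumption: the step times were extracted into a plain text file beforehand.
mean_step_time() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.3f\n", sum / n }' "$1"
}

# Usage (step_times.txt is a hypothetical file name):
#   mean_step_time step_times.txt
```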

<!--We performed scalability test on 512 cores and 1024 cores for test case A. We performed scalability test for 4096 cores, 8192 cores and 16384 cores for test case B.
Both these test cases can give us quite good understanding of node performance and interconnect behavior. -->
<!--We switch off the generation of mesh files by setting the `flag nn_mesh = 0` in the `namelist_ref` file. 
Also `using_server = false` is defined in `io_server` file.-->

<!--We report the performance in step time which is the total computational time averaged over the number of time steps for different test cases. 
This helps us to compare systems in a standard manner across all combinations of system architectures. 
The other main reason for reporting time per computational time step is to make sure that results are more reproducible and comparable.
Since NEMO supports both weak and strong scalability, 
test case A and test case B both can be scaled down to run on smaller number of processors while keeping the memory per processor constant achieving similar 
results for step time. 
-->

## Sources
<https://forge.ipsl.jussieu.fr/nemo/chrome/site/doc/NEMO/guide/html/install.html>

<https://forge.ipsl.jussieu.fr/ioserver/wiki/documentation>

<https://nemo-related.readthedocs.io/en/latest/compilation_notes/nemo37.html>