# NAMD
## Summary Version
1.0
## Purpose of Benchmark
NAMD is a widely used molecular dynamics application designed to simulate bio-molecular systems on a wide variety of compute platforms.
## Characteristics of Benchmark
NAMD is developed by the “Theoretical and Computational Biophysics Group” at the University of Illinois
at Urbana-Champaign. In the design of NAMD, particular emphasis has been placed on scalability when utilising a
large number of processors. The application can read a wide variety of file formats,
for example force fields and protein structures, which are commonly used in bio-molecular science.
A NAMD license can be applied for on the developer’s website free of charge.
Once the license has been obtained, binaries for a number of platforms and the source can be downloaded from the website.
Deployment areas of NAMD include pharmaceutical research by academic and industrial users.
NAMD is particularly suitable when the interaction between a number of proteins or between proteins and other
chemical substances is of interest.
Typical examples are vaccine research and transport processes through cell membrane proteins.
NAMD is written in C++ and parallelised using Charm++ parallel objects, which are implemented on top of MPI or Infiniband Verbs,
supporting both pure MPI and hybrid parallelisation.
Offloading for accelerators is implemented for both GPU and MIC (Intel Xeon Phi KNC).
## Mechanics of Building Benchmark
NAMD supports various build types.
In order to run current benchmarks the memopt build with SMP and Tcl support is mandatory.
NAMD should be compiled in memory-optimized mode, which uses a compressed version of the molecular
structure and force field and supports parallel I/O.
In addition to reducing per-node memory requirements, the compressed data files reduce startup times
compared to reading ASCII PDB and PSF files.
In order to build this version using MPI, the MPI implementation should support `MPI_THREAD_FUNNELED`.
* Uncompress/extract the tar source archive.
* Typically source is in `NAMD_VERSION_Source`.
* `cd NAMD_VERSION_Source`
* Untar the `charm-VERSION.tar`
* `cd charm-VERSION`
* Configure and compile charm++. This step is system dependent.
In most cases, the `mpi-linux-x86_64` architecture works.
On systems with GPUs and Infiniband, the `verbs-linux-x86_64` arch should be used.
One may explore the available `charm++` arch files in `charm-VERSION/src/arch` directory.
`./build charm++ mpi-linux-x86_64 smp mpicxx --with-production -DCMK_OPTIMIZE`
or for CUDA enabled NAMD
`./build charm++ verbs-linux-[x86_64|ppc64le] smp [gcc|xlc64]
--with-production -DCMK_OPTIMIZE`
For the x86_64 architecture one may use the Intel compilers by specifying icc instead of gcc in the charm++ configuration.
Issue `./build --help` for additional options.
The build script configures and compiles charm++. Its output is placed in a directory
inside the charm-VERSION tree whose name combines the architecture, compiler and options.
List the contents of the charm-VERSION directory and note the name of this extra directory.
* `cd ..`
* Configure NAMD
There are arch files with settings for various types of systems; check the `arch` directory for the possibilities.
The `config` tool in the NAMD_VERSION_Source directory is used to configure the build.
Some options must be specified:
```
./config Linux-x86_64-g++ \
  --charm-base ./charm-VERSION \
  --charm-arch charm-ARCH \
  --with-fftw3 --fftw-prefix PATH_TO_FFTW_INSTALLATION \
  --with-tcl --tcl-prefix PATH_TO_TCL \
  --with-memopt \
  --cc-opts '-O3 -march=native -mtune=native ' \
  --cxx-opts '-O3 -march=native -mtune=native ' \
  --with-cuda \
  --cuda-prefix PATH_TO_CUDA_INSTALLATION
```
The last two options (`--with-cuda` and `--cuda-prefix`) are optional and only needed for GPU-enabled runs.
It should be noted that for GPU support one has to use `verbs-linux-x86_64` instead of `mpi-linux-x86_64`.
You can issue `./config --help` to see all available options.
The following are absolutely necessary:
* `--with-memopt`
* an SMP build of charm++
* `--with-tcl`
You need to specify the fftw3 installation directory. On systems that use environment modules you need
to load the existing fftw3 module and probably use the provided environment variables.
If fftw3 libraries are not installed on your system, download and install fftw-3.3.9.tar.gz from [http://www.fftw.org/](http://www.fftw.org/).
When config finishes, it prompts you to change to a directory named ARCH-Compiler and run make.
If everything is OK you will find an executable named `namd2` in this directory.
In this directory there will also be a binary or shell script (depending on the architecture) called `charmrun`.
This is necessary for GPU runs on some systems.
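Putting the steps together, a minimal end-to-end build might look like the sketch below. The module names, NAMD/charm++ versions, the FFTW/Tcl root variables and the charm arch directory name are illustrative assumptions and must be adapted to the target system (use the directory name actually created by the charm++ build, as noted above):
```
# Illustrative sketch only: module names, versions and paths are assumptions
module load gcc openmpi fftw tcl
tar xzf NAMD_2.14_Source.tar.gz && cd NAMD_2.14_Source
tar xf charm-6.10.2.tar && cd charm-6.10.2
./build charm++ mpi-linux-x86_64 smp mpicxx --with-production -DCMK_OPTIMIZE
cd ..
./config Linux-x86_64-g++ \
  --charm-base ./charm-6.10.2 --charm-arch mpi-linux-x86_64-smp-mpicxx \
  --with-fftw3 --fftw-prefix "$FFTW_ROOT" \
  --with-tcl --tcl-prefix "$TCL_ROOT" \
  --with-memopt \
  --cc-opts '-O3 -march=native' --cxx-opts '-O3 -march=native'
cd Linux-x86_64-g++
make
```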
### Download the source code
The official site to download NAMD is [https://www.ks.uiuc.edu/Research/namd/](https://www.ks.uiuc.edu/Research/namd/).
You need to register (free of charge) to get a copy of NAMD from:
[https://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD](https://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD)
## Mechanics of Running Benchmark
The general way to run the benchmarks with the hybrid parallel executable, assuming the SLURM resource/batch manager, is:
```
...
#SBATCH --cpus-per-task=X
#SBATCH --ntasks-per-node=Y
#SBATCH --nodes=Z
...
# load the necessary environment modules (compilers, libraries etc.)
PPN=`expr $SLURM_CPUS_PER_TASK - 1`
parallel_launcher launcher_options path_to/namd2 ++ppn $PPN \
  TESTCASE.namd > \
  TESTCASE.Nodes.$SLURM_NNODES.TasksPerNode.$SLURM_NTASKS_PER_NODE.ThreadsPerTask.$SLURM_CPUS_PER_TASK.JobID.$SLURM_JOBID
```
Where:
* The `parallel_launcher` may be `srun`, `mpirun`, `mpiexec`, `mpiexec.hydra` or some variant such as `aprun` on Cray systems.
* `launcher_options` specifies the parallel placement in terms of total number of nodes, MPI ranks/tasks, tasks per node and threads per task.
* The variable `PPN` is the number of threads per task minus 1. This is necessary because NAMD uses one thread per task for communication.
* You can try almost any combination of tasks per node and threads per task to investigate absolute performance and scaling on the machine of interest,
as long as the product `tasks_per_node x threads_per_task` equals the total number of threads available on each node.
Increasing the number of tasks per node increases the number of communication threads.
Usually the best performance is obtained when the number of tasks per node equals the number of sockets per node;
which configuration gives the highest performance depends on the test case and the machine configuration
(memory slots, available cores, whether hyperthreading is enabled, etc.).
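For instance, a complete job script following this scheme, for a hypothetical machine with two sockets and 64 cores per node (module names, core counts and file names are assumptions to adapt):
```
#!/bin/bash
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=2        # one task per socket (assumption: 2 sockets per node)
#SBATCH --cpus-per-task=32         # assumption: 64 cores per node
#SBATCH --time=01:00:00

module load gcc openmpi fftw tcl   # site-specific module names

PPN=$(($SLURM_CPUS_PER_TASK - 1))  # one thread per task is reserved for communication
srun ./namd2 ++ppn $PPN stmv.8M.memopt.namd > \
  stmv8m.Nodes.$SLURM_NNODES.TasksPerNode.$SLURM_NTASKS_PER_NODE.ThreadsPerTask.$SLURM_CPUS_PER_TASK.JobID.$SLURM_JOBID
```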
On machines with GPUs:
* The verbs-ARCH architecture should be used instead of MPI.
* The number of tasks per node should be equal to the number of GPUs per node.
* Typically, when GPUs are allocated, the batch system sets the environment
variable `CUDA_VISIBLE_DEVICES`. NAMD gets the list of devices to use from this variable. If for any reason no GPUs are reported by NAMD, add
the flag `+devices $CUDA_VISIBLE_DEVICES` to the namd2 command line.
* In some cases, due to limitations of the parallel launcher, one
has to use the `charmrun` shipped with NAMD together with a hydra-type parallel launcher that uses ssh to spawn processes. In this case passwordless ssh must be enabled between the compute nodes,
and the corresponding script should look like:
```
PPN=`expr $SLURM_CPUS_PER_TASK - 1`
P="$(($PPN * $SLURM_NTASKS_PER_NODE * $SLURM_NNODES))"
PROCSPERNODE="$(($SLURM_CPUS_PER_TASK * $SLURM_NTASKS_PER_NODE))"
for n in $(scontrol show hostnames $SLURM_NODELIST); do
  echo "host $n ++cpus $PROCSPERNODE" >> nodelist.$SLURM_JOB_ID
done
PATH_TO_charmrun ++mpiexec +p $P PATH_TO_namd2 ++ppn $PPN \
  ++nodelist ./nodelist.$SLURM_JOB_ID +devices $CUDA_VISIBLE_DEVICES \
  TESTCASE.namd > TESTCASE.Nodes.$SLURM_NNODES.TasksPerNode.$SLURM_NTASKS_PER_NODE.ThreadsPerTask.$SLURM_CPUS_PER_TASK.log
```
## UEABS Benchmarks
The datasets are based on the original `Satellite Tobacco Mosaic Virus (STMV)`
dataset from the official NAMD site. The memory optimised build of the
package and data sets are used in benchmarking. Data are converted to
the appropriate binary format used by the memory optimised build.
**A) Test Case A: STMV.8M**
This is a 2×2×2 replication of the original STMV dataset from the official NAMD site. The system contains roughly 8 million atoms.
Download test Case A [https://repository.prace-ri.eu/ueabs/NAMD/2.2/NAMD_TestCaseA.tar.gz](https://repository.prace-ri.eu/ueabs/NAMD/2.2/NAMD_TestCaseA.tar.gz)
**B) Test Case B: STMV.28M**
This is a 3×3×3 replication of the original STMV dataset from the official NAMD site, created during the PRACE-2IP project. The system contains roughly 28 million atoms and is expected to scale efficiently up to a few tens of thousands of x86 cores.
Download test Case B [https://repository.prace-ri.eu/ueabs/NAMD/2.2/NAMD_TestCaseB.tar.gz](https://repository.prace-ri.eu/ueabs/NAMD/2.2/NAMD_TestCaseB.tar.gz)
**C) Test Case C: STMV.210M**
This is a 5×6×7 replication of the original STMV dataset from the official NAMD site. The system contains roughly 210 million atoms and is expected to scale efficiently to more than a hundred thousand recent x86 cores.
Download test Case C [https://repository.prace-ri.eu/ueabs/NAMD/2.2/NAMD_TestCaseC.tar.gz](https://repository.prace-ri.eu/ueabs/NAMD/2.2/NAMD_TestCaseC.tar.gz)
## Performance
NAMD reports some timings in the logfile. At the end of the log file, the WallClock time, CPU time and memory usage are reported in a line that looks like:
`WallClock: 629.896729 CPUTime: 629.896729 Memory: 2490.726562 MB`
One may obtain the execution time with:
`grep WallClock logfile | awk -F ' ' '{print $2}'`
Since the input data have a size of the order of a few GB and NAMD writes a similar amount of data at the end, it is common, depending on the filesystem load, to see large and usually non-reproducible startup and close times.
One should check that the startup and close times are no more than 1-2 seconds, using:
`grep "Info: Finished startup at" logfile | awk -F ' ' '{print $5}'`
`grep "file I/O" logfile | awk -F ' ' '{print $7}'`
If the reported startup and close times are significant, they should be subtracted from the reported WallClock time in order to obtain the real run performance.
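The subtraction can be scripted; a minimal sketch, assuming the log line formats shown above (field positions may differ between NAMD versions):
```
#!/bin/bash
# Usage: ./namd_time.sh logfile  -- prints WallClock minus startup and file-I/O times
log=$1
wallclock=$(grep WallClock "$log" | awk '{print $2}')
startup=$(grep "Info: Finished startup at" "$log" | awk '{print $5}')
closing=$(grep "file I/O" "$log" | awk '{print $7}')
echo "WallClock=$wallclock startup=$startup close=$closing"
echo "Adjusted run time: $(echo "$wallclock - $startup - $closing" | bc -l)"
```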
# NAMD Build and Run instructions using CUDA, KNC offloading and KNL.
## CUDA Build instructions
In order to run benchmarks, the memopt build with SMP support is mandatory.
NAMD may be compiled in an experimental memory-optimized mode that utilizes a compressed version of the molecular structure and also supports parallel I/O.
In addition to reducing per-node memory requirements, the compressed structure greatly reduces startup times compared to reading a psf file.
Since NAMD version 2.11, the build scripts refuse to compile accelerator builds with MPI; the verbs interface is suggested instead.
You can override this and use MPI instead of the suggested verbs by commenting out the following lines in the config script:
```
if ( $charm_arch_mpi || ! $charm_arch_smp ) then
echo ''
echo "ERROR: $ERRTYPE builds require non-MPI SMP or multicore Charm++ arch for reasonable performance."
echo ''
echo "Consider ibverbs-smp or verbs-smp (InfiniBand), gni-smp (Cray), or multicore (single node)."
echo ''
exit 1
endif
```
You need NAMD version 2.11 or newer.
* Uncompress/extract the source archive.
* `cd NAMD_Source_BASE` (the directory name depends on how the source was obtained; typically `namd2` or `NAMD_2.11_Source`)
* Untar the `charm-VERSION.tar` that is included. If you obtained the NAMD source via CVS, you need to download charm++ separately.
* `cd` to the charm-VERSION directory
### configure and compile charm :
This step is system dependent. Some examples are :
Linux with Intel compilers :
```
./build charm++ verbs-linux-x86_64 smp icc --with-production -O -DCMK_OPTIMIZE
```
Linux with GNU compilers :
```
./build charm++ verbs-linux-x86_64 smp gcc --with-production -O -DCMK_OPTIMIZE
```
Help:
```
./build --help to see all available options.
```
For special notes on various systems, you should look in [http://www.ks.uiuc.edu/Research/namd/2.11/notes.html](http://www.ks.uiuc.edu/Research/namd/2.11/notes.html) .
The syntax is :
```
./build charm++ ARCHITECTURE smp (compilers, optional) --with-production -O -DCMK_OPTIMIZE
```
You can find a list of supported architectures/compilers in `charm-VERSION/src/arch`
The smp option is mandatory to build the Hybrid version of namd.
This builds charm++.
`cd ..`
### Configure NAMD.
This step is system dependent. Some examples are :
Linux x86_64/AVX with Intel Compilers :
```
./config Linux-x86_64-icc --charm-base ./charm-6.7.0 --charm-arch verbs-linux-x86_64-smp-icc --with-fftw3 \
--fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt --with-cuda \
--cuda-prefix PATH_TO_CUDA_INSTALLATION_ROOT \
--charm-opts -verbose --cxx-opts "-O3 -xAVX " --cc-opts "-O3 -xAVX" --cxx icpc --cc icc \
--cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec "
```
Help :
```
./config --help to see all available options.
```
See in [http://www.ks.uiuc.edu/Research/namd/2.11/notes.html](http://www.ks.uiuc.edu/Research/namd/2.11/notes.html) for special notes on various systems.
What is absolutely necessary are the options `--with-memopt` and `--with-cuda`, and an SMP-enabled charm++ build.
It is suggested to disable Tcl support with the `--without-tcl` flag, since Tcl is not necessary
to run the benchmarks.
You need to specify the fftw3 installation directory. On systems that use environment modules you need to load the existing fftw3 module and probably use the provided environment variables.
If fftw3 libraries are not installed on your system, download and install fftw-3.3.5.tar.gz from [http://www.fftw.org/](http://www.fftw.org/) .
You may adjust the compilers and compiler flags as in the Linux x86_64/AVX example.
A typical use of compiler/flag adjustment is, for example, to add `-xAVX` and keep all the other compiler flags of the architecture the same.
Take care with, or simply avoid, the `--cxx` option of the NAMD config unless there is a reason for it, as in some cases it overrides the compilation flags from the arch files.
When config finishes, it prompts you to change to a directory and run make.
### cd to the reported directory and run make
If everything is OK you will find the executable named `namd2`
and the parallel wrapper called `charmrun` in this directory.
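For example (the directory name matches the first argument passed to `./config`, here the CUDA build configured above):
```
cd Linux-x86_64-icc
make
```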
## KNC/offloading Build instructions
The instructions for building NAMD binaries with offloading to KNC are similar to those for GPUs, with some modifications:
the NAMD configure stage uses `--with-mic` instead of `--with-cuda`; the rest of the options are the same.
For example:
Linux x86_64/AVX with Intel Compilers :
```
./config Linux-x86_64-icc --charm-base ./charm-6.7.0 --charm-arch verbs-linux-x86_64-smp-icc --with-fftw3 \
--fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt \
--with-mic \
--charm-opts -verbose --cxx-opts "-O3 -xAVX " --cc-opts "-O3 -xAVX" --cxx icpc --cc icc \
--cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec "
```
What is absolutely necessary are the options `--with-memopt` and `--with-mic`, and an SMP-enabled charm++ build.
## KNL Build instructions
For KNL, follow the build and run instructions of the UEABS suite, replacing the compiler flag `-xAVX` with `-xMIC-AVX512`.
## Run Instructions
After building NAMD you have the NAMD executable called `namd2` and the parallel wrapper called `charmrun`.
The best performance and scaling of NAMD are usually achieved with the hybrid MPI/SMP version.
On a system with NC cores per node, use 1 task per node and NC threads per task;
for example, on a 20-cores-per-node system use 1 process per node and
set `OMP_NUM_THREADS` (or the corresponding batch-system variable) to 20.
Set a variable, for example PPN, to NC-1
(19 for a 20-cores-per-node system).
Since charmrun is used as the parallel wrapper, one needs to specify the total number of tasks, the threads per task and a hostfile on the charmrun command line.
You can also try other combinations of tasks per node and threads per task to see what performs best.
In order to use accelerators, you need to specify the accelerator devices on the command line.
Typically this information comes from the batch system.
For example, with the SLURM workload manager the variable is `$SLURM_JOB_GPUS` for GPUs
or `$OFFLOAD_DEVICES` for KNC.
Typical values of these variables are 0 or 1 when you request one accelerator per node,
0,1 when you request two accelerators per node, and so on.
The control file is `stmv.8M.memopt.namd` for tier-1 and `stmv.28M.memopt.namd` for tier-0 systems.
The general way to run the accelerated NAMD is :
```
charmrun PATH_TO_namd2 ++p $P ++ppn $PPN ++nodelist ./hostfile +devices AcceleratorsIDS configfile > logfile
```
In the case of SLURM workload manager :
```
PPN=`expr $SLURM_CPUS_PER_TASK - 1`
P=`expr $SLURM_NNODES \* $PPN `
for n in $(scontrol show hostnames $SLURM_NODELIST); do echo "host $n ++cpus $SLURM_CPUS_PER_TASK" >> hostfile; done;
```
for GPUs
```
charmrun PATH_TO_namd2 ++p $P ++ppn $PPN ++nodelist ./hostfile +devices $SLURM_JOB_GPUS stmv.8M.memopt.namd > logfile
```
for KNC
```
charmrun PATH_TO_namd2 ++p $P ++ppn $PPN ++nodelist ./hostfile +devices $OFFLOAD_DEVICES configfile > logfile
```
The run walltime is reported at the end of the logfile: `grep WallClock: logfile | awk -F ' ' '{print $2}'`
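As an illustration, a complete SLURM job script for a GPU run, combining the pieces above, might look as follows. The GPU request syntax, module names, paths and per-node counts are assumptions to adapt; the `++mpiexec` mode relies on a hydra-type `mpiexec` being available and, on some systems, on passwordless ssh between compute nodes:
```
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2        # assumption: 2 GPUs per node, one task per GPU
#SBATCH --cpus-per-task=20
#SBATCH --gres=gpu:2               # GPU request syntax is site specific
#SBATCH --time=01:00:00

module load cuda gcc fftw          # site-specific module names

PPN=$(($SLURM_CPUS_PER_TASK - 1))
P=$(($PPN * $SLURM_NTASKS_PER_NODE * $SLURM_NNODES))
CPUSPERNODE=$(($SLURM_CPUS_PER_TASK * $SLURM_NTASKS_PER_NODE))
for n in $(scontrol show hostnames $SLURM_NODELIST); do
  echo "host $n ++cpus $CPUSPERNODE" >> nodelist.$SLURM_JOB_ID
done

PATH_TO/charmrun ++mpiexec +p $P PATH_TO/namd2 ++ppn $PPN \
  ++nodelist ./nodelist.$SLURM_JOB_ID +devices $CUDA_VISIBLE_DEVICES \
  stmv.8M.memopt.namd > stmv8m.gpu.JobID.$SLURM_JOB_ID.log
```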
# NEMO
## Summary Version
1.1
## Purpose of Benchmark
NEMO (Nucleus for European Modelling of the Ocean) is a mathematical modelling framework for research activities and prediction services in ocean and climate sciences, developed by a European consortium. It is intended to be a tool for studying the ocean and its interaction with the other components of the Earth's climate system over a large range of space and time scales. It comprises the core engines OPA (ocean dynamics and thermodynamics), SI3 (sea-ice dynamics and thermodynamics), TOP (oceanic tracers) and PISCES (biogeochemical processes).
Prognostic variables in NEMO are the three-dimensional velocity field, a linear or non-linear sea surface height, the temperature and the salinity. In the horizontal direction, the model uses a curvilinear orthogonal grid and in the vertical direction, a full or partial step z-coordinate, or s-coordinate, or a mixture of the two. The distribution of variables is a three-dimensional Arakawa C-type grid for most of the cases.
## Characteristics of Benchmark
The model is implemented in Fortran 90, with pre-processing (C preprocessor). It is optimised for vector computers and parallelised by domain decomposition with MPI. It supports modern C/C++ and Fortran compilers. All input and output is done through third-party software called XIOS, which depends on NetCDF (Network Common Data Form) and HDF5. NEMO is highly scalable and well suited to measuring supercomputer performance in terms of compute capacity, memory subsystem, I/O and interconnect performance.
## Mechanics of Building Benchmark
### Building XIOS
1. Download the XIOS source code:
```
svn co https://forge.ipsl.jussieu.fr/ioserver/svn/XIOS/branchs/xios-2.5
```
2. The available known architectures can be listed with the following command:
```
./make_xios --avail
```
If target architecture is a known one, it can be built by the following command:
```
./make_xios --arch X64_CURIE
```
Otherwise, `arch-local.env`, `arch-local.fcm` and `arch-local.path` files should be created for the target architecture. Then build with:
```
./make_xios --arch local
```
Files for the PRACE Tier-0 systems are available under the [architecture_files](architecture_files) folder. These files should be used as a starting point; updates might be required after system upgrades etc.
Note that XIOS requires NetCDF4. Please load the appropriate HDF5 and NetCDF4 modules. If the paths provided by these modules differ, you might have to change the paths in the configuration files.
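For example, on a system with environment modules, building XIOS with a local architecture might look like this sketch (the module names are purely illustrative and differ between sites):
```
# Illustrative only: module names are site specific
module load hdf5 netcdf netcdf-fortran perl
./make_xios --arch local
```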
### Building NEMO
1. Download the NEMO source code:
```
svn co https://forge.ipsl.jussieu.fr/nemo/svn/NEMO/releases/release-4.0
```
2. Copy and set up the appropriate architecture file in the arch folder. Files for the PRACE Tier-0 systems are available under the [architecture_files](architecture_files) folder. These files should be used as a starting point; updates might be required after system upgrades etc. The following changes are recommended for the GNU compilers:
   * add the `-lnetcdff` and `-lstdc++` flags to the NetCDF flags
   * use `mpif90`, which is an MPI binding of `gfortran-4.9`
   * add `-cpp` and `-ffree-line-length-none` to the Fortran flags
3. Apply the patch described at <https://software.intel.com/en-us/articles/building-and-running-nemo-on-xeon-processors> to measure the step time.
You may also use the provided [nemogcm.F90](nemogcm.F90) by replacing `src/OCE/nemogcm.F90` with it.
4. Add a `GYRE_testing OCE TOP` line to the `refs_cfg.txt` file under the `cfgs` folder.
Then go to the `cfgs` folder and:
```
mkdir GYRE_testing
rsync -arv GYRE_PISCES/* GYRE_testing/
mv GYRE_testing/cpp_GYRE_PISCES.fcm GYRE_testing/cpp_GYRE_testing.fcm
sed -i 's/key_top/key_nosignedzero/g' GYRE_testing/cpp_GYRE_testing.fcm
```
5. Then build the executable with the following command
```
./makenemo -m MY_CONFIG -r GYRE_testing
```
## Mechanics of Running Benchmark
### Prepare input files
```
cd GYRE_testing/EXP00
sed -i '/using_server/s/false/true/' iodef.xml
sed -i '/ln_bench/s/false/true/' namelist_cfg
```
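A minimal job-script sketch for running the benchmark with XIOS in detached mode is shown below; the launcher, module names, node geometry and the location of `xios_server.exe` are assumptions to adapt:
```
#!/bin/bash
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=128      # assumption: 128 cores per node (1024 cores in total)
#SBATCH --time=01:00:00

module load gcc openmpi hdf5 netcdf netcdf-fortran   # site-specific module names

cd GYRE_testing/EXP00
# Detached mode for Test Case A: 960 compute ranks + 64 XIOS I/O servers (15:1 ratio)
mpirun -n 960 ./nemo : -n 64 $PATH_TO_XIOS/bin/xios_server.exe
```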
NEMO 3.6, GYRE configuration
============================
Build and test documentation for NEMO 3.6 in the GYRE configuration, created by juha.lento@csc.fi (2016-05-16)
and updated by sagar.dolas@surfsara.nl in the context of benchmarking efforts on Tier-0 supercomputing systems.
Example commands were tested on CSC's Cray XC40, `sisu.csc.fi`.
Most of the following instructions should work as they are. The key part is installing XIOS, because NEMO needs the XIOS libraries to compile. The GYRE configuration is fairly easy to compile and run. Please also refer to the following guide for reference. There is no single standard recipe for compiling NEMO on all platforms; each platform needs its own approach to make NEMO work. The standard workflow is:
1. Install XIOS.
2. Install NEMO.
3. Apply the patch as described here to measure step time : <https://software.intel.com/en-us/articles/building-and-running-nemo-on-xeon-processors>
4. Run NEMO in GYRE configuration.
Download NEMO and XIOS sources
------------------------------
### Check out NEMO sources
```
svn co -r6542 http://forge.ipsl.jussieu.fr/nemo/svn/branches/2015/nemo_v3_6_STABLE/NEMOGCM
```
### Check out XIOS2 sources
<http://www.nemo-ocean.eu/Using-NEMO/User-Guides/Basics/XIOS-IO-server-installation-and-use>
```
svn co -r819 http://forge.ipsl.jussieu.fr/ioserver/svn/XIOS/trunk xios-2.0
```
Build XIOS
----------
### Build environment
XIOS requires NetCDF4. Please load the appropriate HDF5 and NetCDF4 modules. You might have to change the paths in the configuration file.
```
module load cray-hdf5-parallel cray-netcdf-hdf5parallel
```
### Build command
<http://forge.ipsl.jussieu.fr/ioserver/wiki/documentation>
```
cd xios-2.0
./make_xios --arch XC30_Cray
```
Build NEMO 3.6 in GYRE configuration
------------------------------------
### Get a bash helper for editing configuration files
```
source <(curl -s https://raw.githubusercontent.com/jlento/nemo/master/fixfcm.bash)
```
...or, if you have a buggy bash 3.2:
```
wget https://raw.githubusercontent.com/jlento/nemo/master/fixfcm.bash; source fixfcm.bash
```
### Run the experiment interactively
```
mpirun -n 4 nemo : -n 2 $PATH_TO_XIOS/bin/xios_server.exe
```
### GYRE configuration with higher resolution
Modify the configuration (for example, for Test Case A):
```
rm -f time.step solver.stat output.namelist.dyn ocean.output slurm-* GYRE_*
sed -i -r \
-e 's/^( *nn_itend *=).*/\1 101/' \
-e 's/^( *nn_write *=).*/\1 4320/' \
-e 's/^( *nn_GYRE *=).*/\1 48/' \
-e 's/^( *rn_rdt *=).*/\1 1200/' \
namelist_cfg
```
## Verification of Results
The GYRE configuration is set through the `namelist_cfg` file. The horizontal resolution is determined by setting `nn_GYRE` as follows:
```
Jpiglo = 30 × nn_GYRE + 2
Jpjglo = 20 × nn_GYRE + 2
```
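For example, the two test cases described below correspond to the following global grid sizes:
```
# Worked example: grid sizes implied by nn_GYRE (using the formulas above)
for nn_GYRE in 48 192; do
  echo "nn_GYRE=$nn_GYRE  jpiglo=$((30 * nn_GYRE + 2))  jpjglo=$((20 * nn_GYRE + 2))"
done
# nn_GYRE=48  ->  1442 x  962  (Test Case A)
# nn_GYRE=192 ->  5762 x 3842  (Test Case B)
```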
In this configuration, we use a default value of 30 ocean levels, depicted by `jpkglo=31`. The GYRE configuration is an ideal case for benchmark tests as it is very simple to increase the resolution and perform both weak and strong scalability experiments using the same input files. We use two configurations as follows:
Test Case A:
```
nn_GYRE = 48, suitable up to about 1,000 cores
Number of time steps: 101
Time step size: 20 mins
Number of seconds per time step: 1200
```
Test Case B:
```
nn_GYRE = 192, suitable up to about 20,000 cores
Number of time steps: 101
Time step size: 20 mins
Number of seconds per time step: 1200
```
Apply the patch as described here to measure step time: <https://software.intel.com/en-us/articles/building-and-running-nemo-on-xeon-processors>, otherwise it becomes extremely difficult to measure the computational time.
### Edit (create) configuration files
```
cd ../NEMOGCM/CONFIG
fixfcm < ../ARCH/arch-XC40_METO.fcm > ../ARCH/arch-MY_CONFIG.fcm \
NCDF_HOME="$NETCDF_DIR" \
HDF5_HOME="$HDF5_DIR" \
XIOS_HOME="$(readlink -f ../../xios-2.0)"
```
### Build
```
./makenemo -m MY_CONFIG -r GYRE_XIOS -n MY_GYRE add_key "key_nosignedzero"
```
Run first GYRE test
-------------------
We report the performance in terms of total time to solution as well as total energy to solution whenever possible.
This helps us compare systems in a standard manner across all combinations of system architectures.
Adjust the following configuration or settings according to your HPC site's preferences.
NEMO supports both attached and detached modes of the IO server. In the attached mode all cores perform both computation and IO,
whereas in the detached mode each core performs either computation or IO.
It has been reported that NEMO performs better in detached mode, especially for large numbers of cores.
Therefore, we performed benchmarks for both attached and detached modes.
We use a 15:1 ratio for the detached mode: 1024 cores are split into 960 compute cores and 64 IO cores for Test Case A,
and 10240 cores into 9600 compute cores and 640 IO cores for Test Case B.
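For example, with the MPMD-style launch used elsewhere in this document (the launcher syntax is site dependent):
```
# Test Case B, detached mode: 9600 compute ranks + 640 XIOS servers on 10240 cores
mpirun -n 9600 ./nemo : -n 640 $PATH_TO_XIOS/bin/xios_server.exe
```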
### Prepare input files
The performance comparison between Test Cases A and B, run on 1024 and 10240 processors respectively,
can be considered as something between weak and strong scaling:
the number of processors is increased ten times, while the mesh size increases approximately 16 times
(a factor of (192/48)² in the horizontal grid) when going from Test Case A to B.
```
cd MY_GYRE/EXP00
sed -i '/using_server/s/false/true/' iodef.xml
sed -i '/&nameos/a ln_useCT = .false.' namelist_cfg
sed -i '/&namctl/a nn_bench = 1' namelist_cfg
```
We use the total time reported by the XIOS server.
To also measure the step time, we apply a patch that inserts an `MPI_Wtime()` call in the [nemogcm.F90](nemogcm.F90) file
for each step and cumulatively adds the step times up to the second-to-last step.
We then divide the total cumulative time by the number of time steps to average out any overhead.
### Run the experiment interactively
```
aprun -n 4 ../BLD/bin/nemo.exe : -n 2 ../../../../xios-2.0/bin/xios_server.exe
```
<!--We performed scalability test on 512 cores and 1024 cores for test case A. We performed scalability test for 4096 cores, 8192 cores and 16384 cores for test case B.
Both these test cases can give us quite good understanding of node performance and interconnect behavior. -->
<!--We switch off the generation of mesh files by setting the `flag nn_mesh = 0` in the `namelist_ref` file.
Also `using_server = false` is defined in `io_server` file.-->
<!--We report the performance in step time which is the total computational time averaged over the number of time steps for different test cases.
This helps us to compare systems in a standard manner across all combinations of system architectures.
The other main reason for reporting time per computational time step is to make sure that results are more reproducible and comparable.
Since NEMO supports both weak and strong scalability,
test case A and test case B both can be scaled down to run on smaller number of processors while keeping the memory per processor constant achieving similar
results for step time.
-->
## Sources
<https://forge.ipsl.jussieu.fr/nemo/chrome/site/doc/NEMO/guide/html/install.html>
<https://forge.ipsl.jussieu.fr/ioserver/wiki/documentation>
<https://nemo-related.readthedocs.io/en/latest/compilation_notes/nemo37.html>
GYRE configuration with higher resolution
-----------------------------------------
### Modify configuration
Parameter `jp_cfg` controls the resolution.
```
rm -f time.step solver.stat output.namelist.dyn ocean.output slurm-* GYRE_* mesh_mask_00*
jp_cfg=4
sed -i -r \
-e 's/^( *nn_itend *=).*/\1 21600/' \
-e 's/^( *nn_stock *=).*/\1 21600/' \
-e 's/^( *nn_write *=).*/\1 1000/' \
-e 's/^( *jp_cfg *=).*/\1 '"$jp_cfg"'/' \
-e 's/^( *jpidta *=).*/\1 '"$(( 30 * jp_cfg +2))"'/' \
-e 's/^( *jpjdta *=).*/\1 '"$(( 20 * jp_cfg +2))"'/' \
-e 's/^( *jpiglo *=).*/\1 '"$(( 30 * jp_cfg +2))"'/' \
-e 's/^( *jpjglo *=).*/\1 '"$(( 20 * jp_cfg +2))"'/' \
namelist_cfg
```
# generic gfortran compiler options for linux HAWK
#
# NCDF_HOME root directory containing lib and include subdirectories for netcdf4
# HDF5_HOME root directory containing lib and include subdirectories for HDF5
# XIOS_HOME root directory containing lib for XIOS
# OASIS_HOME root directory containing lib for OASIS
#
# NCDF_INC netcdf4 include file
# NCDF_LIB netcdf4 library
# XIOS_INC xios include file (taken into account only if key_iomput is activated)
# XIOS_LIB xios library (taken into account only if key_iomput is activated)
# OASIS_INC oasis include file (taken into account only if key_oasis3 is activated)
# OASIS_LIB oasis library (taken into account only if key_oasis3 is activated)
#
# FC Fortran compiler command
# FCFLAGS Fortran compiler flags
# FFLAGS Fortran 77 compiler flags
# LD linker
# LDFLAGS linker flags, e.g. -L<lib dir> if you have libraries
# FPPFLAGS pre-processing flags
# AR assembler
# ARFLAGS assembler flags
# MK make
# USER_INC complete list of include files
# USER_LIB complete list of libraries to pass to the linker
# CC C compiler used to compile conv for AGRIF
# CFLAGS compiler flags used with CC
#
# Note that:
# - unix variables "$..." are accepted and will be evaluated before calling fcm.
# - fcm variables are starting with a % (and not a $)
#
%NCDF_HOME /opt/hlrs/spack/rev-004_2020-06-17/netcdf/4.7.3-gcc-9.2.0-p7eu3czq/
%NCDF_HOME2 /opt/hlrs/spack/rev-004_2020-06-17/netcdf-fortran/4.5.2-gcc-9.2.0-lxinqb3c/
%HDF5_HOME /opt/hlrs/spack/rev-004_2020-06-17/hdf5/1.10.5-gcc-9.2.0-fsds2dq4/
%XIOS_HOME /zhome/academic/HLRS/pri/iprceayk/data/NEMO/NEMO_F/xios-2.5
###%XIOS_HOME /lustre/cray/ws9/6/ws/iprceayk-nemo/xios-2.5/
%OASIS_HOME /not/defined
%HDF5_LIB -L%HDF5_HOME/lib -lhdf5_hl -lhdf5
%GCCLIB .
#/opt/hlrs/non-spack/compiler/gcc/9.2.0/lib64/
%NCDF_INC -I%NCDF_HOME/include -I%NCDF_HOME2/include -I%HDF5_HOME/include
%NCDF_LIB -L%NCDF_HOME/lib %HDF5_LIB -L%NCDF_HOME2/lib -L%GCCLIB -lnetcdff -lnetcdf -lstdc++ -lz -lcurl
%XIOS_INC -I%XIOS_HOME/inc
%XIOS_LIB -L%XIOS_HOME/lib -L%GCCLIB -lxios -lstdc++
%OASIS_INC -I%OASIS_HOME/build/lib/mct -I%OASIS_HOME/build/lib/psmile.MPI1
%OASIS_LIB -L%OASIS_HOME/lib -lpsmile.MPI1 -lmct -lmpeu -lscrip
%CPP cpp
%FC mpif90
%FCFLAGS -fdefault-real-8 -O3 -funroll-all-loops -fcray-pointer -cpp -ffree-line-length-none
%FFLAGS %FCFLAGS
%LD %FC
%LDFLAGS
%FPPFLAGS -P -C -traditional
%AR ar
%ARFLAGS -rs
%MK make
%USER_INC %XIOS_INC %OASIS_INC %NCDF_INC
%USER_LIB %XIOS_LIB %OASIS_LIB %NCDF_LIB
%CC cc
%CFLAGS -O0
# Curie SKYLAKE at TGCC
#
# NCDF_HOME root directory containing lib and include subdirectories for netcdf4
# HDF5_HOME root directory containing lib and include subdirectories for HDF5
# XIOS_HOME root directory containing lib for XIOS
# OASIS_HOME root directory containing lib for OASIS
#
# NCDF_INC netcdf4 include file
# NCDF_LIB netcdf4 library
# XIOS_INC xios include file (taken into account only if key_iomput is activated)
# XIOS_LIB xios library (taken into account only if key_iomput is activated)
# OASIS_INC oasis include file (taken into account only if key_oasis3 is activated)
# OASIS_LIB oasis library (taken into account only if key_oasis3 is activated)
#
# FC Fortran compiler command
# FCFLAGS Fortran compiler flags
# FFLAGS Fortran 77 compiler flags
# LD linker
# LDFLAGS linker flags, e.g. -L<lib dir> if you have libraries
# FPPFLAGS pre-processing flags
# AR assembler
# ARFLAGS assembler flags
# MK make
# USER_INC complete list of include files
# USER_LIB complete list of libraries to pass to the linker
# CC C compiler used to compile conv for AGRIF
# CFLAGS compiler flags used with CC
#
# Note that:
# - unix variables "$..." are accepted and will be evaluated before calling fcm.
# - fcm variables are starting with a % (and not a $)
#
%NCDF_HOME /ccc/products/netcdf-c-4.6.0/intel--20.0.0__openmpi--4.0.1/hdf5__parallel
%NCDF_HOME2 /ccc/products/netcdf-fortran-4.4.4/intel--20.0.0__openmpi--4.0.1/hdf5__parallel
%HDF5_HOME /ccc/products/hdf5-1.8.20/intel--20.0.0__openmpi--4.0.1/parallel
%XIOS_HOME /ccc/cont005/home/uniankar/aykanatc/work/NEMO_F/SKY/xios-2.5
%OASIS_HOME /not/defined
#%CURL .
%HDF5_LIB -L%HDF5_HOME/lib -L%CURL -lhdf5_hl -lhdf5
%GCCLIB .
%NCDF_INC -I%NCDF_HOME/include -I%NCDF_HOME2/include -I%HDF5_HOME/include
%NCDF_LIB -L%NCDF_HOME/lib %HDF5_LIB -L%CURL -L%NCDF_HOME2/lib -L%GCCLIB -lnetcdff -lnetcdf -lstdc++ -lz -lcurl
##-lgpfs
%XIOS_INC -I%XIOS_HOME/inc
%XIOS_LIB -L%XIOS_HOME/lib -L%GCCLIB -lxios -lstdc++
%OASIS_INC -I%OASIS_HOME/build/lib/mct -I%OASIS_HOME/build/lib/psmile.MPI1
%OASIS_LIB -L%OASIS_HOME/lib -lpsmile.MPI1 -lmct -lmpeu -lscrip
%CPP icc -E
%FC mpifort
%FCFLAGS -O3 -r8 -funroll-all-loops -traceback
%FFLAGS %FCFLAGS
%LD mpifort
%LDFLAGS -lstdc++ -lifcore -O3 -traceback
%FPPFLAGS -P -C -traditional
%AR ar
%ARFLAGS -r
%MK make
%USER_INC %XIOS_INC %OASIS_INC %NCDF_INC
%USER_LIB %XIOS_LIB %OASIS_LIB %NCDF_LIB
%CC cc
%CFLAGS -O0
##%CPP cpp
##%FC mpif90 -c -cpp
##%FCFLAGS -i4 -r8 -O3 -fp-model precise -xCORE-AVX512 -fno-alias
##%FFLAGS %FCFLAGS
##%LD mpif90
##%LDFLAGS
##%FPPFLAGS -P -traditional
##%AR ar
##%ARFLAGS rs
##%MK gmake
##%USER_INC %XIOS_INC %OASIS_INC %NCDF_INC
##%USER_LIB %XIOS_LIB %OASIS_LIB %NCDF_LIB
##%CC cc
##%CFLAGS -O0
# generic ifort compiler options for JUWELS
#
# NCDF_HOME root directory containing lib and include subdirectories for netcdf4
# HDF5_HOME root directory containing lib and include subdirectories for HDF5
# XIOS_HOME root directory containing lib for XIOS
# OASIS_HOME root directory containing lib for OASIS
#
# NCDF_INC netcdf4 include file
# NCDF_LIB netcdf4 library
# XIOS_INC xios include file (taken into account only if key_iomput is activated)
# XIOS_LIB xios library (taken into account only if key_iomput is activated)
# OASIS_INC oasis include file (taken into account only if key_oasis3 is activated)
# OASIS_LIB oasis library (taken into account only if key_oasis3 is activated)
#
# FC Fortran compiler command
# FCFLAGS Fortran compiler flags
# FFLAGS Fortran 77 compiler flags
# LD linker
# LDFLAGS linker flags, e.g. -L<lib dir> if you have libraries
# FPPFLAGS pre-processing flags
# AR assembler
# ARFLAGS assembler flags
# MK make
# USER_INC complete list of include files
# USER_LIB complete list of libraries to pass to the linker
# CC C compiler used to compile conv for AGRIF
# CFLAGS compiler flags used with CC
#
# Note that:
# - unix variables "$..." are accepted and will be evaluated before calling fcm.
# - fcm variables are starting with a % (and not a $)
#
%NCDF_HOME /p/software/juwels/stages/2020/software/netCDF/4.7.4-ipsmpi-2021
%NCDF_HOME2 /p/software/juwels/stages/2020/software/netCDF-Fortran/4.5.3-ipsmpi-2021
%HDF5_HOME /p/software/juwels/stages/2020/software/HDF5/1.10.6-ipsmpi-2021
%XIOS_HOME /p/home/jusers/aykanat1/juwels/data/prpb86/NEMO_F/xios-2.5/
%OASIS_HOME /not/defined
%CURL /p/software/juwels/stages/2020/software/cURL/7.71.1-GCCcore-10.3.0/lib/
%HDF5_LIB -L%HDF5_HOME/lib -L%CURL -lhdf5_hl -lhdf5
%GCCLIB .
%NCDF_INC -I%NCDF_HOME/include -I%NCDF_HOME2/include -I%HDF5_HOME/include
%NCDF_LIB -L%NCDF_HOME/lib %HDF5_LIB -L%CURL -L%NCDF_HOME2/lib -L%GCCLIB -lnetcdff -lnetcdf -lstdc++ -lz -lcurl -lgpfs
%XIOS_INC -I%XIOS_HOME/inc
%XIOS_LIB -L%XIOS_HOME/lib -L%GCCLIB -lxios -lstdc++
%OASIS_INC -I%OASIS_HOME/build/lib/mct -I%OASIS_HOME/build/lib/psmile.MPI1
%OASIS_LIB -L%OASIS_HOME/lib -lpsmile.MPI1 -lmct -lmpeu -lscrip
%CPP icc -E
%FC mpifort
%FCFLAGS -O3 -r8 -funroll-all-loops -traceback
%FFLAGS %FCFLAGS
%LD mpifort
%LDFLAGS -lstdc++ -lifcore -O3 -traceback
%FPPFLAGS -P -C -traditional
%AR ar
%ARFLAGS -r
%MK make
%USER_INC %XIOS_INC %OASIS_INC %NCDF_INC
%USER_LIB %XIOS_LIB %OASIS_LIB %NCDF_LIB
%CC cc
%CFLAGS -O0
# generic gfortran compiler options for linux M100
#
# NCDF_HOME root directory containing lib and include subdirectories for netcdf4
# HDF5_HOME root directory containing lib and include subdirectories for HDF5
# XIOS_HOME root directory containing lib for XIOS
# OASIS_HOME root directory containing lib for OASIS
#
# NCDF_INC netcdf4 include file
# NCDF_LIB netcdf4 library
# XIOS_INC xios include file (taken into account only if key_iomput is activated)
# XIOS_LIB xios library (taken into account only if key_iomput is activated)
# OASIS_INC oasis include file (taken into account only if key_oasis3 is activated)
# OASIS_LIB oasis library (taken into account only if key_oasis3 is activated)
#
# FC Fortran compiler command
# FCFLAGS Fortran compiler flags
# FFLAGS Fortran 77 compiler flags
# LD linker
# LDFLAGS linker flags, e.g. -L<lib dir> if you have libraries
# FPPFLAGS pre-processing flags
# AR assembler
# ARFLAGS assembler flags
# MK make
# USER_INC complete list of include files
# USER_LIB complete list of libraries to pass to the linker
# CC C compiler used to compile conv for AGRIF
# CFLAGS compiler flags used with CC
#
# Note that:
# - unix variables "$..." are accepted and will be evaluated before calling fcm.
# - fcm variables are starting with a % (and not a $)
#
#%NCDF_HOME /cineca/prod/opt/libraries/netcdf/4.7.3--spectrum_mpi--10.4.0/hpc-sdk--2021--binary
#%NCDF_HOME2 /cineca/prod/opt/libraries/netcdff/4.5.2--spectrum_mpi--10.4.0/hpc-sdk--2021--binary
#%HDF5_HOME /cineca/prod/opt/libraries/hdf5/1.12.0--spectrum_mpi--10.3.1/pgi--19.10--binary
#%NCDF_HOME /cineca/prod/opt/libraries/netcdf/4.7.3/gnu--8.4.0
#%NCDF_HOME2 /cineca/prod/opt/libraries/netcdff/4.5.2/gnu--8.4.0
#%HDF5_HOME /cineca/prod/opt/libraries/hdf5/1.12.0/gnu--8.4.0
#%XIOS_HOME /m100/home/userexternal/mkarsavu/data/nemo_test/xios-2.5
%NCDF_HOME /m100_work/PROJECTS/spack/spack-0.14/install/linux-rhel8-power9le/gcc-8.4.0/netcdf-c-4.7.3-gygambvobvqmkmstxe4pf4fjv6mjjc7m
%NCDF_HOME2 /m100_work/PROJECTS/spack/spack-0.14/install/linux-rhel7-power9le/gcc-8.4.0/netcdf-fortran-4.5.2-tbo5mgy3yxinef4ap7rirsmfzdcvhucf
%HDF5_HOME /m100_work/PROJECTS/spack/spack-0.14/install/linux-rhel8-power9le/gcc-8.4.0/hdf5-1.12.0-5a3psyfeiuv6d5hrn4mrgcbxttp6nqze
%XIOS_HOME /m100/home/userexternal/mkarsavu/data/NEMO_F/xios-2.5
%OASIS_HOME /not/defined
%HDF5_LIB -L%HDF5_HOME/lib -lhdf5_hl -lhdf5
%GCCLIB /cineca/prod/opt/compilers/gnu/8.4.0/none/lib64/
%NCDF_INC -I%NCDF_HOME/include -I%NCDF_HOME2/include -I%HDF5_HOME/include
%NCDF_LIB -L%NCDF_HOME/lib %HDF5_LIB -L%NCDF_HOME2/lib -L%GCCLIB -lnetcdff -lnetcdf -lstdc++ -lz -lcurl -lgpfs
%XIOS_INC -I%XIOS_HOME/inc
%XIOS_LIB -L%XIOS_HOME/lib -L%GCCLIB -lxios -lstdc++
%OASIS_INC -I%OASIS_HOME/build/lib/mct -I%OASIS_HOME/build/lib/psmile.MPI1
%OASIS_LIB -L%OASIS_HOME/lib -lpsmile.MPI1 -lmct -lmpeu -lscrip
%CPP cpp
%FC mpif90
%FCFLAGS -fdefault-real-8 -O3 -funroll-all-loops -fcray-pointer -cpp -ffree-line-length-none -fno-second-underscore -Dgfortran
%FFLAGS %FCFLAGS
%LD %FC
%LDFLAGS
%FPPFLAGS -P -C -traditional -x f77-cpp-input
%AR ar
%ARFLAGS -rs
%MK make
%USER_INC %XIOS_INC %OASIS_INC %NCDF_INC
%USER_LIB %XIOS_LIB %OASIS_LIB %NCDF_LIB
%CC cc
%CFLAGS -O0
# generic ifort compiler options for MareNostrum4
#
# NCDF_HOME root directory containing lib and include subdirectories for netcdf4
# HDF5_HOME root directory containing lib and include subdirectories for HDF5
# XIOS_HOME root directory containing lib for XIOS
# OASIS_HOME root directory containing lib for OASIS
#
# NCDF_INC netcdf4 include file
# NCDF_LIB netcdf4 library
# XIOS_INC xios include file (taken into account only if key_iomput is activated)
# XIOS_LIB xios library (taken into account only if key_iomput is activated)
# OASIS_INC oasis include file (taken into account only if key_oasis3 is activated)
# OASIS_LIB oasis library (taken into account only if key_oasis3 is activated)
#
# FC Fortran compiler command
# FCFLAGS Fortran compiler flags
# FFLAGS Fortran 77 compiler flags
# LD linker
# LDFLAGS linker flags, e.g. -L<lib dir> if you have libraries
# FPPFLAGS pre-processing flags
# AR assembler
# ARFLAGS assembler flags
# MK make
# USER_INC complete list of include files
# USER_LIB complete list of libraries to pass to the linker
# CC C compiler used to compile conv for AGRIF
# CFLAGS compiler flags used with CC
#
# Note that:
# - unix variables "$..." are accepted and will be evaluated before calling fcm.
# - fcm variables are starting with a % (and not a $)
#
%NCDF_HOME /apps/NETCDF/4.4.1.1/INTEL/IMPI/
%NCDF_HOME2 /apps/NETCDF/4.4.1.1/INTEL/IMPI/
%HDF5_HOME /apps/HDF5/1.8.19/INTEL/IMPI/
%XIOS_HOME /home/pr1ena00/pr1ena01/data/NEMO_F/NEMO_F/xios-2.5
%OASIS_HOME /not/defined
%CURL .
#/gpfs/software/juwels/stages/2019a/software/cURL/7.64.1-GCCcore-8.3.0/lib/
%HDF5_LIB -L%HDF5_HOME/lib -L%CURL -lhdf5_hl -lhdf5
%GCCLIB .
%NCDF_INC -I%NCDF_HOME/include -I%NCDF_HOME2/include -I%HDF5_HOME/include
%NCDF_LIB -L%NCDF_HOME/lib %HDF5_LIB -L%CURL -L%NCDF_HOME2/lib -L%GCCLIB -lnetcdff -lnetcdf -lstdc++ -lz -lcurl -lgpfs
%XIOS_INC -I%XIOS_HOME/inc
%XIOS_LIB -L%XIOS_HOME/lib -L%GCCLIB -lxios -lstdc++
%OASIS_INC -I%OASIS_HOME/build/lib/mct -I%OASIS_HOME/build/lib/psmile.MPI1
%OASIS_LIB -L%OASIS_HOME/lib -lpsmile.MPI1 -lmct -lmpeu -lscrip
%CPP icc -E -xCORE-AVX512 -mtune=skylake
%FC mpiifort
%FCFLAGS -O3 -r8 -funroll-all-loops -traceback
%FFLAGS %FCFLAGS
%LD %FC
%LDFLAGS -lstdc++ -lifcore -O3 -traceback
%FPPFLAGS -P -C -traditional
%AR ar
%ARFLAGS -r
%MK make
%USER_INC %XIOS_INC %OASIS_INC %NCDF_INC
%USER_LIB %XIOS_LIB %OASIS_LIB %NCDF_LIB
%CC cc
%CFLAGS -O0
# generic ifort compiler options for SuperMUC
#
# NCDF_HOME root directory containing lib and include subdirectories for netcdf4
# HDF5_HOME root directory containing lib and include subdirectories for HDF5
# XIOS_HOME root directory containing lib for XIOS
# OASIS_HOME root directory containing lib for OASIS
#
# NCDF_INC netcdf4 include file
# NCDF_LIB netcdf4 library
# XIOS_INC xios include file (taken into account only if key_iomput is activated)
# XIOS_LIB xios library (taken into account only if key_iomput is activated)
# OASIS_INC oasis include file (taken into account only if key_oasis3 is activated)
# OASIS_LIB oasis library (taken into account only if key_oasis3 is activated)
#
# FC Fortran compiler command
# FCFLAGS Fortran compiler flags
# FFLAGS Fortran 77 compiler flags
# LD linker
# LDFLAGS linker flags, e.g. -L<lib dir> if you have libraries
# FPPFLAGS pre-processing flags
# AR assembler
# ARFLAGS assembler flags
# MK make
# USER_INC complete list of include files
# USER_LIB complete list of libraries to pass to the linker
# CC C compiler used to compile conv for AGRIF
# CFLAGS compiler flags used with CC
#
# Note that:
# - unix variables "$..." are accepted and will be evaluated before calling fcm.
# - fcm variables are starting with a % (and not a $)
#
%NCDF_HOME /dss/dsshome1/lrz/sys/spack/release/21.1.1/opt/skylake_avx512/netcdf-hdf5-all/4.7_hdf5-1.10-intel-vd6s5so
%NCDF_HOME2 /dss/dsshome1/lrz/sys/spack/release/21.1.1/opt/skylake_avx512/netcdf-hdf5-all/4.7_hdf5-1.10-intel-vd6s5so
%HDF5_HOME /dss/dsshome1/lrz/sys/spack/release/21.1.1/opt/skylake_avx512/netcdf-hdf5-all/4.7_hdf5-1.10-intel-vd6s5so
%XIOS_HOME /dss/dsshome1/03/di67wat/data/NEMO_F/NEMO_F/xios-2.5/
%OASIS_HOME /not/defined
%CURL /dss/dsshome1/lrz/sys/spack/release/21.1.1/opt/x86_64/curl/7.68.0-gcc-b2wrnof/lib/
%HDF5_LIB -L%HDF5_HOME/lib -L%CURL -lhdf5_hl -lhdf5
%GCCLIB .
%NCDF_INC -I%NCDF_HOME/include -I%NCDF_HOME2/include -I%HDF5_HOME/include
%NCDF_LIB -L%NCDF_HOME/lib %HDF5_LIB -L%CURL -L%NCDF_HOME2/lib -L%GCCLIB -lnetcdff -lnetcdf -lstdc++ -lz -lcurl -lgpfs
%XIOS_INC -I%XIOS_HOME/inc
%XIOS_LIB -L%XIOS_HOME/lib -L%GCCLIB -lxios -lstdc++
%OASIS_INC -I%OASIS_HOME/build/lib/mct -I%OASIS_HOME/build/lib/psmile.MPI1
%OASIS_LIB -L%OASIS_HOME/lib -lpsmile.MPI1 -lmct -lmpeu -lscrip
%CPP icc -E
%FC mpiifort
%FCFLAGS -O3 -r8 -funroll-all-loops -traceback
%FFLAGS %FCFLAGS
%LD mpiifort
%LDFLAGS -lstdc++ -lifcore -O3 -traceback
%FPPFLAGS -P -C -traditional
%AR ar
%ARFLAGS -r
%MK make
%USER_INC %XIOS_INC %OASIS_INC %NCDF_INC
%USER_LIB %XIOS_LIB %OASIS_LIB %NCDF_LIB
%CC cc
%CFLAGS -O0
module load hdf5
module load netcdf
module load netcdf-fortran
################################################################################
################### Projet XIOS ###################
################################################################################
%CCOMPILER mpicc
%FCOMPILER mpif90
%LINKER mpif90
%BASE_CFLAGS -ansi -w
%PROD_CFLAGS -O3 -DBOOST_DISABLE_ASSERTS
%DEV_CFLAGS -g -O2
%DEBUG_CFLAGS -g
%BASE_FFLAGS -D__NONE__ -ffree-line-length-none
%PROD_FFLAGS -O3
%DEV_FFLAGS -g -O2
%DEBUG_FFLAGS -g
%BASE_INC -D__NONE__
%BASE_LD -lstdc++
%CPP cpp
%FPP cpp -P
%MAKE make
NETCDF_INCDIR="-I$NETCDF_INC_DIR -I$NETCDFF_INC_DIR"
NETCDF_LIBDIR="-Wl,--allow-multiple-definition -L$NETCDF_LIB_DIR -L$NETCDFF_LIB_DIR"
NETCDF_LIB="-lnetcdff -lnetcdf"
MPI_INCDIR=""
MPI_LIBDIR=""
MPI_LIB=""
HDF5_INCDIR="-I $HDF5_INC_DIR"
HDF5_LIBDIR="-L $HDF5_LIB_DIR"
HDF5_LIB="-lhdf5_hl -lhdf5 -lz -lcurl"
BOOST_INCDIR="-I $BOOST_INC_DIR"
BOOST_LIBDIR="-L $BOOST_LIB_DIR"
BOOST_LIB=""
OASIS_INCDIR="-I$PWD/../../oasis3-mct/BLD/build/lib/psmile.MPI1"
OASIS_LIBDIR="-L$PWD/../../oasis3-mct/BLD/lib"
OASIS_LIB="-lpsmile.MPI1 -lscrip -lmct -lmpeu"
module load GCC
##module load PGI/19.10-GCC-8.3.0
module load Intel
module load ParaStationMPI
module load HDF5
module load netCDF
module load netCDF-Fortran
module load cURL
module load Perl
################################################################################
################### Projet XIOS ###################
################################################################################
%CCOMPILER mpicc
%FCOMPILER mpif90
%LINKER mpif90 -nofor-main
%BASE_CFLAGS -ansi -w
%PROD_CFLAGS -O3 -DBOOST_DISABLE_ASSERTS
%DEV_CFLAGS -g -O2
%DEBUG_CFLAGS -g
%BASE_FFLAGS -D__NONE__ -ffree-line-length-none
%PROD_FFLAGS -O3
%DEV_FFLAGS -g -O2
%DEBUG_FFLAGS -g
%BASE_INC -D__NONE__
%BASE_LD -lstdc++
%CPP cpp
%FPP cpp -P
%MAKE make
NETCDF_INCDIR="-I$NETCDF_INC_DIR -I$NETCDFF_INC_DIR"
NETCDF_LIBDIR="-Wl,'--allow-multiple-definition' -L$NETCDF_LIB_DIR -L$NETCDFF_LIB_DIR"
NETCDF_LIB="-lnetcdff -lnetcdf"
MPI_INCDIR=""
MPI_LIBDIR=""
MPI_LIB=""
HDF5_INCDIR="-I $HDF5_INC_DIR"
HDF5_LIBDIR="-L $HDF5_LIB_DIR"
HDF5_LIB="-lhdf5_hl -lhdf5 -lz -lcurl"
BOOST_INCDIR="-I $BOOST_INC_DIR"
BOOST_LIBDIR="-L $BOOST_LIB_DIR"
BOOST_LIB=""
OASIS_INCDIR="-I$PWD/../../oasis3-mct/BLD/build/lib/psmile.MPI1"
OASIS_LIBDIR="-L$PWD/../../oasis3-mct/BLD/lib"
OASIS_LIB="-lpsmile.MPI1 -lscrip -lmct -lmpeu"
module load gnu
module load szip
module load zlib
module load spectrum_mpi/10.3.1--binary
module load hdf5/1.12.0--spectrum_mpi--10.3.1--binary
module load netcdf/4.7.3--spectrum_mpi--10.3.1--binary
module load netcdff/4.5.2--spectrum_mpi--10.3.1--binary
export NETCDF_INC_DIR=/cineca/prod/opt/libraries/netcdf/4.7.3--spectrum_mpi--10.4.0/hpc-sdk--2021--binary/include/
export NETCDFF_INC_DIR=/cineca/prod/opt/libraries/netcdff/4.5.2--spectrum_mpi--10.4.0/hpc-sdk--2021--binary/include/
export NETCDF_LIB_DIR=/cineca/prod/opt/libraries/netcdf/4.7.3--spectrum_mpi--10.4.0/hpc-sdk--2021--binary/lib/
export NETCDFF_LIB_DIR=/cineca/prod/opt/libraries/netcdff/4.5.2--spectrum_mpi--10.4.0/hpc-sdk--2021--binary/lib/
export HDF5_INC_DIR=/cineca/prod/opt/libraries/hdf5/1.12.0--spectrum_mpi--10.3.1/pgi--19.10--binary/include/
export HDF5_LIB_DIR=/cineca/prod/opt/libraries/hdf5/1.12.0--spectrum_mpi--10.3.1/pgi--19.10--binary/lib/
################################################################################
################### Projet XIOS ###################
################################################################################
%CCOMPILER mpicc
%FCOMPILER mpif90
%LINKER mpif90
%BASE_CFLAGS -ansi -w
%PROD_CFLAGS -O3 -DBOOST_DISABLE_ASSERTS
%DEV_CFLAGS -g -O2 -traceback
%DEBUG_CFLAGS -DBZ_DEBUG -g -traceback -fno-inline
%BASE_FFLAGS -D__NONE__ -ffree-line-length-none
%PROD_FFLAGS -O3
%DEV_FFLAGS -g -O2 -traceback
%DEBUG_FFLAGS -g -traceback
%BASE_INC -D __NONE__
%BASE_LD -lstdc++
%CPP cpp
%FPP cpp -P
%MAKE make
NETCDF_INCDIR="-I$NETCDF_INC_DIR -I$NETCDFF_INC_DIR"
NETCDF_LIBDIR="-Wl,'--allow-multiple-definition' -L$NETCDF_LIB_DIR -L$NETCDFF_LIB_DIR"
NETCDF_LIB="-lnetcdff -lnetcdf"
MPI_INCDIR=""
MPI_LIBDIR=""
MPI_LIB=""
HDF5_INCDIR="-I $HDF5_INC_DIR"
HDF5_LIBDIR="-L $HDF5_LIB_DIR"
HDF5_LIB="-lhdf5_hl -lhdf5 -lz -lcurl"
BOOST_INCDIR="-I $BOOST_INC_DIR"
BOOST_LIBDIR="-L $BOOST_LIB_DIR"
BOOST_LIB=""
OASIS_INCDIR="-I$PWD/../../oasis3-mct/BLD/build/lib/psmile.MPI1"
OASIS_LIBDIR="-L$PWD/../../oasis3-mct/BLD/lib"
OASIS_LIB="-lpsmile.MPI1 -lscrip -lmct -lmpeu"
module load perl
module load hdf5
module load netcdf
################################################################################
################### Projet XIOS ###################
################################################################################
%CCOMPILER mpicc
%FCOMPILER mpif90
%LINKER mpif90 -nofor-main
%BASE_CFLAGS -ansi -w -xCORE-AVX512 -mtune=skylake
%PROD_CFLAGS -O3 -DBOOST_DISABLE_ASSERTS
%DEV_CFLAGS -g -O2
%DEBUG_CFLAGS -g
%BASE_FFLAGS -D__NONE__ -ffree-line-length-none
%PROD_FFLAGS -O3
%DEV_FFLAGS -g -O2
%DEBUG_FFLAGS -g
%BASE_INC -D__NONE__
%BASE_LD -lstdc++
%CPP cpp
%FPP cpp -P
%MAKE make