# Specfem 3D globe -- Bench readme

## Summary Version

1.0

## General description

The software package SPECFEM3D simulates three-dimensional global and regional seismic wave propagation based upon the spectral-element method (SEM). All SPECFEM3D_GLOBE software is written in Fortran90 with full portability in mind, and conforms strictly to the Fortran95 standard. It uses no obsolete or obsolescent features of Fortran77. The package uses parallel programming based upon the Message Passing Interface (MPI).

The SEM was originally developed in computational fluid dynamics and has been successfully adapted to address problems in seismic wave propagation. It is a continuous Galerkin technique, which can easily be made discontinuous; it is then close to a particular case of the discontinuous Galerkin technique, with optimized efficiency because of its tensorized basis functions. In particular, it can accurately handle very distorted mesh elements.

It has very good accuracy and convergence properties. The spectral element approach admits spectral rates of convergence and allows exploiting hp-convergence schemes. It is also very well suited to parallel implementation on very large supercomputers as well as on clusters of GPU-accelerated graphics cards. Tensor products inside each element can be optimized to reach very high efficiency, and mesh point and element numbering can be optimized to reduce processor cache misses and improve cache reuse.

The SEM can also handle triangular (in 2D) or tetrahedral (in 3D) elements as well as mixed meshes, although with increased cost and reduced accuracy in these elements, as in the discontinuous Galerkin method.

In many geological models in the context of seismic wave propagation studies (except, for instance, for fault dynamic rupture studies, in which very high frequencies of supershear rupture need to be modeled near the fault), a continuous formulation is sufficient because material property contrasts are not drastic, and thus conforming mesh doubling bricks can efficiently handle mesh size variations. This is particularly true at the scale of the full Earth.

Effects due to lateral variations in compressional-wave speed, shear-wave speed, density, a 3D crustal model, ellipticity, topography and bathymetry, the oceans, rotation, and self-gravitation are included. The package can accommodate full 21-parameter anisotropy as well as lateral variations in attenuation. Adjoint capabilities and finite-frequency kernel simulations are also included.
* Web site: http://geodynamics.org/cig/software/specfem3d_globe/
* User manual: https://geodynamics.org/cig/software/specfem3d_globe/gitbranch/devel/doc/USER_MANUAL/manual_SPECFEM3D_GLOBE.pdf
* Code download: https://github.com/geodynamics/specfem3d_globe.git
* Build instructions: https://github.com/geodynamics/specfem3d/wiki/02_getting_started
* Test Cases
  * Test Case A: https://repository.prace-ri.eu/git/UEABS/ueabs/tree/master/specfem3d/test_cases/SPECFEM3D_TestCaseA
  * Test Case B: https://repository.prace-ri.eu/git/UEABS/ueabs/tree/master/specfem3d/test_cases/SPECFEM3D_TestCaseB
  * Test Case C: https://repository.prace-ri.eu/git/UEABS/ueabs/tree/master/specfem3d/test_cases/SPECFEM3D_TestCaseC
* Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.3/specfem3d/SPECFEM3D_Run_README.txt

## Purpose of Benchmark

The software package SPECFEM3D_GLOBE simulates three-dimensional global and regional seismic wave propagation and performs full waveform imaging (FWI) or adjoint tomography based upon the spectral-element method (SEM). The test cases simulate the earthquake of June 1994 in Northern Bolivia at a global scale with the global shear-wave speed model named s362ani. Test Case A is designed to run on Tier-1 sized systems (up to around 1,000 x86 cores, or equivalent), Test Case B is designed to run on Tier-0 sized systems (up to around 10,000 x86 cores, or equivalent), and Test Case C is designed to run on PCP prototypes (up to around 100 cores, or equivalent).

## Mechanics of Building Benchmark

### Get the source

Clone the repository in a location of your choice to download the Specfem3D_Globe software package:

```shell
git clone https://github.com/geodynamics/specfem3d_globe.git
```

Then use a fixed and stable version of specfem3D_globe (for example the one of October 31, 2017; see https://github.com/geodynamics/specfem3d_globe/commits/master):

```shell
cd specfem3d_globe
git checkout b1d6ba966496f269611eff8c2cf1f22bcdac2bd9
```

If you have not already done so, clone the ueabs repository:

```shell
git clone https://repository.prace-ri.eu/git/UEABS/ueabs.git
```

In the specfem3d folder of this repository you will find the test cases in the test_cases folder, as well as environment and submission script templates for several machines.

### Define the environment

**a.** You will need a Fortran compiler, a C compiler and an MPI library, and it is recommended that you explicitly specify the appropriate command names for your Fortran compiler in your .bashrc or .cshrc file (or directly in your submission file). To be exhaustive, here are the relevant variables to compile the code:

- `LANG=C`
- `FC`
- `MPIFC`
- `CC`
- `MPICC`

**b.** To be able to run on GPUs, you must define the CUDA environment by setting the following two variables:

- `CUDA_LIB`
- `CUDA_INC`

An example (compiling for GPUs) on the ouessant cluster at IDRIS - France:

```shell
LANG=C
module purge
module load pgi cuda ompi
export FC=`which pgfortran`
export MPIFC=`which mpif90`
export CC=`which pgcc`
export MPICC=`which mpicc`
export CUDA_LIB="$CUDAROOT/lib64"
export CUDA_INC="$CUDAROOT/include"
```

You will find in the specfem3d folder of this repository a folder named env, with files named env_x which give examples of the environment used on several supercomputers during the last benchmark campaign.

**c.** To define the optimizations specific to the target architecture, you will need the environment variables FCFLAGS and CFLAGS.
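As an illustration of point **c.**, here is a minimal sketch of a CPU-only environment, assuming an Intel compiler toolchain and a Skylake (AVX-512) partition; the module names and flags are placeholders to adapt to your machine, and the env_x files mentioned above remain the reference examples.

```shell
# Hypothetical CPU-only environment (Intel toolchain assumed);
# module names and flags are site-dependent placeholders.
LANG=C
module purge
module load intel intelmpi          # assumed module names
export FC=ifort
export MPIFC=mpiifort
export CC=icc
export MPICC=mpiicc
# Architecture-specific optimization, here for a Skylake/AVX-512 node:
export FCFLAGS="-g -O3 -xCORE-AVX512"
export CFLAGS="-g -O3 -xCORE-AVX512"
```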
### Configuration step

To configure specfem3D_Globe, use the configure script. This script assumes that you will compile the code on the same kind of hardware as the machine on which you will run it.

**As arrays are statically declared, you will need to compile specfem once for each test case with the right `Par_file`**, which is the parameter file of specfem3D.

To use the **shared memory parallel programming** model of specfem3D, specify the `--enable-openmp` configure option.

**On GPU platforms** you will need to add the following arguments to the configure command, `--build=ppc64 --with-cuda=cuda5`, and you will need to set `GPU_MODE = .true.` in the parameter file `Par_file`.

On some environments, depending on the MPI configuration, you will need to replace the `use mpi` statement with `include 'mpif.h'`; use the script and procedure commented below.

```shell
### replace `use mpi` if needed ###
# cd utils
# perl replace_use_mpi_with_include_mpif_dot_h.pl
# cd ..
####################################
./configure --prefix=$PWD
```

**On Xeon Phi**, since support is recent, you should replace the following variable values in the generated Makefile:

```Makefile
FCFLAGS = -g -O3 -qopenmp -xMIC-AVX512 -DUSE_FP32 -DOPT_STREAMS -align array64byte -fp-model fast=2 -traceback -mcmodel=large
FCFLAGS_f90 = -mod ./obj -I./obj -I. -I. -I${SETUP} -xMIC-AVX512
CPPFLAGS = -I${SETUP} -DFORCE_VECTORIZATION -xMIC-AVX512
```

Note: be careful, on most machines the login node does not have the same instruction set as the compute nodes, so, in order to compile with the right instruction set, you will have to compile on a compute node (salloc + ssh).

### Compilation

Finally, compile with make:

```shell
make clean
make all
```

**-> You will find in the specfem3d folder of the ueabs repository the file "compile.sh", which is a compilation script template for several machines (different architectures: KNL, SKL, Haswell and GPU).**

## Mechanics of running

Input for the mesher (and the solver) is provided through the parameter file Par_file, which resides in the subdirectory DATA. Before running the mesher, a number of parameters need to be set in the Par_file.

The solver calculates seismograms for 129 stations, and simulations are run for a record length of 3 minutes 30 seconds for Test Case A, 10 minutes for Test Case B and one minute for Test Case C. The different test cases correspond to different meshes of the Earth. The size of the mesh is determined by a combination of the following variables: NCHUNKS, the number of chunks in the cubed sphere (6 for global simulations); NPROC_XI, the number of processors or slices along one chunk of the cubed sphere; and NEX_XI, the number of spectral elements along one side of a chunk in the cubed sphere. These three variables give the number of degrees of freedom of the mesh and determine the amount of memory needed per core. The Specfem3D solver must be recompiled each time the mesh size changes, because the solver uses static loop sizes: the compiler knows the size of all loops only at compile time and can therefore optimize them efficiently.

- Test Case A runs with `96 MPI` tasks using hybrid parallelization (MPI+OpenMP or MPI+OpenMP+CUDA depending on the system tested) and has the following mesh characteristics: NCHUNKS=6, NPROC_XI=4 and NEX_XI=384.
- Test Case B runs with `1536 MPI` tasks using hybrid parallelization and has the following mesh characteristics: NCHUNKS=6, NPROC_XI=16 and NEX_XI=384.
- Test Case C runs with `6 MPI` tasks using hybrid parallelization and has the following mesh characteristics: NCHUNKS=6, NPROC_XI=1 and NEX_XI=64.

Once the parameter file is correctly defined, to run the test cases, copy the `Par_file`, `STATIONS` and `CMTSOLUTION` files defining one of the three test cases (A, B or C, cf. https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/specfem3d/test_cases) into the SPECFEM3D_GLOBE/DATA directory. Then use the xmeshfem3D binary (located in the bin directory) to mesh the domain and xspecfem3D to solve the problem, using the appropriate command to run parallel jobs (srun, ccc_mprun, mpirun…).

```shell
srun bin/xmeshfem3D
srun bin/xspecfem3D
```

You can use or be inspired by the submission script templates in the job_script folder, using the appropriate job submission command:

- qsub for PBS jobs,
- sbatch for Slurm jobs,
- ccc_msub for Irene jobs (wrapper),
- llsubmit for LoadLeveler jobs.

## Verification of results

The relevant metric for this benchmark is the time spent in the solver. Using Slurm, it is easy to gather, as each `mpirun` or `srun` call is interpreted as a step which is already timed, so the command line `sacct -j <job_id>` allows you to retrieve the metric. The output of the mesher ("output_mesher.txt") and of the solver ("output_solver.txt") can be found in the OUTPUT_FILES directory. These files contain physical values and timing values that are more accurate than those collected by Slurm.
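As a minimal sketch of how the metric might be collected, the commands below query the Slurm accounting database for the step timings and then look for the solver's own timing lines; the sacct format fields and the grep pattern are assumptions to check against your Slurm installation and your output_solver.txt.

```shell
# Hypothetical post-processing sketch; <job_id> is a placeholder.
# Step timings recorded by Slurm for a finished job:
sacct -j <job_id> --format=JobID,JobName,Elapsed,State

# Solver timing as reported by SPECFEM3D_GLOBE itself, assuming the usual
# "elapsed time" lines are present in the solver output:
grep -i "elapsed time" OUTPUT_FILES/output_solver.txt
```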