README file for PRACE Accelerator Benchmark Code PFARM (stage EXDIG, program RMX95)
===================================================================================
Author: Andrew Sunderland (a.g.sunderland@stfc.ac.uk).

The [code download](https://www.dropbox.com/sh/dlcpzr934r0wazy/AABlphkgEn9tgRlwHY2k3lqBa?dl=0) should contain the following directories:

```
benchmark/RMX_HOST: RMX source files for running on Host or KNL (using LAPACK or MKL)
benchmark/RMX_MAGMA_GPU: RMX source for running on GPUs using MAGMA
benchmark/lib: 
benchmark/run: run directory with input files
benchmark/xdr: XDR library src files
```

The code uses the eXternal Data Representation library (XDR) for cross-platform
compatibility of unformatted data files. The XDR source files are provided with this code bundle.
The library can also be obtained from various sources, including
http://people.redhat.com/rjones/portablexdr/

Compilation
***********
Installing MAGMA (GPU Only)
---------------------------
Download MAGMA (current version magma-2.2.0) from http://icl.utk.edu/magma/ and install it:
modify the make.inc file to indicate your C/C++ compiler, Fortran compiler, and where CUDA, CPU BLAS, and
LAPACK are installed on your system. Refer to the MAGMA documentation for further details.

Install XDR
-----------
Build the XDR library (sources in `benchmark/xdr`): update the DEFS file for your compiler and environment, then run:

```shell
$> make
```

Install RMX_HOST
----------------
Update the DEFS file for your setup, ensuring that you link to a LAPACK or MKL library.
This is usually done by, for example, compiling with `-mkl=parallel` (Intel compiler) or loading the appropriate library modules.

```shell
$> cd RMX_HOST
$> make
```

Install RMX_MAGMA_GPU 
---------------------
Update the DEFS file for your setup:
 - Set the MAGMADIR, CUDADIR and OPENBLASDIR environment variables.
 - Update the Fortran compiler and flags in the DEFS file.

```shell
$> cd RMX_MAGMA_GPU
$> make
```
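
As a minimal sketch of the environment setup described above, the three variables might be exported before running `make`. The install paths shown here are hypothetical placeholders, not locations taken from the benchmark bundle; substitute the paths used on your system.

```shell
# Hypothetical install locations: replace with the paths on your system.
export MAGMADIR="$HOME/opt/magma-2.2.0"
export CUDADIR="/usr/local/cuda"
export OPENBLASDIR="$HOME/opt/openblas"

# Warn about any location that does not exist before running make.
for dir in "$MAGMADIR" "$CUDADIR" "$OPENBLASDIR"; do
    [ -d "$dir" ] || echo "warning: $dir not found" >&2
done
```

The directory check is only a convenience; it prints a warning and does not stop the build.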

Run instructions
****************

Run RMX
-------

The RMX application is run by launching the executable `rmx95`.
For the FEIII dataset, the program requires the following input files to reside in the same directory as the executable:

```
phzin.ctl
XJTARMOM
HXJ030
```

These files are located in `benchmark/run`.
A guide to each of the variables in the namelist in phzin.ctl can be found at:
https://hpcforge.org/plugins/mediawiki/wiki/pfarm/images/9/99/Phz_rep.pdf
However, it is recommended that these inputs are not changed for the benchmark code;
problem size, runtime, etc. are controlled via the environment variables listed below.
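
Because the three input files must sit next to the executable, a small pre-flight check can catch a missing file before a job is submitted. The following is a sketch only: `check_inputs` is a hypothetical helper, not part of the benchmark, and it assumes the current directory is the one holding `rmx95`.

```shell
# check_inputs DIR: report which of the FEIII input files are missing from DIR.
# Returns 0 if all three files are present, 1 otherwise.
check_inputs() {
    dir="$1"
    missing=0
    for f in phzin.ctl XJTARMOM HXJ030; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing input file: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the current directory (assumed to hold rmx95).
check_inputs . || echo "copy the missing files from benchmark/run before running rmx95" >&2
```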

A typical PBS script to run the RMX_HOST benchmark on 4 KNL nodes (4 MPI tasks with 64 threads per MPI task) is listed below.
Settings will vary according to your local environment.

```shell
#PBS -N rmx95_4x64
#PBS -l select=4
#PBS -l walltime=01:00:00
#PBS -A my_account_id

cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=64

aprun -N 1 -n 4 -d $OMP_NUM_THREADS ./rmx95
```

Run-time environment variable settings
--------------------------------------
The following environment variables, which can be set inside the job script, change the dimensions of the H sector matrix
and the number of sectors used when benchmarking.
They can be adapted to suit benchmark load requirements, e.g. short vs long runs.
Each MPI task picks up a sector calculation, which is then distributed among the available threads on the node (for CPU and KNL) or offloaded (for GPU).
Sectors are distributed among MPI tasks in simple round-robin fashion.
 
 - `RMX_NGPU` : refers to the number of shared GPUs per node (only for RMX_MAGMA_GPU)
 - `RMX_NSECT_FINE` : sets the number of sectors for the Fine region. 
 - `RMX_NSECT_COARSE` : sets the number of sectors for the Coarse region. 
 - `RMX_NL_FINE` : sets the number of basis functions for the Fine region sector calculations. 
 - `RMX_NL_COARSE` : sets the number of basis functions for the Coarse region sector calculations. 
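
To illustrate the round-robin distribution mentioned above, the sketch below prints a sector-to-task mapping. It rests on assumptions that the source does not confirm: that fine and coarse sectors are pooled into one list, that sectors are numbered from 0, and that sector `i` goes to MPI task `i mod ntasks`. The helper `task_for_sector` is hypothetical and is not part of the benchmark code.

```shell
# Assumptions (not confirmed by the source): sectors are numbered from 0 and
# sector i is handled by MPI task (i mod ntasks).
ntasks=4
nsect=$((5 + 20))   # default RMX_NSECT_FINE + RMX_NSECT_COARSE

# task_for_sector I NTASKS: print the task that would own sector I.
task_for_sector() {
    echo $(( $1 % $2 ))
}

i=0
while [ "$i" -lt "$nsect" ]; do
    echo "sector $i -> task $(task_for_sector "$i" "$ntasks")"
    i=$((i + 1))
done
```

Under these assumptions, a sector count that is not a multiple of the number of MPI tasks leaves some tasks with one more sector than others, which is worth bearing in mind when choosing benchmark sizes.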

**Notes**:
For a representative setup for the benchmark datasets:

 - `RMX_NL_FINE`  can take values in the range 6:25
 - `RMX_NL_COARSE`  can take values in the range 5:10 
 - For accuracy reasons, `RMX_NL_FINE` should always be greater than `RMX_NL_COARSE`.
 - The following value pairs for `RMX_NL_FINE` and `RMX_NL_COARSE` provide representative calculations:

```
12,6
14,8
16,10
18,10
20,10
25,10
```

If the `RMX_NSECT_*` and `RMX_NL_*` variables are not set, the benchmark code defaults to:

```
RMX_NSECT_FINE=5
RMX_NSECT_COARSE=20
RMX_NL_FINE=12
RMX_NL_COARSE=6
```
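
To override the defaults, the variables can be exported in the job script before the executable is launched. The sketch below uses one of the representative value pairs listed above (16,10) and keeps the sector counts at their defaults. The guard at the end is an illustrative addition, not something the benchmark itself provides.

```shell
# One of the representative pairs: RMX_NL_FINE=16, RMX_NL_COARSE=10.
export RMX_NL_FINE=16
export RMX_NL_COARSE=10
# Sector counts left at the documented defaults.
export RMX_NSECT_FINE=5
export RMX_NSECT_COARSE=20

# Guard against an invalid pair: the fine value must exceed the coarse value.
if [ "$RMX_NL_FINE" -le "$RMX_NL_COARSE" ]; then
    echo "RMX_NL_FINE must be greater than RMX_NL_COARSE" >&2
fi
```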

The Hamiltonian matrix dimension is output along
with the wall-clock time taken by each individual DSYEVD call.

Performance is measured in wall-clock time and is displayed
on the screen or in the output log at the end of the run.