===================================================================================
README file for PRACE Accelerator Benchmark Code PFARM (stage EXDIG, program RMX95)
===================================================================================
Author: Andrew Sunderland (a.g.sunderland@stfc.ac.uk).
The code download should contain the following directories:
```
benchmark/RMX_HOST: RMX source files for running on Host or KNL (using LAPACK or MKL)
benchmark/RMX_MAGMA_GPU: RMX source for running on GPUs using MAGMA
benchmark/lib:
benchmark/run: run directory with input files
benchmark/xdr: XDR library src files
```
The code uses the eXternal Data Representation (XDR) library for cross-platform
compatibility of unformatted data files. The XDR source files are provided with this code bundle
and can also be obtained from various sources, including
http://people.redhat.com/rjones/portablexdr/
Compilation
***********

Installing MAGMA (GPU Only)
---------------------------
Download MAGMA (current version magma-2.2.0) from http://icl.utk.edu/magma/
Install MAGMA: modify the make.inc file to indicate your C/C++ compiler, Fortran compiler, and
where CUDA, CPU BLAS and LAPACK are installed on your system. Refer to the MAGMA documentation
for further details.
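As a rough illustration only (the directory layout, template names and install prefix below are assumptions; the MAGMA documentation is authoritative), a build might look like:
```shell
# Illustrative MAGMA build; template names and paths are assumptions
$> tar -xzf magma-2.2.0.tar.gz
$> cd magma-2.2.0
$> cp make.inc-examples/make.inc.openblas make.inc
# edit make.inc: set CC, CXX, FORT and the CUDADIR/OPENBLASDIR (or MKL/LAPACK) paths
$> make -j 8
$> make install prefix=$HOME/magma
```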
Install XDR
-----------
Build the XDR library. Update the DEFS file for your compiler and environment, then run make:
```shell
$> make
```
Install RMX_HOST
----------------
Update the DEFS file for your setup, ensuring you are linking to a LAPACK or MKL library.
This is usually facilitated by e.g. compiling with `-mkl=parallel` (Intel compiler) or loading the appropriate library modules.
```shell
$> cd RMX_HOST
$> make
```
Install RMX_MAGMA_GPU
---------------------
Update the DEFS file for your setup:
- Set the MAGMADIR, CUDADIR and OPENBLASDIR environment variables.
- Update the Fortran compiler and flags in the DEFS file.
```shell
$> cd RMX_MAGMA_GPU
$> make
```
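For illustration, the environment variables might be exported before invoking make; the installation paths below are assumptions and should be adjusted for your local system:
```shell
# Example only: installation paths are assumptions
$> export MAGMADIR=$HOME/magma
$> export CUDADIR=/usr/local/cuda
$> export OPENBLASDIR=/opt/openblas
$> cd RMX_MAGMA_GPU
$> make
```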
Run RMX
*******
The RMX application is run via the executable `rmx95`.
For the FEIII dataset, the program requires the following input files to reside in the same directory as the executable:
```
phzin.ctl
XJTARMOM
HXJ030
```
These files are located in `benchmark/run`.
A guide to each of the variables in the namelist in `phzin.ctl` can be found at:
https://hpcforge.org/plugins/mediawiki/wiki/pfarm/images/9/99/Phz_rep.pdf
However, it is recommended that these inputs are not changed for the benchmark code; problem size, runtime, etc. are controlled via the environment variables listed below.
A typical PBS script to run the RMX_HOST benchmark on 4 KNL nodes (4 MPI tasks with 64 threads per MPI task) is listed below:
Settings will vary according to your local environment.
```shell
#PBS -N rmx95_4x64
#PBS -l select=4
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=64
aprun -N 1 -n 4 -d $OMP_NUM_THREADS ./rmx95
```
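On systems without PBS and aprun, a roughly equivalent launch might look like the sketch below; the launcher name and flags are assumptions and should be adapted to your scheduler and MPI installation:
```shell
# Generic alternative to the aprun line above; launcher flags are assumptions
$> export OMP_NUM_THREADS=64
$> mpirun -np 4 --map-by node ./rmx95
```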
Run-time environment variable settings
---------------------------------------
These can be adapted by the user to suit benchmark load requirements, e.g. shorter or longer runs.
Each MPI task will pick up a sector calculation, which is then distributed amongst the available threads per node (for CPU and KNL) or offloaded (for GPU).
The distribution among MPI tasks is simple round-robin.
- `RMX_NGPU` : refers to the number of shared GPUs per node (only for RMX_MAGMA_GPU)
- `RMX_NSECT_FINE` : sets the number of sectors for the Fine region.
- `RMX_NSECT_COARSE` : sets the number of sectors for the Coarse region.
- `RMX_NL_FINE` : sets the number of basis functions for the Fine region sector calculations.
- `RMX_NL_COARSE` : sets the number of basis functions for the Coarse region sector calculations.
**Notes**:
For a representative setup for the benchmark datasets:
- `RMX_NL_FINE` can take values in the range 6:25
- `RMX_NL_COARSE` can take values in the range 5:10
- For accuracy reasons, `RMX_NL_FINE` should always be greater than `RMX_NL_COARSE`.
- The following value pairs for `RMX_NL_FINE` and `RMX_NL_COARSE` provide representative calculations:
```
12,6
14,8
16,10
18,10
20,10
25,10
```
If `RMX_NSECT` and `RMX_NL` variables are not set, the benchmark code defaults to:
```
RMX_NSECT_FINE=5
RMX_NSECT_COARSE=20
RMX_NL_FINE=12
RMX_NL_COARSE=6
```
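As an illustrative sketch (the values are taken from the defaults and the representative pairs above, not prescriptive; adjust them to the required benchmark load), these variables can be exported in the job script before launching `rmx95`:
```shell
# Example run-time settings; values are illustrative only
$> export RMX_NSECT_FINE=5
$> export RMX_NSECT_COARSE=20
$> export RMX_NL_FINE=16
$> export RMX_NL_COARSE=10
$> export RMX_NGPU=1      # RMX_MAGMA_GPU builds only
$> aprun -N 1 -n 4 -d $OMP_NUM_THREADS ./rmx95
```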
The Hamiltonian matrix dimension will be output along
with the Wallclock time it takes to do each individual DSYEVD call.
Performance is measured in Wallclock time and is displayed
on the screen or output log at the end of the run.