Commit 622d7a98 authored by Andrew Sunderland

Updated README files

parent 9dc25ef0
@@ -99,13 +99,16 @@ located in data/test_case_2_mol:
phzin.ctl
H
A guide to each of the variables in the namelist in phzin.ctl can be found at:
https://hpcforge.org/plugins/mediawiki/wiki/pfarm/images/9/99/Phz_rep.pdf
It is recommended that the settings in the input file phzin.ctl are not changed for the benchmark runs;
problem size, runtime, etc. are better controlled via the environment variables listed below.
To set up run directories with the correct executables and datafiles, bash script files are provided (an illustrative invocation is sketched after this list):
cpu/setup_run_cpu_atom.scr
cpu/setup_run_cpu_mol.scr
gpu/setup_run_gpu_atom.scr
gpu/setup_run_gpu_mol.scr
Example submission job scripts for cpu / gpu / atomic and molecular cases are provided in the directories
cpu/example_job_scripts
gpu/example_job_scripts
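For example, a CPU atomic run directory might be prepared with the corresponding setup script. The invocation below is a sketch only; check the script itself for any required arguments or environment settings.
```shell
# Prepare a run directory for the CPU / atomic benchmark case
# (illustrative; the script may expect to be run from the top-level directory)
bash cpu/setup_run_cpu_atom.scr
```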
@@ -162,7 +165,21 @@ The Hamiltonian matrix dimension will be output along
with the Wallclock time it takes to do each individual DSYEVD (eigensolver) call.
Performance is measured in Wallclock time and is displayed
on the screen or output log at the end of the run.
For the atomic dataset, `grep` the output file for `'Sector 16:'`
The output should match the values below.
Mesh 1, Sector 16: first five eigenvalues = -4329.72 -4170.91 -4157.31 -4100.98 -4082.11
Mesh 1, Sector 16: final five eigenvalues = 4100.98 4157.31 4170.91 4329.72 4370.54
Mesh 2, Sector 16: first five eigenvalues = -313.631 -301.010 -298.882 -293.393 -290.619
Mesh 2, Sector 16: final five eigenvalues = 290.619 293.393 298.882 301.010 313.631
For the molecular dataset, `grep` the output file for `'Sector 64:'`
The output should match the values below.
Mesh 1, Sector 64: first five eigenvalues = -3850.84 -3593.98 -3483.83 -3466.73 -3465.72
Mesh 1, Sector 64: final five eigenvalues = 3465.72 3466.73 3483.83 3593.99 3850.84
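Assuming the run output has been captured in a log file (the name `rmx95.log` below is illustrative), the checks can be run as:
```shell
# Atomic benchmark check: compare against the 'Sector 16:' values above
grep 'Sector 16:' rmx95.log
# Molecular benchmark check: compare against the 'Sector 64:' values above
grep 'Sector 64:' rmx95.log
```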
----------------------------------------------------------------------------
@@ -7,10 +7,12 @@ PFARM is part of a suite of programs based on the ‘R-matrix’ ab-initio appro
In this README we give information relevant for its use in the UEABS.
### Standard CPU version
The PFARM outer-region application code EXDIG is dominated by the assembly of sector Hamiltonian matrices and their subsequent eigensolutions. The code is written in Fortran 2003 (or Fortran 2003-compliant Fortran 95), is parallelised using MPI and OpenMP and is designed to take advantage of highly optimised, numerical library routines. Hybrid MPI / OpenMP parallelisation has also been introduced into the code via shared memory enabled numerical library kernels.
### GPU version
Accelerator-based Nvidia GPU versions of EXDIG off-load the standard (dense) eigensolver calculations that dominate the overall run-time, using the MAGMA library.
### Configure, Build and Run Instructions
See PFARM_Build_Run_README.txt
@@ -2,143 +2,21 @@ README file for PRACE Accelerator Benchmark Code PFARM (stage EXDIG, program RMX
===================================================================================
Author: Andrew Sunderland (a.g.sunderland@stfc.ac.uk).
The [code download](https://www.dropbox.com/sh/dlcpzr934r0wazy/AABlphkgEn9tgRlwHY2k3lqBa?dl=0) should contain the following directories:
# PFARM in the United European Applications Benchmark Suite (UEABS)
## Document Author: Andrew Sunderland (andrew.sunderland@stfc.ac.uk), STFC, UK.
```
benchmark/RMX_HOST: RMX source files for running on Host or KNL (using LAPACK or MKL)
benchmark/RMX_MAGMA_GPU: RMX source for running on GPUs using MAGMA
benchmark/lib:
benchmark/run: run directory with input files
benchmark/xdr: XDR library src files
```
## Introduction
PFARM is part of a suite of programs based on the ‘R-matrix’ ab-initio approach to the variational solution of the many-electron Schrödinger equation for electron-atom and electron-ion scattering. The package has been used to calculate electron collision data for astrophysical applications (such as the interstellar medium and planetary atmospheres) with, for example, various ions of Fe and Ni and neutral O, plus other applications such as data for plasma modelling and fusion reactor impurities. The code has recently been adapted to form a compatible interface with the UKRmol suite of codes for electron (positron) molecule collisions, thus enabling large-scale parallel ‘outer-region’ calculations for molecular systems as well as atomic systems.
In this README we give information relevant for its use in the UEABS.
The code uses the eXternal Data Representation library (XDR) for cross-platform
compatibility of unformatted data files. The XDR source files are provided with this code bundle.
The library can also be obtained from other sources, including
http://people.redhat.com/rjones/portablexdr/
### Standard CPU version
The PFARM outer-region application code EXDIG is dominated by the assembly of sector Hamiltonian matrices and their subsequent eigensolutions. The code is written in Fortran 2003 (or Fortran 2003-compliant Fortran 95), is parallelised using MPI and OpenMP and is designed to take advantage of highly optimised, numerical library routines. Hybrid MPI / OpenMP parallelisation has also been introduced into the code via shared memory enabled numerical library kernels.
### GPU version
Accelerator-based Nvidia GPU versions of the code use the MAGMA library for the eigensolver calculations.
Compilation
***********
Installing MAGMA (GPU Only)
---------------------------
Download MAGMA (current version magma-2.2.0) from http://icl.utk.edu/magma/
Install MAGMA: modify the make.inc file to indicate your C/C++
compiler, Fortran compiler, and where CUDA, CPU BLAS, and
LAPACK are installed on your system. Refer to the MAGMA documentation for further details; an illustrative make.inc fragment is sketched below.
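A minimal sketch of the relevant make.inc entries is given below. The variable names follow the make.inc examples distributed with MAGMA; the compilers, GPU target and paths are illustrative only, so check the examples shipped with your MAGMA version.
```
# Illustrative make.inc fragment (adapt compilers, GPU target and paths)
CC          = gcc
FORT        = gfortran
NVCC        = nvcc
GPU_TARGET  = Kepler            # set to match the target GPU architecture
CUDADIR     = /usr/local/cuda
OPENBLASDIR = /opt/openblas
```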
### Configure, Build and Run Instructions
See PFARM_Build_Run_README.txt
Install XDR
-----------
To build the XDR library, update the DEFS file for your compiler and environment, then run:
```shell
$> make
```
Install RMX_HOST
----------------
Update DEFS file for your setup, ensuring you are linking to a LAPACK or MKL library.
This is usually facilitated by e.g. compiling with `-mkl=parallel` (Intel compiler) or loading the appropriate library modules.
```shell
$> cd RMX_HOST
$> make
```
Install RMX_MAGMA_GPU
---------------------
Update the DEFS file for your setup:
- Set the MAGMADIR, CUDADIR and OPENBLASDIR environment variables (an illustrative example is given below).
- Update the Fortran compiler and flags in the DEFS file.
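A minimal sketch of the environment variable settings, with purely illustrative installation paths:
```shell
# Adjust these paths to the local MAGMA, CUDA and OpenBLAS installations
export MAGMADIR=/opt/magma-2.2.0
export CUDADIR=/usr/local/cuda
export OPENBLASDIR=/opt/openblas
```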
```shell
$> cd RMX_MAGMA_GPU
$> make
```
Run instructions
****************
Run RMX
-------
The RMX application is run by launching the executable `rmx95`.
For the FEIII dataset, the program requires the following input files to reside in the same directory as the executable:
```
phzin.ctl
XJTARMOM
HXJ030
```
These files are located in `benchmark/run`
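For example, the input files might be copied into the run directory containing the executable as follows (paths assume the directory layout listed above):
```shell
# Copy the FEIII benchmark inputs alongside the rmx95 executable
cp benchmark/run/phzin.ctl benchmark/run/XJTARMOM benchmark/run/HXJ030 .
```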
A guide to each of the variables in the namelist in phzin.ctl can be found at:
https://hpcforge.org/plugins/mediawiki/wiki/pfarm/images/9/99/Phz_rep.pdf
However, it is recommended that these inputs are not changed for the benchmark runs;
problem size, runtime, etc. are controlled via the environment variables listed below.
A typical PBS script to run the RMX_HOST benchmark on 4 KNL nodes (4 MPI tasks with 64 threads per MPI task) is listed below; settings will vary according to your local environment.
```shell
#PBS -N rmx95_4x64
#PBS -l select=4
#PBS -l walltime=01:00:00
#PBS -A my_account_id
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=64
aprun -N 1 -n 4 -d $OMP_NUM_THREADS ./rmx95
```
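The job script is then submitted to the batch system in the usual way, for example (the script file name is illustrative):
```shell
qsub rmx95_4x64.pbs
```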
Run-time environment variable settings
--------------------------------------
The following environment variables, which can for example be set inside the job script, control the dimensions of the H sector matrices and the number of sectors used when undertaking benchmarks.
These can be adapted by the user to suit benchmark load requirements, e.g. short vs long runs; an illustrative set of exports is given after the default values below.
Each MPI task picks up a sector calculation, which is then distributed amongst the available threads per node (for CPU and KNL) or offloaded (for GPU).
The distribution among MPI tasks is a simple round-robin.
- `RMX_NGPU` : refers to the number of shared GPUs per node (only for RMX_MAGMA_GPU)
- `RMX_NSECT_FINE` : sets the number of sectors for the Fine region.
- `RMX_NSECT_COARSE` : sets the number of sectors for the Coarse region.
- `RMX_NL_FINE` : sets the number of basis functions for the Fine region sector calculations.
- `RMX_NL_COARSE` : sets the number of basis functions for the Coarse region sector calculations.
**Notes**:
For a representative setup for the benchmark datasets:
- `RMX_NL_FINE` can take values in the range 6:25
- `RMX_NL_COARSE` can take values in the range 5:10
- For accuracy reasons, `RMX_NL_FINE` should always be greater than `RMX_NL_COARSE`.
- The following value pairs for `RMX_NL_FINE` and `RMX_NL_COARSE` provide representative calculations:
```
12,6
14,8
16,10
18,10
20,10
25,10
```
If the `RMX_NSECT` and `RMX_NL` variables are not set, the benchmark code defaults to:
```
RMX_NSECT_FINE=5
RMX_NSECT_COARSE=20
RMX_NL_FINE=12
RMX_NL_COARSE=6
```
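For example, a job script might override these defaults with one of the representative `RMX_NL` pairs listed above; the values below are illustrative only and can be scaled to give shorter or longer runs.
```shell
# Illustrative run-time settings for a benchmark run
export RMX_NSECT_FINE=5
export RMX_NSECT_COARSE=20
export RMX_NL_FINE=16
export RMX_NL_COARSE=10
export RMX_NGPU=1        # only relevant for the RMX_MAGMA_GPU version
```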
The Hamiltonian matrix dimension will be output along
with the Wallclock time it takes to do each individual DSYEVD call.
Performance is measured in Wallclock time and is displayed
on the screen or output log at the end of the run.