diff --git a/pfarm/PFARM_Build_Run_README.txt b/pfarm/PFARM_Build_Run_README.txt
index c2f09c5d42cc76a24f0f6a96fc84039c6e68f18a..2dd9f876eb8f798a240c037bfec527ff84571aea 100644
--- a/pfarm/PFARM_Build_Run_README.txt
+++ b/pfarm/PFARM_Build_Run_README.txt
@@ -99,13 +99,16 @@ located in data/test_case_2_mol:
 phzin.ctl
 H
-
-A guide to each of the variables in the namelist in phzin.ctl can be found at:
-https://hpcforge.org/plugins/mediawiki/wiki/pfarm/images/9/99/Phz_rep.pdf
-However, it is recommended that these inputs are not changed for the benchmark runs and
+It is recommended that the settings in the input file phzin.ctl are not changed for the benchmark runs;
 problem size, runtime etc, are better controlled via the environment variables listed below.
 
-Example job scripts for cpu / gpu / atomic and molecular cases are provided in the directories
+To set up run directories with the correct executables and data files, bash scripts are provided:
+cpu/setup_run_cpu_atom.scr
+cpu/setup_run_cpu_mol.scr
+gpu/setup_run_gpu_atom.scr
+gpu/setup_run_gpu_mol.scr
+
+Example job submission scripts for the cpu / gpu and atomic / molecular cases are provided in the directories:
 cpu/example_job_scripts
 gpu/example_job_scripts
@@ -162,7 +165,21 @@ The Hamiltonian matrix dimension will be output along
 with the Wallclock time it takes
 to do each individual DSYEVD (eigensolver) call.
 
 Performance is measured in Wallclock time and is displayed
-on the screen or output log at the end of the run.
+on the screen or output log at the end of the run.
+
+For the atomic dataset, grep the output file for 'Sector 16:'.
+The output should match the values below.
+
+ Mesh 1, Sector 16: first five eigenvalues = -4329.72 -4170.91 -4157.31 -4100.98 -4082.11
+ Mesh 1, Sector 16: final five eigenvalues = 4100.98 4157.31 4170.91 4329.72 4370.54
+ Mesh 2, Sector 16: first five eigenvalues = -313.631 -301.010 -298.882 -293.393 -290.619
+ Mesh 2, Sector 16: final five eigenvalues = 290.619 293.393 298.882 301.010 313.631
+
+For the molecular dataset, grep the output file for 'Sector 64:'.
+The output should match the values below.
+
+ Mesh 1, Sector 64: first five eigenvalues = -3850.84 -3593.98 -3483.83 -3466.73 -3465.72
+ Mesh 1, Sector 64: final five eigenvalues = 3465.72 3466.73 3483.83 3593.99 3850.84
 
 ----------------------------------------------------------------------------
diff --git a/pfarm/README.md b/pfarm/README.md
index 0c4511f9472344c00ad6977ad60181480603f3c4..94c3009b27ee6fcaebeb39e64b218d41997ad8cb 100644
--- a/pfarm/README.md
+++ b/pfarm/README.md
@@ -7,10 +7,12 @@ PFARM is part of a suite of programs based on the ‘R-matrix’ ab-initio appro
 In this README we give information relevant for its use in the UEABS.
 
 ### Standard CPU version
-The PFARM outer-region application code EXDIG is domi-nated by the assembly of sector Hamiltonian matrices and their subsequent eigensolutions. The code is written in Fortran 2003 (or Fortran 2003-compliant Fortran 95), is parallelised using MPI and OpenMP and is designed to take advantage of highly optimised, numerical library routines. Hybrid MPI / OpenMP parallelisation has also been introduced into the code via shared memory enabled numerical library kernels.
+The PFARM outer-region application code EXDIG is dominated by the assembly of sector Hamiltonian matrices and their subsequent eigensolutions. The code is written in Fortran 2003 (or Fortran 2003-compliant Fortran 95), is parallelised using MPI and OpenMP and is designed to take advantage of highly optimised numerical library routines. Hybrid MPI / OpenMP parallelisation has also been introduced into the code via shared memory enabled numerical library kernels.
 ### GPU version
-Accelerator-based implementations have been implemented for EXDIG, using off-loading (MKL or CuBLAS/CuSolver) for the standard (dense) eigensolver calculations that dominate overall run-time.
+Accelerator-based Nvidia GPU versions of the code use the MAGMA library for the eigensolver calculations.
+### Configure, Build and Run Instructions
+See PFARM_Build_Run_README.txt
diff --git a/pfarm/README_ACC.md b/pfarm/README_ACC.md
index 9dd8b32732605cadfb133d33dfdc3a88d24171cd..8aecf506fee351a5756f2885cbf0daef7df47feb 100644
--- a/pfarm/README_ACC.md
+++ b/pfarm/README_ACC.md
@@ -2,143 +2,21 @@ README file for PRACE Accelerator Benchmark Code PFARM (stage EXDIG, program RMX
 ===================================================================================
 Author: Andrew Sunderland (a.g.sunderland@stfc.ac.uk).
-The [code download](https://www.dropbox.com/sh/dlcpzr934r0wazy/AABlphkgEn9tgRlwHY2k3lqBa?dl=0
-) should contain the following directories:
+# PFARM in the United European Applications Benchmark Suite (UEABS)
+## Document Author: Andrew Sunderland (andrew.sunderland@stfc.ac.uk), STFC, UK.
-```
-benchmark/RMX_HOST: RMX source files for running on Host or KNL (using LAPACK or MKL)
-benchmark/RMX_MAGMA_GPU: RMX source for running on GPUs using MAGMA
-benchmark/lib:
-benchmark/run: run directory with input files
-benchmark/xdr: XDR library src files
-```
+## Introduction
+PFARM is part of a suite of programs based on the ‘R-matrix’ ab-initio approach to the variational solution of the many-electron Schrödinger equation for electron-atom and electron-ion scattering. The package has been used to calculate electron collision data for astrophysical applications (such as the interstellar medium and planetary atmospheres) with, for example, various ions of Fe and Ni and neutral O, plus other applications such as data for plasma modelling and fusion reactor impurities. The code has recently been adapted to form a compatible interface with the UKRmol suite of codes for electron (positron) molecule collisions, thus enabling large-scale parallel ‘outer-region’ calculations for molecular systems as well as atomic systems.
+In this README we give information relevant for its use in the UEABS.
-The code uses the eXternal Data Representation library (XDR) for cross-platform
-compatibility of unformatted data files. The XDR source files are provided with this code bundle.
-It can be obtained from various sources, including
-http://people.redhat.com/rjones/portablexdr/
+### Standard CPU version
+The PFARM outer-region application code EXDIG is dominated by the assembly of sector Hamiltonian matrices and their subsequent eigensolutions. The code is written in Fortran 2003 (or Fortran 2003-compliant Fortran 95), is parallelised using MPI and OpenMP and is designed to take advantage of highly optimised numerical library routines. Hybrid MPI / OpenMP parallelisation has also been introduced into the code via shared memory enabled numerical library kernels.
+### GPU version
+Accelerator-based Nvidia GPU versions of the code use the MAGMA library for the eigensolver calculations.
-Compilation -*********** -Installing MAGMA (GPU Only) ---------------------------- -Download MAGMA (current version magma-2.2.0) from http://icl.utk.edu/magma/ -Install MAGMA : Modify the make.inc file to indicate your C/C++ -compiler, Fortran compiler, and where CUDA, CPU BLAS, and -LAPACK are installed on your system. Refer to MAGMA documentation for further details +### Configure, Build and Run Instructions +See PFARM_Build_Run_README.txt -Install XDR ------------ -build XDR library: -update DEFS file for your compiler and environment - -```shell -$> make -``` - -Install RMX_HOST ----------------- -Update DEFS file for your setup, ensuring you are linking to a LAPACK or MKL library. -This is usually facilitated by e.g. compiling with `-mkl=parallel` (Intel compiler) or loading the appropriate library modules. - -```shell -$> cd RMX_HOST -$> make -``` - -Install RMX_MAGMA_GPU ---------------------- -Update DEFS file for your setup: - - Set MAGMADIR, CUDADIR and OPENBLASDIR environment variables - - Updating the fortran compiler and flags in DEFS file. - -```shell -$> cd RMX_MAGMA_GPU -$> make -``` - - -Run instructions -**************** - -Run RMX -------- - -The RMX application can be run by running the executable `rmx95` -For the FEIII dataset, the program requires the following input files to reside in the same directory as the executable: - -``` -phzin.ctl -XJTARMOM -HXJ030 -``` - -These files are located in `benchmark/run` -A guide to each of the variables in the namelist in phzin.ctl can be found at: -https://hpcforge.org/plugins/mediawiki/wiki/pfarm/images/9/99/Phz_rep.pdf -However, it is recommended that these inputs are not changed for the benchmark code and -problem size, runtime etc, are controlled via the environment variables listed below. - -A typical PBS script to run the RMX_HOST benchmark on 4 KNL nodes (4 MPI tasks with 64 threads per MPI task) is listed below: -Settings will vary according to your local environment. - -```shell -#PBS -N rmx95_4x64 -#PBS -l select=4 -#PBS -l walltime=01:00:00 -#PBS -A my_account_id - -cd $PBS_O_WORKDIR -export OMP_NUM_THREADS=64 - -aprun -N 1 -n 4 -d $OMP_NUM_THREADS ./rmx95 -``` - -Run-time environment variable settings --------------------------------------- -The following environmental variables that e.g. can be set inside the script allow the H sector matrix -to easily change dimensions and also allows the number of sectors to change when undertaking benchmarks. -These can be adapted by the user to suit benchmark load requirements e.g. short vs long runs. -Each MPI Task will pickup a sector calculation which will then be distributed amongst available threads per node (for CPU and KNL) or offloaded (for GPU). -The distribution among MPI tasks is simple round-robin. - - - `RMX_NGPU` : refers to the number of shared GPUs per node (only for RMX_MAGMA_GPU) - - `RMX_NSECT_FINE` : sets the number of sectors for the Fine region. - - `RMX_NSECT_COARSE` : sets the number of sectors for the Coarse region. - - `RMX_NL_FINE` : sets the number of basis functions for the Fine region sector calculations. - - `RMX_NL_COARSE` : sets the number of basis functions for the Coarse region sector calculations. - -**Notes**: -For a representative setup for the benchmark datasets: - - - `RMX_NL_FINE` can take values in the range 6:25 - - `RMX_NL_COARSE` can take values in the range 5:10 - - For accuracy reasons, `RMX_NL_FINE` should always be great than `RMX_NL_COARSE`. 
- - The following value pairs for `RMX_NL_FINE` and `RMX_NL_COARSE` provide representative calculations: - -``` -12,6 -14,8 -16,10 -18,10 -20,10 -25,10 -``` - -If `RMX_NSECT` and `RMX_NL` variables are not set, the benchmark code defaults to: - -``` -RMX_NSECT_FINE=5 -RMX_NSECT_COARSE=20 -RMX_NL_FINE=12 -RMX_NL_COARSE=6 -``` - -The Hamiltonian matrix dimension will be output along -with the Wallclock time it takes to do each individual DSYEVD call. - -Performance is measured in Wallclock time and is displayed -on the screen or output log at the end of the run.
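
As an illustration of the run-and-verify procedure described in the updated PFARM_Build_Run_README.txt above, the sketch below shows how an atomic benchmark run might be launched and checked against the reference 'Sector 16:' eigenvalues. This is only a sketch under assumptions: the launcher (mpirun here), the output file name rmx95.out and the run directory are placeholders, the executable name rmx95 is taken from README_ACC.md, and the RMX_* values are the representative defaults quoted in that documentation; the actual commands should come from the setup scripts (e.g. cpu/setup_run_cpu_atom.scr) and the example job scripts.

```shell
# Hypothetical run-and-check sequence for the atomic benchmark.
# Assumes a run directory prepared with cpu/setup_run_cpu_atom.scr,
# containing the rmx95 executable and its input files (phzin.ctl etc.).

# Problem size is controlled by the RMX_* environment variables; the
# values below are the representative defaults quoted in the documentation.
export RMX_NSECT_FINE=5
export RMX_NSECT_COARSE=20
export RMX_NL_FINE=12
export RMX_NL_COARSE=6
export OMP_NUM_THREADS=64

# Launch with the local MPI launcher (see the example job scripts);
# 'mpirun -n 4' is only a stand-in, as is the output file name.
mpirun -n 4 ./rmx95 > rmx95.out 2>&1

# Compare the sector eigenvalues against the reference values quoted above.
grep 'Sector 16:' rmx95.out
```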