Commit b9a8c7fd authored by Andrew Sunderland
New README files final 6IP Benchmarking
* Install XDR library
$> make
(ignore warnings related to float/double type mismatches in xdr_rmat64.c; these are not relevant for this benchmark)
The validity of the XDR library can be tested by running test_xdr:
$> ./test_xdr
RPC headers may not be available for XDR on the target platform, leading to compilation errors of the type:
cannot open source file "rpc/rpc.h"
#include <rpc/rpc.h>
In this case, use the make include file DEFS_Intel_rpc.
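A minimal sketch of this workaround is given below. It assumes the XDR Makefile reads its settings from a file named 'DEFS' in the build directory, so the Intel/rpc variant is copied into place before rebuilding; adapt this to however the build actually selects its DEFS file.

$> cp DEFS_Intel_rpc DEFS
$> make
$> ./test_xdr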
* Install CPU version (MPI and OpenMP)
$> cd cpu
$> make
** To install the molecular version of the code
$> cd src_mpi_omp_mol
$> make
The -ltirpc option for 'STATIC_LIBS' in 'DEFS' should only be included when the XDR library was built using 'DEFS_Intel_rpc'.
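For instance, the relevant line of 'DEFS' for this build might then look as follows (illustrative; keep any other libraries already listed for the platform):

STATIC_LIBS = -ltirpc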
* Install GPU version (MPI / OpenMP / MAGMA / CUDA)
Set the MAGMADIR, CUDADIR environment variables to point to MAGMA and CUDA installations.
The numerical library MAGMA may be provided through the modules system of the platform.
Please check target platform user guides for linking instructions.
$> module load magma
If unavailable via a module, then MAGMA may need to be installed (see below)
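In either case, MAGMADIR and CUDADIR should point at the chosen installations, for example (the paths below are placeholders; on module-based systems 'module show magma' can reveal the actual install prefix):

$> export MAGMADIR=/path/to/magma
$> export CUDADIR=/path/to/cuda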
$> cd gpu
$> make
** To install the molecular version of the code
$> cd src_mpi_gpu_mol
$> make
The -ltirpc option for 'STATIC_LIBS' in 'DEFS' should only be included when the XDR library was built using 'DEFS_Intel_rpc'.
----------------------------------------------------------------------------
* Installing (MAGMA for GPU Only)
----------------------------------------------------------------------------
If users wish to experiment with settings, there is a guide here.
The following environment variables, which can for example be set inside the job script, allow the dimensions of the sector Hamiltonian matrix and the number of sectors to be changed when undertaking benchmarks.
These can be adapted by the user to suit benchmark load requirements, e.g. short vs long runs (a sketch job-script fragment is given after the variable list below).
Each MPI task will pick up a sector calculation, which is then distributed amongst the available threads per node (for CPU and KNL) or offloaded (for GPU). The maximum number of MPI tasks for a region calculation should not exceed the number of sectors specified. There is no limit on the number of threads, though for efficient performance on current hardware it is recommended to use between 16 and 64 threads per MPI task.
The distribution among MPI tasks is simple round-robin.
RMX_NGPU : refers to the number of shared GPUs per node (only for RMX_MAGMA_GPU)
RMX_NSECT_FINE : sets the number of sectors for the Fine region (e.g. 16 for smaller runs, 256 for larger-scale runs). The molecular case is limited to a maximum of 512 sectors for this benchmark.
RMX_NSECT_COARSE : sets the number of sectors for the Coarse region (e.g. 16 for smaller runs, 256 for larger-scale runs). The molecular case is limited to a maximum of 512 sectors for this benchmark.
RMX_NL_FINE : sets the number of basis functions for the Fine region sector calculations (this will determine the size of the sector Hamiltonian matrix).
RMX_NL_COARSE : sets the number of basis functions for the Coarse region sector calculations (this will determine the size of the sector Hamiltonian matrix).
Hint: To aid scaling across nodes, the number of MPI tasks in the job script should ideally be a factor of RMX_NSECT_FINE.
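As an illustration, the settings above might appear in a job script along the following lines. This is a sketch only: the variable values, executable name and launch command are placeholders, and the scripts in the 'example_job_scripts' directory should be used as the starting point for a given platform.

# Illustrative settings for a small CPU run (values are examples only)
export RMX_NSECT_FINE=16        # number of Fine-region sectors
export RMX_NSECT_COARSE=16      # number of Coarse-region sectors
export RMX_NL_FINE=12           # basis functions per Fine-region sector
export RMX_NL_COARSE=6          # basis functions per Coarse-region sector
export OMP_NUM_THREADS=32       # 16-64 threads per MPI task recommended

# 4 MPI tasks is a factor of RMX_NSECT_FINE=16, so the round-robin
# distribution assigns 4 sectors to each task.
mpirun -np 4 ./exdig_mpi_omp_atom > stdout.log   # placeholder executable name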
Timing information is reported on the screen or output log at the end of the run.
** Validation of Results
For the atomic dataset runs, run the atomic problem configuration supplied in the 'example_job_scripts' directory. From the results directory, issue the command:
awk '/Sector 16/ && /eigenvalues/' <stdout.filename>
Mesh 1, Sector 16: first five eigenvalues = -4329.7161 -4170.9100 -415...
Mesh 2, Sector 16: first five eigenvalues = -313.6307 -301.0096 -298.8824 -293.3929 -290.6190
Mesh 2, Sector 16: final five eigenvalues = 290.6190 293.3929 298.8824 301.0102 313.6307
For the molecular dataset runs, run the molecular problem configuration supplied in the 'example_job_scripts' directory. From the results directory, issue the command:
awk '/Sector 64/ && /eigenvalues/' <stdout.filename>
----------------------------------------------------------------------------
In this README we give information relevant for its use in the UEABS.
The PFARM outer-region application code EXDIG is dominated by the assembly of sector Hamiltonian matrices and their subsequent eigensolutions. The code is written in Fortran 2003 (or Fortran 2003-compliant Fortran 95), is parallelised using MPI and OpenMP and is designed to take advantage of highly optimised, numerical library routines. Hybrid MPI / OpenMP parallelisation has also been introduced into the code via shared memory enabled numerical library kernels.
### GPU version
Accelerator-based GPU versions of the code using the MAGMA library for eigensolver calculations.
### Configure, Build and Run Instructions
See PFARM_Build_Run_README.txt