@@ -121,12 +130,12 @@ users wish to experiment with settings there is a guide here.
The following environment variables, which can for example be set inside the job script, allow the dimensions of the sector Hamiltonian matrix
and the number of sectors to be changed easily when undertaking benchmarks.
These can be adapted by the user to suit benchmark load requirements, e.g. short vs. long runs.
Each MPI task will pick up a sector calculation, which will then be distributed amongst the available threads per node (for CPU and KNL) or offloaded (for GPU). The maximum number of MPI tasks for a region calculation should not exceed the number of sectors specified. There is no limit on the number of threads, though for efficient performance on current hardware it is recommended to use between 16 and 64 threads per MPI task.
The distribution among MPI tasks is simple round-robin.
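As an illustration only (the exact indexing used inside PFARM may differ), the short sketch below shows how a round-robin distribution maps sectors to MPI tasks, assuming 0-based task and sector numbering:

    # Illustration only: print the MPI task each sector would be assigned to
    # under a simple round-robin distribution. NTASKS, NSECT and the 0-based
    # numbering are assumptions of this sketch, not values taken from PFARM.
    NTASKS=4
    NSECT=16
    for sector in $(seq 0 $(( NSECT - 1 ))); do
        echo "sector ${sector} -> MPI task $(( sector % NTASKS ))"
    done

With 16 sectors and 4 tasks each task receives 4 sectors, which is why choosing the number of MPI tasks as a factor of RMX_NSECT_FINE (see the hint below) keeps the sector workload evenly balanced.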
RMX_NGPU : refers to the number of shared GPUs per node (only for RMX_MAGMA_GPU)
RMX_NSECT_FINE : sets the number of sectors for the Fine region (e.g. 16 for smaller runs, 256 for larger-scale runs; keep this low if the sector Hamiltonian matrix dimension is large). The molecular case is limited to a maximum of 512 sectors for this benchmark.
RMX_NSECT_COARSE : sets the number of sectors for the Coarse region (e.g. 16 for smaller runs, 256 for larger-scale runs; keep this low if the sector Hamiltonian matrix dimension is large). The molecular case is limited to a maximum of 512 sectors for this benchmark.
RMX_NL_FINE : sets the number of basis functions for the Fine region sector calculations (this will determine the size of the sector Hamiltonian matrix).
RMX_NL_COARSE : sets the number of basis functions for the Coarse region sector calculations (this will determine the size of the sector Hamiltonian matrix).
Hint: To aid scaling across nodes, the number of MPI tasks in the job script should ideally be a factor of RMX_NSECT_FINE.
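As a hedged example of how these settings might be combined, the job-script excerpt below is a sketch only: the Slurm directives, the chosen RMX_* values and the executable name are illustrative assumptions and will differ between systems and benchmark sizes.

    #!/bin/bash
    # Sketch of a job-script excerpt; scheduler directives, values and the
    # executable name are placeholders, not part of the PFARM distribution.
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=4      # 16 MPI tasks in total, a factor of RMX_NSECT_FINE
    #SBATCH --cpus-per-task=16

    # Benchmark-size settings described above (values are illustrative)
    export RMX_NSECT_FINE=16
    export RMX_NSECT_COARSE=16
    export RMX_NL_FINE=12
    export RMX_NL_COARSE=6
    # export RMX_NGPU=4              # GPU (RMX_MAGMA_GPU) builds only

    # Threads per MPI task for the shared-memory-enabled library kernels
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

    srun ./exdig.x                   # placeholder executable name

Larger RMX_NL_* values increase the sector Hamiltonian matrix dimension and therefore the cost of each eigensolution, so increasing them is one way to scale up the per-sector workload.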
...
...
@@ -169,7 +178,7 @@ on the screen or output log at the end of the run.
** Validation of Results
For the atomic dataset runs, run the atomic problem configuration supplied in the 'example_job_scripts' directory. From the results directory, issue the command:
@@ -182,7 +191,7 @@ Mesh 1, Sector 16: first five eigenvalues = -4329.7161 -4170.9100 -415
Mesh 2, Sector 16: first five eigenvalues = -313.6307 -301.0096 -298.8824 -293.3929 -290.6190
Mesh 2, Sector 16: final five eigenvalues = 290.6190 293.3929 298.8824 301.0102 313.6307
For the molecular dataset runs, run the molecular problem configuration supplied in the 'example_job_scripts' directory. From the results directory, issue the command:
@@ -10,7 +10,7 @@ In this README we give information relevant for its use in the UEABS.
The PFARM outer-region application code EXDIG is dominated by the assembly of sector Hamiltonian matrices and their subsequent eigensolutions. The code is written in Fortran 2003 (or Fortran 2003-compliant Fortran 95), is parallelised using MPI and OpenMP, and is designed to take advantage of highly optimised numerical library routines. Hybrid MPI/OpenMP parallelisation has also been introduced into the code via shared-memory-enabled numerical library kernels.
### GPU version
Accelerator-based GPU versions of the code use the MAGMA library for the eigensolver calculations.
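As an illustration only of how a GPU run might be configured, the excerpt below assumes a Slurm system with four GPUs per node; the GPU request syntax and the executable name are placeholders, and only RMX_NGPU comes from the variable list earlier in this README.

    # Sketch of a GPU job-script excerpt for the GPU (RMX_MAGMA_GPU) version.
    # GPU request syntax and the executable name are assumptions for this sketch.
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=4
    #SBATCH --gres=gpu:4

    export RMX_NGPU=4        # number of shared GPUs per node
    srun ./exdig_gpu.x       # placeholder executable name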