diff --git a/pfarm/PFARM_Build_Run_README.txt b/pfarm/PFARM_Build_Run_README.txt
index 8b2863f3f9a933acb874ffbfa1fff98d788117b0..1867a58f299c79ae72ab58db02c214bc993f822f 100644
--- a/pfarm/PFARM_Build_Run_README.txt
+++ b/pfarm/PFARM_Build_Run_README.txt
@@ -43,6 +43,12 @@ $> make
 (ignore warnings related to float/double type mismatches in xdr_rmat64.c - this is not relevant for this benchmark)
 The validity of the XDR library can be tested by running test_xdr
 $> ./test_xdr
+RPC headers may not be available for XDR on the target platform, leading to compilation errors of the type:
+cannot open source file "rpc/rpc.h"
+  #include <rpc/rpc.h>
+
+For this case, use the make include file DEFS_Intel_rpc
+
 * Install CPU version (MPI and OpenMP)
 $> cd cpu
@@ -56,11 +62,13 @@ $> make
 ** To install the molecular version of the code
 $> cd src_mpi_omp_mol
 $> make
-
+The -ltirpc option for 'STATIC_LIBS' in 'DEFS' should only be included when the XDR library has been built using 'DEFS_Intel_rpc'.
 * Install GPU version (MPI / OpenMP / MAGMA / CUDA )
-Set MAGMADIR, CUDADIR environment variables to point to MAGMA and CUDA installations.
+Set the MAGMADIR, CUDADIR environment variables to point to the MAGMA and CUDA installations.
 The numerical library MAGMA may be provided through the modules system of the platform.
+Please check the target platform user guides for linking instructions.
+$> module load magma
 If unavailable via a module, then MAGMA may need to be installed (see below)
 $> cd gpu
@@ -74,6 +82,7 @@ $> make
 ** To install the molecular version of the code
 $> cd src_mpi_gpu_mol
 $> make
+The -ltirpc option for 'STATIC_LIBS' in 'DEFS' should only be included when the XDR library has been built using 'DEFS_Intel_rpc'.
 ----------------------------------------------------------------------------
 * Installing (MAGMA for GPU Only)
@@ -121,12 +130,12 @@ users wish to experiment with settings there is a guide here.
 The following environmental variables that e.g. can be set inside the script allow the H sector matrix to easily change dimensions and also allows the number of sectors to change when undertaking benchmarks.
 These can be adapted by the user to suit benchmark load requirements e.g. short vs long runs.
-Each MPI Task will pickup a sector calculation which will then be distributed amongst available threads per node (for CPU and KNL) or offloaded (for GPU).
+Each MPI task will pick up a sector calculation, which will then be distributed amongst the available threads per node (for CPU and KNL) or offloaded (for GPU). The maximum number of MPI tasks for a region calculation should not exceed the number of sectors specified. There is no limit on the number of threads, though for efficient performance on current hardware it is recommended to set between 16 and 64 threads per MPI task. The distribution among MPI tasks is simple round-robin.
 RMX_NGPU : refers to the number of shared GPUs per node (only for RMX_MAGMA_GPU)
-RMX_NSECT_FINE : sets the number of sectors for the Fine region (it is recommended to set this to a low number if the sector Hamiltonian matrix dimension is large).
-RMX_NSECT_COARSE : sets the number of sectors for the Coarse region (it is recommended to set this to a low number if the sector Hamiltonian matrix dimension is large).
+RMX_NSECT_FINE : sets the number of sectors for the Fine region (e.g. 16 for smaller runs, 256 for larger-scale runs). The molecular case is limited to a maximum of 512 sectors for this benchmark.
+RMX_NSECT_COARSE : sets the number of sectors for the Coarse region (e.g. 16 for smaller runs, 256 for larger-scale runs). The molecular case is limited to a maximum of 512 sectors for this benchmark.
 RMX_NL_FINE : sets the number of basis functions for the Fine region sector calculations (this will determine the size of the sector Hamiltonian matrix).
 RMX_NL_COARSE : sets the number of basis functions for the Coarse region sector calculations (this will determine the size of the sector Hamiltonian matrix).
 Hint: To aid scaling across nodes, the number of MPI tasks in the job script should ideally be a factor of RMX_NSECT_FINE.
@@ -169,7 +178,7 @@ on the screen or output log at the end of the run.
 ** Validation of Results
-For the atomic dataset runs, from the results directory issue the command
+For the atomic dataset runs, run the atomic problem configuration supplied in the 'example_job_scripts' directory. From the results directory, issue the command:
 awk '/Sector 16/ && /eigenvalues/' <stdout.filename>
@@ -182,7 +191,7 @@ Mesh 1, Sector 16: first five eigenvalues = -4329.7161 -4170.9100 -415
 Mesh 2, Sector 16: first five eigenvalues = -313.6307 -301.0096 -298.8824 -293.3929 -290.6190
 Mesh 2, Sector 16: final five eigenvalues = 290.6190 293.3929 298.8824 301.0102 313.6307
-For the molecular dataset runs, from the results directory issue the command
+For the molecular dataset runs, run the molecular problem configuration supplied in the 'example_job_scripts' directory. From the results directory, issue the command:
 awk '/Sector 64/ && /eigenvalues/' <stdout.filename>
diff --git a/pfarm/README.md b/pfarm/README.md
index 94c3009b27ee6fcaebeb39e64b218d41997ad8cb..5ed276417dc7522dc5a5a4230cbad8f65a2eb3ea 100644
--- a/pfarm/README.md
+++ b/pfarm/README.md
@@ -10,7 +10,7 @@ In this README we give information relevant for its use in the UEABS.
 The PFARM outer-region application code EXDIG is dominated by the assembly of sector Hamiltonian matrices and their subsequent eigensolutions. The code is written in Fortran 2003 (or Fortran 2003-compliant Fortran 95), is parallelised using MPI and OpenMP and is designed to take advantage of highly optimised, numerical library routines. Hybrid MPI / OpenMP parallelisation has also been introduced into the code via shared memory enabled numerical library kernels.
 ### GPU version
-Accelerator-based Nvidia GPU versions of the code using the MAGMA library for eigensolver calculations.
+Accelerator-based GPU versions of the code using the MAGMA library for eigensolver calculations.
 ### Configure, Build and Run Instructions
 See PFARM_Build_Run_README.txt
diff --git a/pfarm/README_ACC.md b/pfarm/README_ACC.md
index 8aecf506fee351a5756f2885cbf0daef7df47feb..4cb7b4f589e7b5a4dbb6222d9083a17d9a72809c 100644
--- a/pfarm/README_ACC.md
+++ b/pfarm/README_ACC.md
@@ -14,7 +14,7 @@ In this README we give information relevant for its use in the UEABS.
 The PFARM outer-region application code EXDIG is dominated by the assembly of sector Hamiltonian matrices and their subsequent eigensolutions. The code is written in Fortran 2003 (or Fortran 2003-compliant Fortran 95), is parallelised using MPI and OpenMP and is designed to take advantage of highly optimised, numerical library routines. Hybrid MPI / OpenMP parallelisation has also been introduced into the code via shared memory enabled numerical library kernels.
 ### GPU version
-Accelerator-based Nvidia GPU versions of the code using the MAGMA library for eigensolver calculations.
+Accelerator-based GPU versions of the code using the MAGMA library for eigensolver calculations.
 ### Configure, Build and Run Instructions
 See PFARM_Build_Run_README.txt
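
Note: the job-script fragment below is a minimal sketch of how the RMX_* environment variables described in the PFARM_Build_Run_README.txt changes above might be set for a small run. The executable name, the mpirun launch line and all numerical values are illustrative placeholders only; the scripts supplied in the 'example_job_scripts' directory and the target platform documentation give the actual settings.

#!/bin/bash
# Sketch of a batch job-script fragment (all values are illustrative only)
export OMP_NUM_THREADS=32      # threads per MPI task (16-64 suggested in the README)
export RMX_NSECT_FINE=16       # sectors in the Fine region (small run)
export RMX_NSECT_COARSE=16     # sectors in the Coarse region (small run)
export RMX_NL_FINE=12          # basis functions per Fine-region sector (placeholder value)
export RMX_NL_COARSE=6         # basis functions per Coarse-region sector (placeholder value)
# export RMX_NGPU=4            # shared GPUs per node (GPU version only)
# The MPI task count (here 16) should not exceed, and ideally divides, RMX_NSECT_FINE
mpirun -n 16 ./<exdig_executable> > stdout.log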