Commit 738caf83 authored by Holly Judge

Update README.md for v7.1
## Purpose of Benchmark
CP2K is a freely available quantum chemistry and solid-state physics software
package that can perform atomistic simulations of solid state, liquid,
molecular, periodic, material, crystal, and biological systems.
## Characteristics of Benchmark
CP2K can be used to perform DFT calculations using the QuickStep algorithm. This
applies mixed Gaussian and plane waves approaches (such as GPW and GAPW).
Supported theory levels include DFTB, LDA, GGA, MP2, RPA, semi-empirical methods
(AM1, PM3, PM6, RM1, MNDO, …), and classical force fields (AMBER, CHARMM, …).
CP2K can do simulations of molecular dynamics, metadynamics, Monte Carlo,
Ehrenfest dynamics, vibrational analysis, core level spectroscopy, energy
minimisation, and transition state optimisation using the NEB or dimer method.

CP2K is written in Fortran 2008 and can be run in parallel using a combination
of multi-threading, MPI, and CUDA. All of CP2K is MPI parallelised, with some
additional loops also being OpenMP parallelised. It is therefore most important
to take advantage of MPI parallelisation; however, running one MPI rank per CPU
core often leads to a memory shortage. In that case OpenMP threads can be used
to utilise all CPU cores without incurring an overly large memory footprint.
The optimal ratio between MPI ranks and OpenMP threads depends on the type of
simulation and the system in question. CP2K supports CUDA, allowing it to
offload some linear algebra operations including sparse matrix multiplications
to the GPU through its DBCSR acceleration layer. FFTs can optionally also be
offloaded to the GPU. GPU offloading may yield improved performance depending
on the type of simulation and the system in question.
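As a rough illustration of the rank/thread trade-off, the decomposition can be computed from the core count. The 128-core node and the 8-thread choice below are assumptions for illustration, not CP2K defaults:

```shell
# Sketch: choose an MPI x OpenMP split that fills one node.
# CORES_PER_NODE and OMP_NUM_THREADS are assumed values for illustration.
CORES_PER_NODE=128
export OMP_NUM_THREADS=8
RANKS=$(( CORES_PER_NODE / OMP_NUM_THREADS ))
echo "mpirun -np ${RANKS} ./cp2k.psmp -i input.inp"
```

Halving ``OMP_NUM_THREADS`` doubles the rank count; benchmarking a few such splits is the usual way to find the optimum for a given simulation.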
## Mechanics of Building Benchmark
GNU make and Python 2.x are required for the build process, as are a Fortran
2003 compiler and matching C compiler, e.g. gcc/gfortran (gcc >=4.6 works, later
version is recommended).
CP2K can benefit from a number of external libraries for improved performance.
It is advised to use vendor-optimized versions of these libraries. If these are
not available on your machine, there exist freely available implementations of
these libraries.
Overview of build process:
1. Install Libint.
2. Install Libxc.
3. Install FFTW library (or use MKL's FFTW3 interface).
4. Check if LAPACK, BLAS, SCALAPACK and BLACS are provided and install if not.
5. Install optional libraries - ELPA, libxsmm, libgrid.
6. Build CP2K and link to Libint, Libxc, FFTW, LAPACK, BLAS, SCALAPACK and BLACS, and to relevant CUDA libraries if building for GPU.
### Download the source code
Download a CP2K release from https://sourceforge.net/projects/cp2k/files/ or follow instructions at https://www.cp2k.org/download to check out the relevant branch of the CP2K GitHub repository.
```shell
wget https://github.com/cp2k/cp2k/releases/download/v7.1.0/cp2k-7.1.tar.bz2
bunzip2 cp2k-7.1.tar.bz2
tar xvf cp2k-7.1.tar
cd cp2k-7.1
```
### Install or locate libraries

**LIBINT**
The following commands will uncompress and install the LIBINT library required for the UEABS benchmarks:
```shell
wget https://github.com/cp2k/libint-cp2k/releases/download/v2.6.0/libint-v2.6.0-cp2k-lmax-4.tgz
tar zxvf libint-v2.6.0-cp2k-lmax-4.tgz
cd libint-v2.6.0-cp2k-lmax-4
./configure CC=cc CXX=CC FC=ftn --enable-fortran --prefix=install_path
make
make install
```

Here ``install_path`` must not be the build directory.
Note: The environment variables ``CC``, ``CXX`` and ``FC`` are optional and can be used to specify the C, C++ and Fortran compilers used for the build (the example above is configured to use the compiler wrappers ``cc``, ``CC`` and ``ftn`` found on Cray systems).
**LIBXC**
Libxc v4.0.3 or later : http://www.tddft.org/programs/octopus/wiki/index.php/Libxc
**FFTW**
FFTW3 : http://www.fftw.org or provided as an interface by MKL
**LAPACK & BLAS**
Can be provided from:

* netlib : http://netlib.org/lapack & http://netlib.org/blas
* MKL : part of the Intel MKL installation
* LibSci : installed on Cray platforms
* ATLAS : http://math-atlas.sf.net
* OpenBLAS : http://www.openblas.net
* clBLAS : http://gpuopen.com/compute-product/clblas/
**SCALAPACK and BLACS**
Can be provided from:

* netlib : http://netlib.org/scalapack/
* MKL : part of the Intel MKL installation
* LibSci : installed on Cray platforms
**Optional libraries**
* ELPA : https://elpa.mpcdf.mpg.de/elpa-tar-archive
* libgrid : within the CP2K distribution - cp2k/tools/autotune_grid
* libxsmm : https://github.com/hfp/libxsmm
### Create the arch file
Before compiling, the choice of compilers, the library locations, and the compiler and linker flags need to be specified. This is done in an arch (architecture) file. Example arch files for a number of common architectures can be found inside the ``cp2k/arch`` directory. The names of these files match the pattern architecture.version (e.g., Linux-x86-64-gfortran.sopt). The case ``version=psmp`` corresponds to the hybrid MPI + OpenMP version that you should build to run the UEABS benchmarks. Machine-specific examples can be found in the relevant subdirectory.
CP2K is primarily a Fortran code, so only the Fortran compiler needs to be MPI-enabled.
**Specification of the ``DFLAGS`` variable, which should include:**
```
-D__parallel \
-D__SCALAPACK \
-D__LIBINT \
-D__FFTW3 \
-D__LIBXC \
# Optional DFLAGS for linking performance libraries:
-D__LIBXSMM \
-D__ELPA=201911 \
-D__HAS_LIBGRID \
-D__SIRIUS \
-D__MKL        : if relying on MKL for ScaLAPACK and/or an FFTW interface
```
**Specification of compiler flags ``FCFLAGS`` (for gfortran):**
```
FCFLAGS = $(DFLAGS) -ffree-form -fopenmp                                : Required
FCFLAGS = $(DFLAGS) -ffree-form -fopenmp -O3 -ffast-math -funroll-loops : Recommended
```
If you want to link any libraries containing header files you should pass the
path to the directory containing these to FCFLAGS in the format
``-I/path_to_include_dir``, e.g.:

```
-I$(path_to_libint)/include
```
**Specification of libraries to link to:**
```
LIBS = -L$(path_to_libint)/lib -lint2            : Required for LIBINT
       -L$(path_to_libxc)/lib -lxc90 -lxc03 -lxc : Required for LIBXC
       -lfftw3 -lfftw3_threads -lz -ldl -lstdc++
```
If you use MKL to provide ScaLAPACK and/or an FFTW interface the LIBS variable should be used to pass the relevant flags provided by the MKL Link Line Advisor (https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor), which you should use carefully in order to generate the right options for your system.
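Putting the pieces above together, a minimal sketch of a gfortran-based psmp arch file might look like the following. The ``LIBINT_DIR``/``LIBXC_DIR`` variables and the exact flag selection are placeholders for illustration, not a tested configuration:

```makefile
# Sketch arch file (e.g. arch/Linux-x86-64-gfortran.psmp); paths are placeholders.
CC       = mpicc
FC       = mpifort
LD       = mpifort
AR       = ar -r
DFLAGS   = -D__parallel -D__SCALAPACK -D__LIBINT -D__FFTW3 -D__LIBXC
FCFLAGS  = $(DFLAGS) -ffree-form -fopenmp -O3 -I$(LIBINT_DIR)/include
LDFLAGS  = $(FCFLAGS)
LIBS     = -L$(LIBINT_DIR)/lib -lint2 \
           -L$(LIBXC_DIR)/lib -lxc90 -lxc03 -lxc \
           -lscalapack -llapack -lblas \
           -lfftw3 -lfftw3_threads -lz -ldl -lstdc++
```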
### Build the executable
To build the hybrid MPI+OpenMP executable ``cp2k.psmp`` using ``your_arch_file.psmp``, run make in the top-level cp2k directory:
```
make -j N ARCH=your_arch_file VERSION=psmp   : on N threads
make ARCH=your_arch_file VERSION=psmp        : serially
```
The executable ``cp2k.psmp`` will then be located in:
### Compiling CP2K for CUDA enabled GPUs
The arch files for compiling CP2K for CUDA-enabled GPUs can be found under the relevant machine example.
The additional steps for CUDA compilation are:
1. Load the CUDA module.
2. Ensure that the ``CUDA_PATH`` variable is set.
3. Add CUDA-specific options to the arch file.
**Additional required compiler and linker commands:**
```
NVCC = nvcc
```
**Additional ``DFLAGS``:**
```
-D__ACC -D__DBCSR_ACC -D__PW_CUDA
```
**Set ``NVFLAGS``:**
```
NVFLAGS = $(DFLAGS) -O3 -arch sm_60
```
**Additional required libraries to link to:**
```
-lcudart -lcublas -lcufft -lrt
```
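Taken together, the CUDA additions to an existing psmp arch file might look like the sketch below. The ``sm_60`` target assumes a Pascal-generation GPU; adjust ``-arch`` to your hardware:

```makefile
# Sketch: CUDA additions appended to an existing psmp arch file.
NVCC     = nvcc
DFLAGS  += -D__ACC -D__DBCSR_ACC -D__PW_CUDA
NVFLAGS  = $(DFLAGS) -O3 -arch sm_60
LIBS    += -lcudart -lcublas -lcufft -lrt
```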
You can try any combination of tasks per node and OpenMP threads per task to investigate absolute performance and scaling on the machine of interest. For tier-1 systems the best performance is usually obtained with pure MPI, while for tier-0 systems the best performance is typically obtained using 1 MPI task per node with the number of threads being equal to the number of cores per node.
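As an illustration, a SLURM batch script for one such combination might look like the sketch below; the node count, ranks per node, thread count, and time limit are placeholders to adapt to your system and scheduler:

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16   # MPI ranks per node (placeholder)
#SBATCH --cpus-per-task=8      # OpenMP threads per rank (placeholder)
#SBATCH --time=01:00:00

# Give each MPI rank the number of threads reserved for it by the scheduler.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun ./cp2k.psmp -i H2O-512.inp -o H2O-512.out
```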
### UEABS benchmarks
Test Case | System     | Number of Atoms | Run type      | Description                                          | Location                        |
----------|------------|-----------------|---------------|------------------------------------------------------|---------------------------------|
a         | H2O-512    | 1236            | MD            | Uses the Born-Oppenheimer approach via Quickstep DFT | ``/tests/QS/benchmark/``        |
b         | LiH-HFX    | 216             | Single-energy | GAPW with hybrid Hartree-Fock exchange               | ``/tests/QS/benchmark_HFX/LiH`` |
c         | H2O-DFT-LS | 6144            | Single-energy | Uses linear scaling DFT                              | ``/tests/QS/benchmark_DM_LS``   |
More information in the form of a README and an example job script is included in each benchmark tar file.