# CP2K
## Summary Version
1.0
## Purpose of Benchmark
CP2K is a freely available quantum chemistry and solid-state physics software package that can perform atomistic simulations of solid-state, liquid, molecular, periodic, material, crystal, and biological systems.
## Characteristics of Benchmark
CP2K can perform DFT calculations using the Quickstep algorithm, which applies mixed Gaussian and plane-wave approaches (such as GPW and GAPW). Supported levels of theory include DFTB, LDA, GGA, MP2, RPA, semi-empirical methods (AM1, PM3, PM6, RM1, MNDO, …), and classical force fields (AMBER, CHARMM, …). CP2K can run molecular dynamics, metadynamics, Monte Carlo, and Ehrenfest dynamics simulations, as well as vibrational analysis, core-level spectroscopy, energy minimisation, and transition-state optimisation using the NEB or dimer method.
CP2K is written in Fortran 2008 and can be run in parallel using a combination of multi-threading, MPI, and CUDA. All of CP2K is MPI-parallelised, and some additional loops are also OpenMP-parallelised. It is therefore most important to take advantage of MPI parallelisation; however, running one MPI rank per CPU core often leads to memory shortage, at which point OpenMP threads can be used to utilise all CPU cores without incurring an overly large memory footprint. The optimal ratio between MPI ranks and OpenMP threads depends on the type of simulation and the system in question. CP2K supports CUDA, allowing it to offload some linear algebra operations, including sparse matrix multiplications, to the GPU through its DBCSR acceleration layer. FFTs can optionally also be offloaded to the GPU. Depending on the type of simulation and the system in question, GPU offloading may improve performance.
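
Since the optimal rank/thread ratio must be found by experiment, it can help to enumerate the possible decompositions of a node first. A minimal illustrative sketch (not part of CP2K; ``hybrid_layouts`` is a hypothetical helper):

```python
# Enumerate hybrid MPI x OpenMP layouts that use every core of a node.
def hybrid_layouts(cores_per_node: int):
    """Return (mpi_ranks_per_node, omp_threads_per_rank) pairs filling the node."""
    return [(ranks, cores_per_node // ranks)
            for ranks in range(1, cores_per_node + 1)
            if cores_per_node % ranks == 0]

# A 16-core node offers 16x1 (pure MPI), 8x2, 4x4, 2x8 and 1x16.
for ranks, threads in hybrid_layouts(16):
    print(f"{ranks} MPI ranks x {threads} OpenMP threads per rank")
```

Layouts with fewer ranks and more threads trade MPI memory overhead for OpenMP efficiency; benchmark each candidate on your system.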
## Mechanics of Building Benchmark
GNU make and Python 2.x are required for the build process, as are a Fortran 2008 compiler and a matching C compiler, e.g. gfortran/gcc (gcc >= 4.6 works; a later version is recommended).
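
As a quick sanity check before building, you can confirm the prerequisites are on your ``PATH`` (a hedged sketch; the tool list simply matches the requirements above):

```shell
# Report whether each CP2K build prerequisite is visible on PATH.
check() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

for tool in make gcc gfortran python; do
    check "$tool"
done
```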
CP2K can benefit from a number of external libraries for improved performance. It is advised to use vendor-optimised versions of these libraries. If these are not available on your machine, freely available implementations exist, including but not limited to those listed below.
### Download the source code
Download a CP2K release from https://sourceforge.net/projects/cp2k/files/ or follow instructions at https://www.cp2k.org/download to check out the relevant branch of the CP2K GitHub repository.
### Install or locate required libraries

**LAPACK & BLAS**

These can be provided by:

- netlib: http://netlib.org/lapack and http://netlib.org/blas
- MKL: part of the Intel MKL installation
- LibSci: installed on Cray platforms
- ATLAS: http://math-atlas.sf.net
- OpenBLAS: http://www.openblas.net
- clBLAS: http://gpuopen.com/compute-product/clblas/

**SCALAPACK and BLACS**

These can be provided by:

- netlib: http://netlib.org/scalapack/
- MKL: part of the Intel MKL installation
- LibSci: installed on Cray platforms

**LIBINT**
Available from https://www.cp2k.org/static/downloads/libint-1.1.4.tar.gz
The following commands uncompress and install the LIBINT library required for the UEABS benchmarks:

    tar xzf libint-1.1.4.tar.gz
    cd libint-1.1.4
    ./configure CC=cc CXX=CC --prefix=install_path    # install_path must not be the current directory
    make
    make install
Note: The environment variables ``CC`` and ``CXX`` are optional and can be used to specify the C and C++ compilers to use for the build (the example above is configured to use the compiler wrappers ``cc`` and ``CC`` used on Cray systems).
### Install optional libraries

- FFTW3: http://www.fftw.org, or provided as an interface by MKL
- Libxc: http://www.tddft.org/programs/octopus/wiki/index.php/Libxc
- ELPA: https://www.cp2k.org/static/downloads/elpa-2016.05.003.tar.gz
- libgrid: included in the CP2K distribution (cp2k/tools/autotune_grid)
- libxsmm: https://www.cp2k.org/static/downloads/libxsmm-1.4.4.tar.gz

### Compile CP2K
Before compiling, the choice of compilers, the library locations, and the compilation and linker flags need to be specified in an arch (architecture) file. Example arch files for a number of common architectures can be found in the ``cp2k/arch`` directory. The names of these files follow the pattern architecture.version (e.g. Linux-x86-64-gfortran.sopt). The version ``psmp`` corresponds to the hybrid MPI + OpenMP build that you should use to run the UEABS benchmarks. Machine-specific examples can be found in the relevant subdirectory.
In most cases you need to create a custom arch file, either from scratch or by modifying an existing one that roughly fits the CPU type, compiler, and library installation paths on your system. You can also consult https://dashboard.cp2k.org, which provides sample arch files as part of the testing reports for some platforms (click on the status field for a platform, and search for 'ARCH-file' in the resulting output).
As a guide for GNU compilers the following should be included in the ``arch`` file:
**Specification of which compiler and linker commands to use:**
    CC = gcc
    FC = mpif90
    LD = mpif90
    AR = ar -r
CP2K is primarily a Fortran code, so only the Fortran compiler needs to be MPI-enabled.
**Specification of the ``DFLAGS`` variable, which should include:**
    -D__parallel                : builds the parallel CP2K executable
    -D__SCALAPACK               : links ScaLAPACK
    -D__LIBINT                  : links LIBINT
    -D__MKL                     : if relying on MKL for ScaLAPACK and/or an FFTW interface
    -D__HAS_NO_SHARED_GLIBC     : for convenience on HPC systems, see the INSTALL.md file
Additional DFLAGS needed to link to performance libraries, such as ``-D__FFTW3`` to link to FFTW3, are listed in the INSTALL file.
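
Putting these together, the ``DFLAGS`` line of a GNU-based arch file linking ScaLAPACK, LIBINT, and FFTW3 might read (a sketch; keep only the flags for libraries you actually link):

```
DFLAGS = -D__parallel -D__SCALAPACK -D__LIBINT -D__FFTW3
```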
**Specification of compiler flags ``FCFLAGS`` (for gfortran):**
    FCFLAGS = $(DFLAGS) -ffree-form -fopenmp                                    : Required
    FCFLAGS = $(DFLAGS) -ffree-form -fopenmp -O3 -ffast-math -funroll-loops     : Recommended

If you want to link any libraries containing header files, pass the path to the directory containing them to ``FCFLAGS`` in the format ``-I/path_to_include_dir``.


**Specification of libraries to link to:**

    -L{path_to_libint}/lib -lderiv -lint                                             : Required for LIBINT

If you use MKL to provide ScaLAPACK and/or an FFTW interface, the ``LIBS`` variable should be used to pass the relevant flags provided by the MKL Link Line Advisor (https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor), which you should use carefully in order to generate the right options for your system.
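
As an illustration only (one possible gfortran + Intel MPI combination; your system will likely differ, so regenerate the line with the advisor), the resulting ``LIBS`` entry might look like:

```
LIBS = -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_gf_lp64 \
       -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl
```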
#### Building the executable
To build the hybrid MPI+OpenMP executable ``cp2k.psmp`` using *your_arch_file.psmp*, run make in the ``cp2k/makefiles`` directory for v4-6 (or in the top-level cp2k directory for v7+):

    make -j N ARCH=your_arch_file VERSION=psmp          : using N parallel make jobs
    make ARCH=your_arch_file VERSION=psmp               : serially
The executable ``cp2k.psmp`` will then be located in:
    cp2k/exe/your_arch_file
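
A quick check that the build produced the expected binary (a sketch; ``your_arch_file`` is a placeholder for your actual arch file name):

```shell
# Verify the CP2K binary exists and is executable at the expected location.
ARCH=your_arch_file
BIN="cp2k/exe/$ARCH/cp2k.psmp"
if [ -x "$BIN" ]; then
    echo "build ok: $BIN"
else
    echo "binary not found: $BIN"
fi
```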
### Compiling CP2K for CUDA-enabled GPUs
Arch files for compiling CP2K for CUDA enabled GPUs can be found here:
In general the main steps are:
1. Load the CUDA module.
2. Ensure that the ``CUDA_PATH`` variable is set.
3. Add the following to the arch file:
**Additional required compiler and linker commands**
    NVCC = nvcc
**Additional ``DFLAGS``**
    -D__ACC -D__DBCSR_ACC -D__PW_CUDA
 
**Set ``NVFLAGS``**
    NVFLAGS = $(DFLAGS) -O3 -arch sm_60
 
**Additional required libraries**
 
    -lcudart -lcublas -lcufft -lrt 
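
Collected in one place, the CUDA-related additions to the arch file from the steps above might read (a sketch that simply combines the pieces listed; adjust ``sm_60`` to your GPU architecture):

```
NVCC     = nvcc
DFLAGS  += -D__ACC -D__DBCSR_ACC -D__PW_CUDA
NVFLAGS  = $(DFLAGS) -O3 -arch sm_60
LIBS    += -lcudart -lcublas -lcufft -lrt
```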
## Mechanics of Running Benchmark

The general way to run the benchmarks with the hybrid parallel executable is:

    export OMP_NUM_THREADS=X   
    parallel_launcher launcher_options path_to_/cp2k.psmp -i inputfile.inp -o logfile  

Where:

* The environment variable for the number of threads must be set before calling the executable.
* The parallel_launcher is mpirun, mpiexec, or some variant such as aprun on Cray systems or srun when using Slurm.
* launcher_options specifies parallel placement in terms of total numbers of nodes, MPI ranks/tasks, tasks per node, and OpenMP threads per task (which should be equal to the value given to OMP_NUM_THREADS). This is not necessary if parallel runtime options are picked up by the launcher from the job environment.
* You can try any combination of tasks per node and OpenMP threads per task to investigate absolute performance and scaling on the machine of interest.
* The input file usually has the extension .inp, and may specify within it further required files (such as basis sets, potentials, etc.)
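
The pieces above can be assembled mechanically; a small illustrative sketch (the ``srun`` launcher, node geometry, and file names here are assumptions for the example, not recommendations):

```python
# Compose a hybrid MPI+OpenMP launch line for cp2k.psmp (Slurm's srun assumed).
def launch_command(nodes, tasks_per_node, threads_per_task,
                   inp="benchmark.inp", log="logfile"):
    total_tasks = nodes * tasks_per_node
    env = f"OMP_NUM_THREADS={threads_per_task}"
    return (f"{env} srun -N {nodes} -n {total_tasks} "
            f"--cpus-per-task={threads_per_task} "
            f"./cp2k.psmp -i {inp} -o {log}")

print(launch_command(4, 32, 4))
```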
For tier-1 systems the best performance is usually obtained with pure MPI, while for tier-0 systems it is typically obtained using 1 MPI task per node with the number of threads equal to the number of cores per node.
**UEABS benchmarks:**
Test Case | System     | Number of Atoms | Run type      | Description                                           | Location                        |
----------|------------|-----------------|---------------|-------------------------------------------------------|---------------------------------|
a         | H2O-512    | 1536            | MD            | Uses the Born-Oppenheimer approach via Quickstep DFT  | ``/tests/QS/benchmark/``        |
b         | LiHFX      | 216             | Single-energy | Must create wavefunction first - see benchmark README | ``/tests/QS/benchmark_HFX/LiH`` |
c         | H2O-DFT-LS | 6144            | Single-energy | Uses linear-scaling DFT                               | ``/tests/QS/benchmark_DM_LS``   |
 
More information in the form of a README and an example job script is included in each benchmark tar file.
## Verification of Results
The run walltime is reported near the end of the logfile and can be extracted with:
    grep "CP2K    " logfile | awk -F ' ' '{print $7}'
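
The same extraction can be scripted; a hedged sketch assuming the standard CP2K timing-report layout, in which the seventh whitespace-separated field of the final ``CP2K`` line is the maximum total walltime in seconds (the sample line below is illustrative, not real output):

```python
# Mirror the grep/awk one-liner: find the "CP2K    " timing line and
# return its seventh field (maximum total walltime, in seconds).
def walltime(logfile_text):
    for line in logfile_text.splitlines():
        if "CP2K    " in line:
            return float(line.split()[6])
    return None

sample = " CP2K                     1  1.0    0.022    0.022  245.460  245.460"
print(walltime(sample))
```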