# CP2K


## Summary Version

1.0

## Purpose of Benchmark

CP2K is a freely available quantum chemistry and solid-state physics software
package that can perform atomistic simulations of solid-state, liquid,
molecular, periodic, material, crystal, and biological systems.

## Characteristics of Benchmark

CP2K can be used to perform DFT calculations using the QuickStep algorithm,
which applies mixed Gaussian and plane-wave approaches (GPW and GAPW).
Supported theory levels include DFTB, LDA, GGA, MP2, RPA, semi-empirical methods
(AM1, PM3, PM6, RM1, MNDO, …), and classical force fields (AMBER, CHARMM, …).
CP2K can run simulations of molecular dynamics, metadynamics, Monte Carlo,
Ehrenfest dynamics, vibrational analysis, core-level spectroscopy, energy
minimisation, and transition state optimisation using the NEB or dimer method.

CP2K is written in Fortran 2008 and can be run in parallel using a combination
of multi-threading, MPI, and CUDA. All of CP2K is MPI-parallelised, with some
additional loops also OpenMP-parallelised. It is therefore most important to
take advantage of MPI parallelisation; however, running one MPI rank per CPU
core often leads to memory shortage, in which case OpenMP threads can be used
to utilise all CPU cores without incurring an overly large memory footprint.
The optimal ratio between MPI ranks and OpenMP threads depends on the type of
simulation and the system in question. CP2K supports CUDA, allowing it to
offload some linear algebra operations, including sparse matrix
multiplications, to the GPU through its DBCSR acceleration layer. FFTs can
optionally also be offloaded to the GPU. GPU offloading may improve performance
depending on the type of simulation and the system in question.
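
As an illustration of the rank/thread trade-off, the full-node decompositions for a hypothetical 16-core node can be enumerated as follows (a sketch; the core count is an assumption, substitute your own):

```shell
# Enumerate MPI-rank/OpenMP-thread decompositions that use all cores of a
# hypothetical 16-core node (ranks * threads = cores in every case).
cores=16
for ranks in 1 2 4 8 16; do
    threads=$((cores / ranks))
    echo "${ranks} MPI rank(s) x ${threads} OpenMP thread(s)"
done
```

Which of these decompositions performs best has to be measured for each simulation type and system.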

## Mechanics of Building Benchmark

GNU make and Python 2.x are required for the build process, as are a Fortran
2003 compiler and a matching C compiler, e.g. gcc/gfortran (gcc >= 4.6 works;
a later version is recommended).

CP2K can benefit from a number of external libraries for improved performance. 
It is advised to use vendor-optimized versions of these libraries. If these are 
not available on your machine, there exist freely available implementations of
these libraries.

Overview of build process:

1. Install Libint.
2. Install Libxc.
3. Install FFTW library (or use MKL's FFTW3 interface).
4. Check if LAPACK, BLAS, SCALAPACK and BLACS are provided and install if not.
5. Install optional libraries - ELPA, libxsmm, libgrid.
6. Build CP2K and link to Libint, Libxc, FFTW, LAPACK, BLAS, SCALAPACK and BLACS, and to relevant CUDA libraries if building for GPU.


### Download the source code

     wget https://github.com/cp2k/cp2k/releases/download/v7.1.0/cp2k-7.1.tar.bz2
     bunzip2 cp2k-7.1.tar.bz2
     tar xvf cp2k-7.1.tar
     cd cp2k-7.1

### Install or locate libraries 

**LIBINT**

The following commands will uncompress and install the LIBINT library required for the UEABS benchmarks:

    wget https://github.com/cp2k/libint-cp2k/releases/download/v2.6.0/libint-v2.6.0-cp2k-lmax-4.tgz
    tar zxvf libint-v2.6.0-cp2k-lmax-4.tgz
    cd libint-v2.6.0-cp2k-lmax-4
    ./configure CC=cc CXX=CC FC=ftn --enable-fortran --prefix=install_path    # install_path must not be this directory
    make
    make install

Note: The environment variables ``CC`` and ``CXX`` are optional and can be used to specify the C and C++ compilers to use for the build (the example above is configured to use the compiler wrappers ``cc`` and ``CC`` used on Cray systems).
80

**LIBXC**
82

    Libxc - v4.0.3 or later                                     : http://www.tddft.org/programs/octopus/wiki/index.php/Libxc

**FFTW**

    FFTW3                                                       : http://www.fftw.org or provided as an interface by MKL

**LAPACK & BLAS**

    Can be provided from:

    netlib                                                      : http://netlib.org/lapack & http://netlib.org/blas
    MKL                                                         : part of the Intel MKL installation
    LibSci                                                      : installed on Cray platforms
    ATLAS                                                       : http://math-atlas.sf.net
    OpenBLAS                                                    : http://www.openblas.net
    clBLAS                                                      : http://gpuopen.com/compute-product/clblas/

**SCALAPACK and BLACS**

    Can be provided from:

    netlib                                                      : http://netlib.org/scalapack/
    MKL                                                         : part of the Intel MKL installation
    LibSci                                                      : installed on Cray platforms


**Optional libraries**

    ELPA                                                        : https://elpa.mpcdf.mpg.de/elpa-tar-archive
    libgrid                                                     : within CP2K distribution - cp2k/tools/autotune_grid
    libxsmm                                                     : https://github.com/hfp/libxsmm

### Create the arch file

Before compiling, the choice of compilers, the library locations, and the compilation and linker flags need to be specified. This is done in an arch (architecture) file. Example arch files for a number of common architectures can be found inside the ``cp2k/arch`` directory. The names of these files match the pattern architecture.version (e.g., Linux-x86-64-gfortran.sopt). The version ``psmp`` corresponds to the hybrid MPI + OpenMP build that you should use to run the UEABS benchmarks. Machine-specific examples can be found in the relevant subdirectory.

In most cases you need to create a custom arch file, either from scratch or by modifying an existing one that roughly fits the CPU type, compiler, and installation paths of libraries on your system. You can also consult https://dashboard.cp2k.org, which provides sample arch files as part of the testing reports for some platforms (click on the status field for a platform, and search for 'ARCH-file' in the resulting output).

As a guide for GNU compilers the following should be included in the ``arch`` file:

**Specification of which compiler and linker commands to use:**

    CC = gcc
    FC = mpif90
    LD = mpif90
    AR = ar -r

CP2K is primarily a Fortran code, so only the Fortran compiler needs to be MPI-enabled.

**Specification of the ``DFLAGS`` variable, which should include:**

    -D__parallel \
    -D__SCALAPACK \
    -D__LIBINT \
    -D__FFTW3 \
    -D__LIBXC

Optional ``DFLAGS`` for linking performance libraries:

    -D__LIBXSMM \
    -D__ELPA=201911 \
    -D__HAS_LIBGRID \
    -D__SIRIUS \
    -D__MKL                                                     : if relying on MKL for ScaLAPACK and/or an FFTW interface

**Specification of compiler flags ``FCFLAGS`` (for gfortran):**

    FCFLAGS = $(DFLAGS) -ffree-form -fopenmp                                        : Required
    FCFLAGS = $(DFLAGS) -ffree-form -fopenmp -O3 -ffast-math -funroll-loops         : Recommended

If you link against libraries that provide header or Fortran module files, pass
the path to the directory containing them to ``FCFLAGS`` in the form
``-I/path_to_include_dir``, for example:

    -I$(path_to_libint)/include 


**Specification of libraries to link to:**

    LIBS = -L$(path_to_libint)/lib -lint2                       : Required for LIBINT
           -L$(path_to_libxc)/lib -lxcf90 -lxcf03 -lxc          : Required for LIBXC
           -lfftw3 -lfftw3_threads -lz -ldl -lstdc++

If you use MKL to provide ScaLAPACK and/or an FFTW interface the LIBS variable should be used to pass the relevant flags provided by the MKL Link Line Advisor (https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor), which you should use carefully in order to generate the right options for your system.
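
Putting the fragments above together, a minimal complete ``psmp`` arch file for a GNU toolchain might look as follows. This is a sketch: the install paths are placeholders, the ``LDFLAGS`` line and the ScaLAPACK/LAPACK/BLAS link names are assumptions that depend on how those libraries were installed on your system, and the optional performance libraries are omitted.

```make
# your_arch_file.psmp -- minimal hypothetical GNU arch file
CC      = gcc
FC      = mpif90
LD      = mpif90
AR      = ar -r

LIBINT_DIR = /path/to/libint      # placeholder
LIBXC_DIR  = /path/to/libxc       # placeholder

DFLAGS  = -D__parallel -D__SCALAPACK -D__LIBINT -D__FFTW3 -D__LIBXC
FCFLAGS = $(DFLAGS) -ffree-form -fopenmp -O3 -ffast-math -funroll-loops \
          -I$(LIBINT_DIR)/include -I$(LIBXC_DIR)/include
LDFLAGS = $(FCFLAGS)
LIBS    = -L$(LIBINT_DIR)/lib -lint2 \
          -L$(LIBXC_DIR)/lib -lxcf90 -lxcf03 -lxc \
          -lscalapack -llapack -lblas \
          -lfftw3 -lfftw3_threads -lz -ldl -lstdc++
```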

### Build the executable

To build the hybrid MPI + OpenMP executable ``cp2k.psmp`` using ``your_arch_file.psmp``, run make in the cp2k directory:
 
    make -j N ARCH=your_arch_file VERSION=psmp                  : on N threads
    make ARCH=your_arch_file VERSION=psmp                       : serially

The executable ``cp2k.psmp`` will then be located in:

    cp2k/exe/your_arch_file

### Compiling CP2K for CUDA enabled GPUs

The arch files for compiling CP2K for CUDA-enabled GPUs can be found under the relevant machine example.

The additional steps for CUDA compilation are:

1. Load the cuda module.
2. Ensure that the ``CUDA_PATH`` environment variable is set.
3. Add CUDA specific options to the arch file.


**Additional required compiler and linker commands:**

    NVCC = nvcc

**Additional ``DFLAGS``:**

    -D__ACC -D__DBCSR_ACC -D__PW_CUDA
 
**Set ``NVFLAGS``:**

    NVFLAGS = $(DFLAGS) -O3 -arch sm_60
 
**Additional required libraries to link to:**
 
    -lcudart -lcublas -lcufft -lrt 



## Mechanics of Running Benchmark

The general way to run the benchmarks with the hybrid parallel executable is:

    export OMP_NUM_THREADS=X   
    parallel_launcher launcher_options path_to_/cp2k.psmp -i inputfile.inp -o logfile  

Where:

* The environment variable for the number of threads must be set before calling the executable.
* The ``parallel_launcher`` is ``mpirun``, ``mpiexec``, or some variant such as ``aprun`` on Cray systems or ``srun`` when using Slurm.
* The ``launcher_options`` specify parallel placement in terms of total numbers of nodes, MPI ranks/tasks, tasks per node, and OpenMP threads per task (which should be equal to the value given to ``OMP_NUM_THREADS``). This is not necessary if parallel runtime options are picked up by the launcher from the job environment.
* The input file usually has the extension ``.inp``, and may specify within it further required files (such as basis sets and potentials).

You can try any combination of tasks per node and OpenMP threads per task to investigate absolute performance and scaling on the machine of interest. For tier-1 systems the best performance is usually obtained with pure MPI, while for tier-0 systems the best performance is typically obtained using 1 MPI task per node with the number of threads being equal to the number of cores per node.
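
For example, placement for a hybrid run can be derived from the node geometry like this (a sketch assuming Slurm and a hypothetical 128-core node; the input file name is a placeholder, and the launch line is only echoed):

```shell
# Derive OMP_NUM_THREADS from the chosen number of ranks per node on a
# hypothetical 128-core node, then show the matching Slurm launch line.
cores_per_node=128
ranks_per_node=16
export OMP_NUM_THREADS=$((cores_per_node / ranks_per_node))

# Dry run: echo the launch line instead of executing it.
echo "srun --ntasks-per-node=${ranks_per_node} --cpus-per-task=${OMP_NUM_THREADS} cp2k.psmp -i H2O-512.inp -o H2O-512.log"
```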

### UEABS benchmarks

**A) H2O-512**

* Ab initio molecular dynamics simulation of 512 water molecules (10 MD steps)
* Uses the Born-Oppenheimer approach via Quickstep DFT

**B) LiH-HFX**

* DFT energy calculation for a 216 atom LiH crystal
* Requires generation of an initial wavefunction (``.wfn`` file) prior to the run
* Run ``input_bulk_B88_3.inp`` to generate the wavefunction, then rename the resulting ``.wfn`` file: ``cp LiH_bulk_3-RESTART.wfn B88.wfn``
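
The two-step procedure can be sketched as a dry run (the ``run`` helper only echoes; replace it with your actual launcher plus ``cp2k.psmp``, and check the production input name, assumed here to be ``input_bulk_HFX_3.inp``, against the README in the benchmark tar file):

```shell
# Dry-run sketch of the LiH-HFX sequence; 'run' is a stand-in that echoes
# each step instead of executing it, so the sequence can be inspected safely.
run() { echo "would run: $*"; }

# 1. Generate the initial wavefunction.
run cp2k.psmp -i input_bulk_B88_3.inp -o B88.log
# 2. Rename the restart wavefunction so the HFX input can read it.
run cp LiH_bulk_3-RESTART.wfn B88.wfn
# 3. Production calculation (input name is an assumption).
run cp2k.psmp -i input_bulk_HFX_3.inp -o LiH-HFX.log
```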


**C) H2O-DFT-LS**

* Single energy calculation of 2048 water molecules
* Uses linear scaling DFT


Test Case | System     | Number of Atoms | Run type      | Description                                          | Location                         |
----------|------------|-----------------|---------------|------------------------------------------------------|----------------------------------|
a         | H2O-512    | 1536            | MD            | Uses the Born-Oppenheimer approach via Quickstep DFT | ``/tests/QS/benchmark/``         |
b         | LiH-HFX    | 216             | Single-energy | GAPW with hybrid Hartree-Fock exchange               | ``/tests/QS/benchmark_HFX/LiH``  |
c         | H2O-DFT-LS | 6144            | Single-energy | Uses linear scaling DFT                              | ``/tests/QS/benchmark_DM_LS``    |

More information in the form of a README and an example job script is included in each benchmark tar file.

## Verification of Results

The run walltime is reported near the end of the logfile and can be extracted with:

    grep "CP2K    " logfile | awk -F ' ' '{print $7}'
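
As a self-contained illustration of the extraction (the timing line below is a mock-up in the shape of a CP2K timing-report line; the numbers are invented):

```shell
# Write a mock timing line in the CP2K report layout, then pull out the
# 7th whitespace-separated field (the maximum total walltime in seconds).
printf ' CP2K                                 1  1.0    0.022    0.022  245.713  245.713\n' > logfile
grep "CP2K    " logfile | awk -F ' ' '{print $7}'
rm -f logfile
```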