# NAMD Build and Run instructions using CUDA, KNC offloading and KNL.
## CUDA Build instructions
In order to run benchmarks, the memopt build with SMP support is mandatory.
NAMD may be compiled in an experimental memory-optimized mode that utilizes a compressed version of the molecular structure and also supports parallel I/O.
In addition to reducing per-node memory requirements, the compressed structure greatly reduces startup times compared to reading a psf file.
Since NAMD version 2.11, the build scripts refuse to configure accelerated builds against an MPI-based Charm++; the verbs interface is suggested instead.
You can override this and use MPI instead of verbs by commenting out the following lines in the `config` script:
```
if ( $charm_arch_mpi || ! $charm_arch_smp ) then
echo ''
echo "ERROR: $ERRTYPE builds require non-MPI SMP or multicore Charm++ arch for reasonable performance."
echo ''
echo "Consider ibverbs-smp or verbs-smp (InfiniBand), gni-smp (Cray), or multicore (single node)."
echo ''
exit 1
endif
```
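If you prefer to script this edit, the sketch below comments out the guard with GNU `sed -i`. It assumes the block appears in `config` exactly as quoted above (csh syntax, where `#` starts a comment), keeps a backup as `config.orig`, and fails if the pattern is not found. The function name `disable_mpi_guard` is hypothetical; inspect the result before configuring.

```shell
# Hypothetical helper: comment out the "non-MPI SMP" guard in NAMD's csh
# config script. Assumes GNU sed (for -i) and the block text quoted above.
disable_mpi_guard() {
  cfg=${1:-config}
  cp "$cfg" "$cfg.orig" || return 1   # keep an untouched backup
  # From the "if ( $charm_arch_mpi ..." line to the next "endif" line,
  # prefix every line with "#" (a csh comment).
  sed -i '/if ( \$charm_arch_mpi || ! \$charm_arch_smp ) then/,/^[[:space:]]*endif/ s/^/#/' "$cfg"
  # Fail loudly if nothing matched (e.g. the block is spaced differently).
  grep -q '^#[[:space:]]*if ( \$charm_arch_mpi' "$cfg" || {
    echo "guard block not found in $cfg" >&2
    return 1
  }
}

# Usage, from NAMD_Source_BASE:
#   disable_mpi_guard config && diff config.orig config
```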
You need NAMD version 2.11 or newer.
* Uncompress and untar the source.
* `cd NAMD_Source_BASE` (the directory name depends on how the source was obtained; typically `namd2` or `NAMD_2.11_Source`).
* Untar the bundled `charm-VERSION.tar`. If you obtained the NAMD source via CVS, you need to download Charm++ separately.
* `cd` to the `charm-VERSION` directory.
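The unpacking steps can be scripted roughly as follows. This is a sketch assuming a tar that detects gzip compression on its own (GNU tar does), a NAMD tarball that unpacks into a single top-level directory, and exactly one bundled `charm-*.tar` that unpacks into a directory of the same name. `unpack_namd` is a hypothetical helper name.

```shell
# Hypothetical helper: unpack the NAMD source and its bundled Charm++,
# leaving the shell inside the charm-VERSION directory.
unpack_namd() {
  tarball=$1                                   # e.g. NAMD_2.11_Source.tar.gz
  # Top-level directory of the archive, i.e. NAMD_Source_BASE.
  top=$(tar tf "$tarball" | awk -F/ 'NR == 1 { print $1 }')
  [ -n "$top" ] || return 1
  tar xf "$tarball" || return 1
  cd "$top" || return 1
  set -- charm-*.tar                           # the bundled Charm++ tarball
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one charm-*.tar in $top" >&2
    return 1
  fi
  tar xf "$1" || return 1
  cd "${1%.tar}"                               # charm-VERSION
}
```

The function changes the caller's working directory on purpose, so the next step (building Charm++) can run right away.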
### Configure and compile Charm++
This step is system dependent. Some examples:
Linux with Intel compilers:
```
./build charm++ verbs-linux-x86_64 smp icc --with-production -O -DCMK_OPTIMIZE
```
Linux with GNU compilers:
```
./build charm++ verbs-linux-x86_64 smp gcc --with-production -O -DCMK_OPTIMIZE
```
To see all available options, run:
```
./build --help
```
For special notes on various systems, see [http://www.ks.uiuc.edu/Research/namd/2.11/notes.html](http://www.ks.uiuc.edu/Research/namd/2.11/notes.html).
The syntax is:
```
./build charm++ ARCHITECTURE smp (compilers, optional) --with-production -O -DCMK_OPTIMIZE
```
You can find a list of supported architectures and compilers in `charm-VERSION/src/arch`.
The `smp` option is mandatory to build the hybrid (SMP) version of NAMD.
This builds Charm++. Then return to the NAMD source directory:
`cd ..`
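Before configuring NAMD, it can help to confirm the exact name of the Charm++ build directory, since that name is what `--charm-arch` expects. A minimal sketch, assuming each successful Charm++ build leaves an executable `bin/charmc` inside `charm-VERSION/<arch>`; `list_charm_archs` is a hypothetical name.

```shell
# Hypothetical helper: print every Charm++ build directory under
# NAMD_Source_BASE (default: current directory) that contains bin/charmc.
# Each printed name is a candidate value for NAMD's --charm-arch option.
list_charm_archs() {
  for d in "${1:-.}"/charm-*/*/; do
    if [ -x "${d}bin/charmc" ]; then
      basename "$d"
    fi
  done
}
```

Run it from NAMD_Source_BASE as `list_charm_archs`; with the Intel example above you would expect something like `verbs-linux-x86_64-smp-icc` in its output, but use whatever name it actually prints.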
### Configure NAMD
This step is system dependent. Some examples:
Linux x86_64/AVX with Intel compilers:
```
./config Linux-x86_64-icc --charm-base ./charm-6.7.0 --charm-arch verbs-linux-x86_64-smp-icc --with-fftw3 \
--fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt --with-cuda \
--cuda-prefix PATH_TO_CUDA_INSTALLATION_ROOT \
--charm-opts -verbose --cxx-opts "-O3 -xAVX " --cc-opts "-O3 -xAVX" --cxx icpc --cc icc \
--cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec "
```
To see all available options, run:
```
./config --help
```
See [http://www.ks.uiuc.edu/Research/namd/2.11/notes.html](http://www.ks.uiuc.edu/Research/namd/2.11/notes.html) for special notes on various systems.
The options `--with-memopt` and `--with-cuda`, together with an SMP-enabled Charm++ build, are mandatory.
Disabling Tcl support with `--without-tcl` is suggested, since Tcl is not needed to run the benchmarks.
You need to specify the FFTW3 installation directory with `--fftw-prefix`. On systems that use environment modules, load the existing fftw3 module and use the environment variables it provides.
If the FFTW3 libraries are not installed on your system, download fftw-3.3.5.tar.gz from [http://www.fftw.org/](http://www.fftw.org/) and install it. NAMD links against the single-precision FFTW library (`libfftw3f`), so build FFTW with `--enable-float`.
You may adjust the compilers and compiler flags as in the Linux x86_64/AVX example.
A typical adjustment is to add `-xAVX` while keeping all other compiler flags of the architecture unchanged.
Use the `--cxx` option of NAMD config with care, or avoid it unless you need it, since in some cases it overrides the compilation flags from the arch files.
When config finishes, it prompts you to change to a build directory and run make.
### cd to the reported directory and run make
If everything succeeds, you will find the executable `namd2` and the parallel wrapper `charmrun` in this directory.
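A quick sanity check of the build result might look like the sketch below. The helper name `check_namd_build` and the `BUILD_DIR` variable in the usage line are hypothetical; `make -j 8` simply runs eight compile jobs in parallel, so adjust it to your machine.

```shell
# Hypothetical helper: verify that a NAMD build directory contains the
# namd2 executable and the charmrun wrapper.
check_namd_build() {
  dir=${1:-.}
  for exe in namd2 charmrun; do
    if [ ! -x "$dir/$exe" ]; then
      echo "missing or not executable: $dir/$exe" >&2
      return 1
    fi
  done
  echo "build OK: $dir"
}

# Usage, from NAMD_Source_BASE (BUILD_DIR is the directory config reported):
#   cd "$BUILD_DIR" && make -j 8 && check_namd_build .
```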
## KNC/offloading Build instructions
The instructions for building NAMD binaries for offloading to KNC are similar to those for GPUs, with one modification:
the NAMD configure stage uses `--with-mic` instead of `--with-cuda` (and `--cuda-prefix` is dropped). The rest of the options are the same.
For example, on Linux x86_64/AVX with Intel compilers:
```
./config Linux-x86_64-icc --charm-base ./charm-6.7.0 --charm-arch verbs-linux-x86_64-smp-icc --with-fftw3 \
--fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt \
--with-mic \
--charm-opts -verbose --cxx-opts "-O3 -xAVX " --cc-opts "-O3 -xAVX" --cxx icpc --cc icc \
--cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec "
```
The options `--with-memopt` and `--with-mic`, together with an SMP-enabled Charm++ build, are mandatory.
## KNL Build instructions
For KNL, follow the Build and Run Instructions of the UEABS suite, replacing the compiler flag `-xAVX` with `-xMIC-AVX512`.
## Run Instructions
After building NAMD, you have the NAMD executable `namd2` and the parallel wrapper `charmrun`.
NAMD usually achieves its best performance and scaling with the hybrid version (multiple processes, each running SMP threads).
On a system with NC cores per node, use 1 task (process) per node and NC threads per task.
For example, on a system with 20 cores per node, use 1 process per node and
set `OMP_NUM_THREADS`, or the corresponding batch-system variable, to 20.
Then set a variable, for example PPN, to NC-1 (19 on a 20-core node): in an SMP build each Charm++ process also runs a communication thread, so one core per process is left for it.
Since charmrun is used as the parallel wrapper, you need to specify the total number of worker threads (`++p`), the worker threads per process (`++ppn`) and a hostfile (`++nodelist`) on the charmrun command line.
You can also try other `TASKSPERNODE/THREADSPERTASK` combinations to find the one that performs best.
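The thread arithmetic can be captured in a small helper. This is a sketch assuming one process per node with all NC cores of the node available to it; `namd_layout` is a hypothetical name. It prints the values to pass as `++p` and `++ppn`.

```shell
# Hypothetical helper: compute charmrun's ++p and ++ppn for one process
# per node, leaving one core per process for the communication thread.
namd_layout() {
  nodes=$1            # number of nodes in the job
  cores=$2            # NC: cores per node
  if [ "$cores" -lt 2 ]; then
    echo "need at least 2 cores per node" >&2
    return 1
  fi
  ppn=$((cores - 1))  # worker threads per process (++ppn)
  p=$((nodes * ppn))  # total worker threads (++p)
  echo "$p $ppn"
}

# Usage: 4 nodes with 20 cores each gives P=76, PPN=19.
#   set -- $(namd_layout 4 20); P=$1; PPN=$2
```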
To use accelerators, you need to specify the accelerator devices on the command line with `+devices`.
Typically one gets this information from the batch system.
For example, with the SLURM workload manager, the variable is `$SLURM_JOB_GPUS` for GPUs
or `$OFFLOAD_DEVICES` for KNC.
Typical values of these variables are `0` or `1` when you request 1 accelerator per node,
`0,1` when you request 2 accelerators per node, and so on.
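A sketch for picking the device list inside a job script, assuming (as described above) that SLURM exposes GPUs through `SLURM_JOB_GPUS` and that your site sets `OFFLOAD_DEVICES` for KNC. `namd_devices` is a hypothetical name; it fails instead of letting NAMD start without an explicit device list.

```shell
# Hypothetical helper: print the comma-separated accelerator IDs for
# NAMD's +devices option, e.g. "0" or "0,1".
namd_devices() {
  if [ -n "${SLURM_JOB_GPUS:-}" ]; then
    echo "$SLURM_JOB_GPUS"     # GPUs allocated by SLURM
  elif [ -n "${OFFLOAD_DEVICES:-}" ]; then
    echo "$OFFLOAD_DEVICES"    # KNC coprocessors
  else
    echo "no accelerator IDs in SLURM_JOB_GPUS or OFFLOAD_DEVICES" >&2
    return 1
  fi
}

# Usage: DEVICES=$(namd_devices) || exit 1
```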
The control file is `stmv.8M.memopt.namd` for tier-1 and `stmv.28M.memopt.namd` for tier-0 systems.
The general way to run accelerated NAMD is:
```
charmrun PATH_TO_namd2 ++p $P ++ppn $PPN ++nodelist ./hostfile +devices AcceleratorsIDS configfile > logfile
```
With the SLURM workload manager:
```
PPN=`expr $SLURM_CPUS_PER_TASK - 1`
P=`expr $SLURM_NNODES \* $PPN`
rm -f hostfile   # start from an empty nodelist, since the loop appends to it
for n in `scontrol show hostnames $SLURM_NODELIST`; do echo "host $n ++cpus $SLURM_NTASKS_PER_NODE" >> hostfile; done
```
For GPUs:
```
charmrun PATH_TO_namd2 ++p $P ++ppn $PPN ++nodelist ./hostfile +devices $SLURM_JOB_GPUS stmv.8M.memopt.namd > logfile
```
For KNC:
```
charmrun PATH_TO_namd2 ++p $P ++ppn $PPN ++nodelist ./hostfile +devices $OFFLOAD_DEVICES configfile > logfile
```
The run wall time is reported at the end of the log file; you can extract it with `grep WallClock: logfile | awk -F ' ' '{print $2}'`
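The same extraction as a reusable function, as a sketch: like the command above, it assumes the wall time is the second whitespace-separated field of the line containing `WallClock:`, and it takes the last such line. `namd_walltime` is a hypothetical name.

```shell
# Hypothetical helper: print the wall time (seconds) from a NAMD log file.
# Uses the last line containing "WallClock:" and fails if there is none.
namd_walltime() {
  awk '/WallClock:/ { t = $2 } END { if (t == "") exit 1; print t }' "$1"
}

# Usage: namd_walltime logfile
```

Returning a non-zero status when no `WallClock:` line exists makes crashed or truncated runs easy to spot in benchmark scripts.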