## Overview

## [Download](#download)

The official site to download NAMD is [http://www.ks.uiuc.edu/Research/namd/](http://www.ks.uiuc.edu/Research/namd/).

You need to register (registration is free) to get a copy of NAMD here: [http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD](http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD)

In order to get a specific CVS snapshot, you first need to request

 - a username/password: [http://www.ks.uiuc.edu/Research/namd/cvsrequest.html](http://www.ks.uiuc.edu/Research/namd/cvsrequest.html)
 - When your CVS access application is approved, you can use your username/password to download a specific CVS snapshot:
 `cvs -d :pserver:username@cvs.ks.uiuc.edu:/namd/cvsroot co -D "2013-02-06 23:59:00 GMT" namd2`
    
    In this case, `charm++` is not included.
    You have to download it separately and put it in the namd2 source tree: [http://charm.cs.illinois.edu/distrib/charm-6.5.0.tar.gz](http://charm.cs.illinois.edu/distrib/charm-6.5.0.tar.gz)


## [Build and Run](#build_run)

### [Build](#build)

Build instructions for NAMD. In order to run the benchmarks, the memopt build with SMP support is mandatory.
 
NAMD may be compiled in an experimental memory-optimized mode that utilizes a compressed version of the molecular structure and also supports parallel I/O. 
In addition to reducing per-node memory requirements, the compressed structure greatly reduces startup times compared to reading a psf file. 

In order to build this version, your MPI library must provide at least the `MPI_THREAD_FUNNELED` level of thread support.
You need NAMD version 2.11 or newer; the latest version (currently 2.13) is recommended.

1. Uncompress/untar the source.
2. cd to the NAMD source base directory (the name depends on how the source was obtained, typically `namd2` or `NAMD_2.11_Source`).
3. Untar the bundled charm-VERSION.tar. If you obtained the NAMD source via CVS, you need to download charm separately.
4. cd to the charm-VERSION directory.
5. Configure and compile charm:

    This step is system dependent. Some examples are :
    
    - CRAY XE6 : `./build charm++ mpi-crayxe smp --with-production -O -DCMK_OPTIMIZE`
    - CURIE : `./build charm++ mpi-linux-x86_64 smp mpicxx ifort --with-production -O -DCMK_OPTIMIZE`
    - JUQUEEN : `./build charm++ mpi-bluegeneq smp xlc --with-production -O -DCMK_OPTIMIZE`
    - Help : `./build --help` to see all available options.
    
    For special notes on various systems, see [http://www.ks.uiuc.edu/Research/namd/2.11/notes.html](http://www.ks.uiuc.edu/Research/namd/2.11/notes.html).
    
    The general syntax is : `./build charm++ ARCHITECTURE smp (compilers, optional) --with-production -O -DCMK_OPTIMIZE`
    You can find a list of supported architectures/compilers in `charm-VERSION/src/arch`.
    The smp option is mandatory to build the hybrid version of NAMD.
    This step builds `charm++`.

6. cd ..
7. Configure NAMD.
    
    This step is system dependent. Some examples are :
    
    - CRAY-XE6: `./config CRAY-XT-g++ --charm-base ./charm-6.7.0 --charm-arch mpi-crayxe-smp --with-fftw3 --fftw-prefix $CRAY_FFTW_DIR --without-tcl --with-memopt --charm-opts -verbose`
    
    - CURIE: `./config Linux-x86_64-icc --charm-base ./charm-6.7.0  --charm-arch mpi-linux-x86_64-ifort-smp-mpicxx --with-fftw3 --fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt --charm-opts -verbose --cxx-opts "-O3 -xAVX "  --cc-opts "-O3 -xAVX" --cxx icpc --cc icc --cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec  "`
    - Juqueen: `./config BlueGeneQ-MPI-xlC --charm-base ./charm-6.7.0 --charm-arch mpi-bluegeneq-smp-xlc --with-fftw3 --with-fftw-prefix PATH_TO_FFTW3_INSTALLATION  --without-tcl --charm-opts -verbose --with-memopt`
    - Help: `./config --help` to see all available options.
    See in [http://www.ks.uiuc.edu/Research/namd/2.11/notes.html](http://www.ks.uiuc.edu/Research/namd/2.11/notes.html) for special notes on various systems.
    
    What is absolutely necessary is the `--with-memopt` option and an SMP-enabled charm++ build.
    It is suggested to disable Tcl support with the `--without-tcl` flag, since Tcl is not necessary to run the benchmarks.
    
    You need to specify the fftw3 installation directory. On systems that use environment modules you need to load the existing fftw3 module and probably use the provided environment variables - like in CRAY-XE6 example above. 
    
    If fftw3 libraries are not installed on your system, download and install fftw-3.3.5.tar.gz from [http://www.fftw.org/](http://www.fftw.org/).
    
    You may adjust the compilers and compiler flags, as in the CURIE example.
    
    A typical flag adjustment is, for example, to add `-xAVX` as in the CURIE case, keeping all the other compiler flags of the architecture the same.
    Avoid using the `--cxx` option for the NAMD config without reason, as it overrides the compilation flags from the arch file.
    When config finishes, it prompts you to change to a build directory and run make.

8. cd to the reported directory and run `make`.
    If everything is OK you will find an executable named `namd2` in this directory.


### [Run](#run)

Run instructions for NAMD (contact: ntell@grnet.gr).

After build of NAMD you have an executable called `namd2`.

The best performance and scaling of NAMD are achieved with the hybrid MPI/SMP version. On a system with `NC` cores per node, use 1 MPI task per node and `NC` threads per task; for example, on a 20 cores/node system use 1 MPI process per node and set `OMP_NUM_THREADS`, or the corresponding batch system variable, to 20.

Set a variable, for example `MYPPN`, to `NC-1`: for example 19 on a 20 cores/node system.
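For instance, deriving `MYPPN` from a hypothetical 20-core node (the core count is an assumption for illustration; one core per node is conventionally left free for the Charm++ communication thread):

```shell
NC=20               # cores per node (hypothetical value for this sketch)
MYPPN=$((NC - 1))   # worker threads per process; one core stays free per node
echo "$MYPPN"       # prints 19
```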

You can also try other combinations of `TASKSPERNODE`/`THREADSPERTASK` to check.

The control file is `stmv.8M.memopt.namd` for tier-1 and `stmv.28M.memopt.namd` for tier-0 systems.

The general way to run is :

```
WRAPPER WRAPPER_OPTIONS PATH_TO_namd2 +ppn $MYPPN stmv.8M.memopt.namd > logfile
```

`WRAPPER` and `WRAPPER_OPTIONS` depend on the system, the batch system, etc.

A few common pairs are :
- CRAY     : `aprun -n TASKS -N NODES -d THREADSPERTASK`
- Curie    : `ccc_mprun` with no options - obtained from the batch system
- Juqueen  : `runjob --np TASKS --ranks-per-node TASKSPERNODE --exp-env OMP_NUM_THREADS`
- Slurm    : `srun` with no options, obtained from Slurm if the variables below are set.

```
#SBATCH --nodes=NODES
#SBATCH --ntasks-per-node=TASKSPERNODE
#SBATCH --cpus-per-task=THREADSPERTASK
```
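Putting the pieces together, a minimal Slurm job script might look like the sketch below. It is illustrative only: the node/core counts and the path to `namd2` are placeholders for your system.

```shell
#!/bin/bash
#SBATCH --nodes=2                 # NODES (placeholder)
#SBATCH --ntasks-per-node=1       # one MPI task per node (hybrid mode)
#SBATCH --cpus-per-task=20        # NC, threads per task (placeholder)

# leave one core per node for the Charm++ communication thread
MYPPN=$((SLURM_CPUS_PER_TASK - 1))

srun /path/to/namd2 +ppn $MYPPN stmv.8M.memopt.namd > logfile
```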


The run walltime is reported at the end of the logfile : `grep WallClock: logfile  | awk -F ' ' '{print $2}'`
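The extraction can be checked against a fabricated logfile; the timing values below are made up, but the line mimics the shape of NAMD's final `WallClock:` line:

```shell
# fabricate a two-line logfile whose last line mimics NAMD's final timing line
printf 'Info: Benchmark results\nWallClock: 123.456  CPUTime: 120.100  Memory: 2048 MB\n' > logfile
grep WallClock: logfile | awk -F ' ' '{print $2}'   # prints 123.456
```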



## [NAMD Build and Run instructions using CUDA, KNC offloading and KNL.](#cuda_knc_knl)

### CUDA Build instructions

In order to run benchmarks, the memopt build with SMP support is mandatory.
 
NAMD may be compiled in an experimental memory-optimized mode that utilizes a compressed version of the molecular structure and also supports parallel I/O. 
In addition to reducing per-node memory requirements, the compressed structure greatly reduces startup times compared to reading a psf file. 

Since NAMD version 2.11, the build scripts refuse to compile accelerator builds with MPI; the verbs interface is suggested instead.
You can override this and use MPI instead of the suggested verbs by commenting out the following lines in the config script:
```
if ( $charm_arch_mpi || ! $charm_arch_smp ) then
  echo ''
  echo "ERROR: $ERRTYPE builds require non-MPI SMP or multicore Charm++ arch for reasonable performance."
  echo ''
  echo "Consider ibverbs-smp or verbs-smp (InfiniBand), gni-smp (Cray), or multicore (single node)."
  echo ''
  exit 1
endif
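As a sketch, the guard can be commented out with a single `sed` range expression. The demo below works on a stand-in file rather than the real `config` script, and assumes GNU `sed`; on the real source tree the target file is `./config` and the pattern matches the guard block shown above.

```shell
# self-contained demo on a stand-in file mimicking the guard in ./config
cat > config.demo <<'EOF'
if ( $charm_arch_mpi || ! $charm_arch_smp ) then
  echo "ERROR: accelerator builds require non-MPI SMP Charm++"
  exit 1
endif
EOF
# comment out everything from the "if" line through the matching "endif"
sed -i '/charm_arch_mpi/,/endif/ s/^/#/' config.demo
cat config.demo
```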

You need a NAMD 2.11 version or newer.

* Uncompress/untar the source.
* cd to the NAMD source base directory (the name depends on how the source was obtained, typically `namd2` or `NAMD_2.11_Source`).
* Untar the bundled charm-VERSION.tar. If you obtained the NAMD source via CVS, you need to download charm separately.
* cd to the charm-VERSION directory.

#### configure and compile charm :

This step is system dependent. Some examples are :

Linux with Intel compilers  :  
```
./build charm++ verbs-linux-x86_64 smp icc --with-production -O -DCMK_OPTIMIZE
```
Linux with GNU   compilers  : 

```
./build charm++ verbs-linux-x86_64 smp gcc  --with-production -O -DCMK_OPTIMIZE
```

Help: run `./build --help` to see all available options.
 
For special notes on various systems, you should look in [http://www.ks.uiuc.edu/Research/namd/2.11/notes.html](http://www.ks.uiuc.edu/Research/namd/2.11/notes.html) .

The syntax is : 
```
./build charm++ ARCHITECTURE  smp (compilers, optional)   --with-production -O -DCMK_OPTIMIZE
```
You can find a list of supported architectures/compilers in `charm-VERSION/src/arch`.

The smp option is mandatory to build the hybrid version of NAMD.
This step builds charm++.

`cd ..`

#### Configure NAMD.

This step is system dependent. Some examples are :

Linux x86_64/AVX with Intel Compilers : 

```
./config Linux-x86_64-icc --charm-base ./charm-6.7.0  --charm-arch verbs-linux-x86_64-smp-icc  --with-fftw3 \
--fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt --with-cuda \
--cuda-prefix PATH_TO_CUDA_INSTALLATION_ROOT \
--charm-opts -verbose --cxx-opts "-O3 -xAVX "  --cc-opts "-O3 -xAVX" --cxx icpc --cc icc \
--cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec  "
```


Help: run `./config --help` to see all available options.

See in [http://www.ks.uiuc.edu/Research/namd/2.11/notes.html](http://www.ks.uiuc.edu/Research/namd/2.11/notes.html)  for special notes on various systems.

What is absolutely necessary are the options `--with-memopt` and `--with-cuda`, plus an SMP-enabled charm++ build.
It is suggested to disable Tcl support with the `--without-tcl` flag, since Tcl is not necessary
to run the benchmarks.

You need to specify the fftw3 installation directory. On systems that use environment modules you need to load the existing fftw3 module and probably use the provided environment variables. 
If fftw3 libraries are not installed on your system, download and install fftw-3.3.5.tar.gz from [http://www.fftw.org/](http://www.fftw.org/) .

You may adjust the compilers and compiler flags as in the Linux x86_64/AVX example.
A typical flag adjustment is, for example, to add `-xAVX` and keep all the other compiler flags of the architecture the same.
Avoid using the `--cxx` option for the NAMD config without reason, as in some cases it overrides the compilation flags from the arch files.

When config finishes, it prompts you to change to a build directory and run make.

####	cd to the reported directory and run make

If everything is OK you will find the executable named `namd2`
and the parallel wrapper called `charmrun` in this directory.




###  KNC/offloading  Build instructions

The instructions for building NAMD binaries for offloading on KNC are similar to those for GPU, with some modifications.

The NAMD configure stage uses `--with-mic` instead of `--with-cuda`; the rest of the options are the same.
For example:
Linux x86_64/AVX with Intel Compilers : 
```
./config Linux-x86_64-icc --charm-base ./charm-6.7.0  --charm-arch verbs-linux-x86_64-smp-icc  --with-fftw3 \
					--fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt \
					--with-mic \
					--charm-opts -verbose --cxx-opts "-O3 -xAVX "  --cc-opts "-O3 -xAVX" --cxx icpc --cc icc \
					--cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec  "
```
What is absolutely necessary are the options `--with-memopt` and `--with-mic`, plus an SMP-enabled charm++ build.


###  KNL Build instructions
For KNL, follow the build and run instructions in the UEABS suite, replacing the compiler flag `-xAVX` with `-xMIC-AVX512`.


### Run Instructions

After building NAMD you have the NAMD executable called `namd2` and the parallel wrapper called `charmrun`.

The best performance and scaling of NAMD are usually achieved with the hybrid MPI/SMP version.
On a system with `NC` cores per node, use 1 MPI task per node and `NC` threads per task;
for example, on a 20 cores/node system use 1 MPI process per node and
set `OMP_NUM_THREADS`, or the corresponding batch system variable, to 20.

Set a variable, for example `PPN`, to `NC-1`:
for example 19 on a 20 cores/node system.
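As a sketch of the arithmetic feeding `charmrun ++p $P ++ppn $PPN` (the node and core counts below are hypothetical):

```shell
NC=20              # cores per node (hypothetical)
NODES=4            # nodes in the job (hypothetical)
PPN=$((NC - 1))    # worker threads per process
P=$((NODES * PPN)) # total worker threads across the job
echo "$P $PPN"     # prints: 76 19
```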

Since charmrun is used as the parallel wrapper, you need to specify the total number of tasks, the threads per task, and a hostfile on the charmrun command line.

You can also try other combinations of `TASKSPERNODE/THREADSPERTASK` to check.

In order to use accelerators, you need to specify the accelerator devices on the command line.
Typically this information comes from the batch system.
For example, with the SLURM workload manager the variable is `$SLURM_JOB_GPUS` for GPUs
or `$OFFLOAD_DEVICES` for KNC.

Typical values of these variables are 0 or 1 if you request 1 accelerator per node,
0,1 if you request 2 accelerators per node, etc.


The control file is `stmv.8M.memopt.namd` for tier-1 and `stmv.28M.memopt.namd` for tier-0 systems.

The general way to run the accelerated NAMD is :
```
charmrun  PATH_TO_namd2 ++p $P ++ppn $PPN ++nodelist ./hostfile +devices AcceleratorsIDS configfile > logfile
```

In the case of SLURM workload manager :
```
PPN=`expr $SLURM_CPUS_PER_TASK - 1`
P=`expr $SLURM_NNODES \* $PPN`
for n in `scontrol show hostnames $SLURM_NODELIST`; do echo "host $n ++cpus $SLURM_NTASKS_PER_NODE" >> hostfile; done
```
for GPUs
```
charmrun  PATH_TO_namd2 ++p $P ++ppn $PPN ++nodelist ./hostfile +devices $SLURM_JOB_GPUS stmv.8M.memopt.namd > logfile
```
for KNC
```
charmrun  PATH_TO_namd2 ++p $P ++ppn $PPN ++nodelist ./hostfile +devices $OFFLOAD_DEVICES configfile > logfile
```
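The hostfile loop above can be exercised stand-alone with mock values as a sanity check before submitting a job; the node names and task count below are invented, and in a real job the list comes from `scontrol show hostnames $SLURM_NODELIST`:

```shell
NODELIST="node001 node002"   # mock node names (invented for this demo)
NTASKS_PER_NODE=1            # mock tasks per node
rm -f hostfile
for n in $NODELIST; do echo "host $n ++cpus $NTASKS_PER_NODE" >> hostfile; done
cat hostfile
# prints:
# host node001 ++cpus 1
# host node002 ++cpus 1
```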
The run walltime is reported at the end of logfile : `grep WallClock: logfile  | awk -F ' ' '{print $2}'`

Since version 2.13, you should also add `++mpiexec` to the charmrun arguments.