# NAMD Build and Run instructions using CUDA, KNC offloading and KNL.

## CUDA Build instructions

In order to run benchmarks, the memopt build with SMP support is mandatory.
 
NAMD may be compiled in an experimental memory-optimized mode that utilizes a compressed version of the molecular structure and also supports parallel I/O. 
In addition to reducing per-node memory requirements, the compressed structure greatly reduces startup times compared to reading a psf file. 

Since NAMD version 2.11, the build scripts refuse to compile with MPI for accelerators; the verbs interface is suggested instead.
You can override this and use MPI instead of the suggested verbs by commenting out the following lines in the config script:
```
if ( $charm_arch_mpi || ! $charm_arch_smp ) then
       echo ''
       echo "ERROR: $ERRTYPE builds require non-MPI SMP or multicore Charm++ arch for reasonable performance."
       echo ''
       echo "Consider ibverbs-smp or verbs-smp (InfiniBand), gni-smp (Cray), or multicore (single node)."
       echo ''
       exit 1
     endif
```

You need NAMD version 2.11 or newer.

* Uncompress/untar the source.
* cd into the NAMD source base directory (the name depends on how the source was obtained, typically `namd2` or `NAMD_2.11_Source`).
* Untar the `charm-VERSION.tar` found there. If you obtained the NAMD source via CVS, you need to download charm separately.
* cd into the `charm-VERSION` directory (these steps are sketched below).
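A minimal sketch of these steps, assuming the source came as a `NAMD_2.11_Source.tar.gz` tarball bundling Charm++ 6.7.0 (adjust the file and directory names to what you actually have):
```
tar xzf NAMD_2.11_Source.tar.gz   # unpack the NAMD source
cd NAMD_2.11_Source               # the NAMD source base directory
tar xf charm-6.7.0.tar            # unpack the bundled Charm++
cd charm-6.7.0
```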

### Configure and compile Charm++

This step is system dependent. Some examples are:

Linux with Intel compilers:
```
./build charm++ verbs-linux-x86_64 smp icc --with-production -O -DCMK_OPTIMIZE
```
Linux with GNU compilers:

```
./build charm++ verbs-linux-x86_64 smp gcc  --with-production -O -DCMK_OPTIMIZE
```

Run `./build --help` to see all available options.
 
For special notes on various systems, see [http://www.ks.uiuc.edu/Research/namd/2.11/notes.html](http://www.ks.uiuc.edu/Research/namd/2.11/notes.html).

The syntax is:
```
./build charm++ ARCHITECTURE  smp (compilers, optional)   --with-production -O -DCMK_OPTIMIZE
```
You can find a list of supported architectures/compilers in `charm-VERSION/src/arch`
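For example, to list the available targets from inside the `charm-VERSION` directory:
```
ls src/arch
```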

The `smp` option is mandatory to build the hybrid version of NAMD.
This step builds Charm++.

`cd ..`

### Configure NAMD

This step is system dependent. Some examples are:

Linux x86_64/AVX with Intel compilers:

```
./config Linux-x86_64-icc --charm-base ./charm-6.7.0  --charm-arch verbs-linux-x86_64-smp-icc  --with-fftw3 \
--fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt --with-cuda \
--cuda-prefix PATH_TO_CUDA_INSTALLATION_ROOT \
--charm-opts -verbose --cxx-opts "-O3 -xAVX "  --cc-opts "-O3 -xAVX" --cxx icpc --cc icc \
--cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec  "
```


Run `./config --help` to see all available options.

See [http://www.ks.uiuc.edu/Research/namd/2.11/notes.html](http://www.ks.uiuc.edu/Research/namd/2.11/notes.html) for special notes on various systems.

The options `--with-memopt` and `--with-cuda`, together with an SMP-enabled Charm++ build, are absolutely necessary.
It is suggested to disable Tcl support with the `--without-tcl` flag, since Tcl is not needed to run the benchmarks.

You need to specify the FFTW3 installation directory. On systems that use environment modules, load the existing FFTW3 module and use the environment variables it provides.
If the FFTW3 libraries are not installed on your system, download fftw-3.3.5.tar.gz from [http://www.fftw.org/](http://www.fftw.org/) and install it.
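If you build FFTW3 from source, a minimal sketch follows; the installation prefix is an arbitrary example, and it is the path you would later pass to `--fftw-prefix`:
```
tar xzf fftw-3.3.5.tar.gz
cd fftw-3.3.5
./configure --prefix=$HOME/fftw-3.3.5 --enable-float   # single-precision libraries, which NAMD typically links
make -j 4
make install
```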

You may adjust the compilers and compiler flags as in the Linux x86_64/AVX example.
A typical adjustment is, for example, to add `-xAVX` while keeping all the other compiler flags of the architecture the same.
Take care with, or simply avoid, the `--cxx` option for the NAMD config unless you have a reason for it, since in some cases it overrides the compilation flags from the arch files.
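As an illustration of such a flags-only adjustment (not an additional required step), here is the CUDA configure example with `--cxx`/`--cc` left out, so the arch-file compilers are kept and only the flags change; paths are placeholders:
```
./config Linux-x86_64-icc --charm-base ./charm-6.7.0 --charm-arch verbs-linux-x86_64-smp-icc --with-fftw3 \
    --fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt --with-cuda \
    --cuda-prefix PATH_TO_CUDA_INSTALLATION_ROOT \
    --charm-opts -verbose --cxx-opts "-O3 -xAVX" --cc-opts "-O3 -xAVX"
```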

When config finishes, it prompts you to change to a build directory and run make.

### cd to the reported directory and run make

If everything is OK, you will find the executable named `namd2`
and the parallel wrapper called `charmrun` in this directory.
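For example, assuming config reported the `Linux-x86_64-icc` build directory (the directory name matches the config target you chose):
```
cd Linux-x86_64-icc
make -j 8   # parallel build; adjust the job count to your machine
```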




## KNC/offloading Build instructions

The instructions for building NAMD binaries with offloading to KNC are similar to those for the GPU build, with some modifications.

The NAMD configure stage uses `--with-mic` instead of `--with-cuda`; the rest of the options are the same.
For example, Linux x86_64/AVX with Intel compilers:
```
./config Linux-x86_64-icc --charm-base ./charm-6.7.0  --charm-arch verbs-linux-x86_64-smp-icc  --with-fftw3 \
  --fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt \
  --with-mic \
  --charm-opts -verbose --cxx-opts "-O3 -xAVX "  --cc-opts "-O3 -xAVX" --cxx icpc --cc icc \
  --cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec  "
```
The options `--with-memopt` and `--with-mic`, together with an SMP-enabled Charm++ build, are absolutely necessary.


##  KNL Build instructions
For KNL, follow the Build and Run Instructions in the UEABS suite, replacing the compiler flag `-xAVX` with `-xMIC-AVX512`.
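As a hypothetical illustration only (the UEABS instructions are authoritative), the flag substitution applied to a host-only configure line in the style of the examples above, with the offload options dropped since a KNL build runs natively:
```
./config Linux-x86_64-icc --charm-base ./charm-6.7.0 --charm-arch verbs-linux-x86_64-smp-icc --with-fftw3 \
    --fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt \
    --charm-opts -verbose --cxx-opts "-O3 -xMIC-AVX512" --cc-opts "-O3 -xMIC-AVX512" --cxx icpc --cc icc
```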


## Run Instructions

After building NAMD you have the NAMD executable, called `namd2`, and the parallel wrapper, called `charmrun`.

The best performance and scaling of NAMD are usually achieved with the hybrid MPI/SMP version.
On a system with NC cores per node, use 1 MPI task per node and NC threads per task.
For example, on a system with 20 cores per node, use 1 MPI process per node and
set `OMP_NUM_THREADS`, or the corresponding batch system variable, to 20.

Set a variable, for example `PPN`, to NC-1, since one core per node is left for the Charm++ communication thread;
for example, `PPN=19` on a system with 20 cores per node.
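A hypothetical worked example for 4 nodes with 20 cores per node:
```
PPN=19            # 20 cores per node, minus one core for the communication thread
P=$((4 * PPN))    # total worker threads across 4 nodes: 76
```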

Since `charmrun` is used as the parallel wrapper, you need to specify the total number of tasks, the threads per task, and a hostfile on the `charmrun` command line.

You can also try other combinations of tasks per node and threads per task to see which performs best.

In order to use accelerators, you need to specify the accelerator devices on the command line.
Typically this information comes from the batch system.
For example, in the case of the SLURM workload manager, the variable is `$SLURM_JOB_GPUS` for GPUs
or `$OFFLOAD_DEVICES` for KNC.

Typical values of these variables are `0` or `1` if you request one accelerator per node,
`0,1` if you request two accelerators per node, and so on.


The control file is `stmv.8M.memopt.namd` for tier-1 and `stmv.28M.memopt.namd` for tier-0 systems.

The general way to run the accelerated NAMD is:
```
charmrun  PATH_TO_namd2 ++p $P ++ppn $PPN ++nodelist ./hostfile +devices AcceleratorsIDS configfile > logfile
```

In the case of the SLURM workload manager:
```
PPN=`expr $SLURM_CPUS_PER_TASK - 1`    # worker threads per process: one core is left for the communication thread
P=`expr $SLURM_NNODES \* $PPN`         # total number of worker threads
# Expand the compact SLURM nodelist into a charmrun hostfile, one host per line.
for n in `scontrol show hostnames $SLURM_NODELIST`; do
    echo "host $n ++cpus $SLURM_NTASKS_PER_NODE" >> hostfile
done
```
For GPUs:
```
charmrun  PATH_TO_namd2 ++p $P ++ppn $PPN ++nodelist ./hostfile +devices $SLURM_JOB_GPUS stmv.8M.memopt.namd > logfile
```
For KNC:
```
charmrun  PATH_TO_namd2 ++p $P ++ppn $PPN ++nodelist ./hostfile +devices $OFFLOAD_DEVICES configfile > logfile
```
The run walltime is reported at the end of the logfile; extract it with `grep WallClock: logfile | awk -F ' ' '{print $2}'`.