## Overview

## [Download](#download)

The official site to download NAMD is: [http://www.ks.uiuc.edu/Research/namd/](http://www.ks.uiuc.edu/Research/namd/)

You need to register (for free) in order to get a copy of NAMD from: [http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD](http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD)

In order to get a specific CVS snapshot, you first need to:

- request a username/password: [http://www.ks.uiuc.edu/Research/namd/cvsrequest.html](http://www.ks.uiuc.edu/Research/namd/cvsrequest.html)
- when your CVS access application is approved, use your username/password to download a specific CVS snapshot:
  `cvs -d :pserver:username@cvs.ks.uiuc.edu:/namd/cvsroot co -D "2013-02-06 23:59:00 GMT" namd2`

In this case, `charm++` is not included. You have to download it separately and put it in the namd2 source tree: [http://charm.cs.illinois.edu/distrib/charm-6.5.0.tar.gz](http://charm.cs.illinois.edu/distrib/charm-6.5.0.tar.gz)
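For illustration, assuming the CVS checkout above created a `namd2` directory and that `wget` is available, the charm++ tarball can be fetched and unpacked inside the source tree like this (a sketch, not an official procedure):

```
# Sketch: place the separately downloaded charm++ inside the CVS checkout.
# Assumes the checkout created ./namd2 and that wget is available.
cd namd2
wget http://charm.cs.illinois.edu/distrib/charm-6.5.0.tar.gz
tar xzf charm-6.5.0.tar.gz    # should create the charm-VERSION directory used in the build steps below
```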
## [Build and Run](#build_run)

### [Build](#build)

Build instructions for NAMD. In order to run the benchmarks, the memopt build with SMP support is mandatory.

NAMD may be compiled in an experimental memory-optimized mode that utilizes a compressed version of the molecular structure and also supports parallel I/O. In addition to reducing per-node memory requirements, the compressed structure greatly reduces startup times compared to reading a psf file.

In order to build this version, your MPI needs to provide the `MPI_THREAD_FUNNELED` level of thread support.

You need NAMD version 2.11 or newer.

1. Uncompress/untar the source.
2. cd NAMD_Source_BASE (the directory name depends on how the source was obtained, typically namd2 or NAMD_2.11_Source).
3. Untar the charm-VERSION.tar that is included. If you obtained the NAMD source via CVS, you need to download charm separately.
4. cd to the charm-VERSION directory.
5. Configure and compile charm. This step is system dependent. Some examples are:
   - CRAY XE6: `./build charm++ mpi-crayxe smp --with-production -O -DCMK_OPTIMIZE`
   - CURIE: `./build charm++ mpi-linux-x86_64 smp mpicxx ifort --with-production -O -DCMK_OPTIMIZE`
   - JUQUEEN: `./build charm++ mpi-bluegeneq smp xlc --with-production -O -DCMK_OPTIMIZE`
   - Help: `./build --help` to see all available options.

   For special notes on various systems, see [http://www.ks.uiuc.edu/Research/namd/2.11/notes.html](http://www.ks.uiuc.edu/Research/namd/2.11/notes.html).

   The syntax is: `./build charm++ ARCHITECTURE smp (compilers, optional) --with-production -O -DCMK_OPTIMIZE`

   You can find a list of supported architectures/compilers in `charm-VERSION/src/arch`. The smp option is mandatory to build the hybrid version of NAMD. This builds `charm++`.
6. cd ..
7. Configure NAMD. This step is system dependent. Some examples are:
   - CRAY-XE6: `./config CRAY-XT-g++ --charm-base ./charm-6.7.0 --charm-arch mpi-crayxe-smp --with-fftw3 --fftw-prefix $CRAY_FFTW_DIR --without-tcl --with-memopt --charm-opts -verbose`
   - CURIE: `./config Linux-x86_64-icc --charm-base ./charm-6.7.0 --charm-arch mpi-linux-x86_64-ifort-smp-mpicxx --with-fftw3 --fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt --charm-opts -verbose --cxx-opts "-O3 -xAVX " --cc-opts "-O3 -xAVX" --cxx icpc --cc icc --cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec "`
   - Juqueen: `./config BlueGeneQ-MPI-xlC --charm-base ./charm-6.7.0 --charm-arch mpi-bluegeneq-smp-xlc --with-fftw3 --with-fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --charm-opts -verbose --with-memopt`
   - Help: `./config --help` to see all available options.

   See [http://www.ks.uiuc.edu/Research/namd/2.11/notes.html](http://www.ks.uiuc.edu/Research/namd/2.11/notes.html) for special notes on various systems.

   What is absolutely necessary is the `--with-memopt` option and an SMP-enabled charm++ build. It is suggested to disable tcl support with the `--without-tcl` flag, since tcl is not necessary to run the benchmarks.

   You need to specify the fftw3 installation directory. On systems that use environment modules you need to load the existing fftw3 module and probably use the provided environment variables, as in the CRAY-XE6 example above. If fftw3 libraries are not installed on your system, download and install fftw-3.3.5.tar.gz from [http://www.fftw.org/](http://www.fftw.org/).

   You may adjust the compilers and compiler flags as in the CURIE example. A typical flag adjustment is, for example, to add `-xAVX` as in the CURIE case and keep all the other compiler flags of the architecture the same. Take care with, or simply avoid, the `--cxx` option for the NAMD config unless you have a reason to use it, as it overrides the compilation flags from the arch file.

   When config finishes, it prompts you to change to a directory and run make.
8. cd to the reported directory and run `make`. If everything is ok you'll find the executable named `namd2` in this directory.

### [Run](#run)

Run instructions for NAMD.

After the build of NAMD you have an executable called `namd2`. The best performance and scaling of NAMD is achieved using the hybrid MPI/SMP version. On a system with `NC` cores per node, use 1 MPI task per node and `NC` threads per task; for example, on a 20 cores/node system use 1 MPI process and set `OMP_NUM_THREADS`, or any batch-system-related variable, to 20. Set a variable, for example `MYPPN`, to `NC-1`, i.e. 19 for a 20 cores/node system. You can also try other combinations of `TASKSPERNODE`/`THREADSPERTASK`.

The control file is `stmv.8M.memopt.namd` for tier-1 and `stmv.28M.memopt.namd` for tier-0 systems.

The general way to run is:

```
WRAPPER WRAPPER_OPTIONS PATH_TO_namd2 +ppn $MYPPN stmv.8M.memopt.namd > logfile
```

`WRAPPER` and `WRAPPER_OPTIONS` depend on the system, the batch system etc. A few common pairs are:

- CRAY: `aprun -n TASKS -N NODES -d THREADSPERTASK`
- Curie: `ccc_mrun` with no options, obtained from the batch system
- Juqueen: `runjob --np TASKS --ranks-per-node TASKSPERNODE --exp-env OMP_NUM_THREADS`
- Slurm: `srun` with no options, obtained from Slurm if the variables below are set.

```
#SBATCH --nodes=NODES
#SBATCH --ntasks-per-node=TASKSPERNODE
#SBATCH --cpus-per-task=THREADSPERTASK
```

The run walltime is reported at the end of the logfile: `grep WallClock: logfile | awk -F ' ' '{print $2}'`
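As an illustration, a minimal Slurm batch script for the hypothetical 20 cores/node example above might look like the following sketch (the node count, and the assumption that `srun` picks up everything from the Slurm settings, are illustrative only; add any module loads your system needs):

```
#!/bin/bash
#SBATCH --nodes=4                  # hypothetical number of nodes
#SBATCH --ntasks-per-node=1        # 1 MPI task per node
#SBATCH --cpus-per-task=20         # NC = 20 cores per node in this example

MYPPN=19                           # NC-1 worker threads per task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun PATH_TO_namd2 +ppn $MYPPN stmv.8M.memopt.namd > logfile
grep WallClock: logfile | awk -F ' ' '{print $2}'
```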
## [NAMD Build and Run instructions using CUDA, KNC offloading and KNL.](#cuda_knc_knl)

### CUDA Build instructions

In order to run the benchmarks, the memopt build with SMP support is mandatory. NAMD may be compiled in an experimental memory-optimized mode that utilizes a compressed version of the molecular structure and also supports parallel I/O. In addition to reducing per-node memory requirements, the compressed structure greatly reduces startup times compared to reading a psf file.

Since NAMD version 2.11, the build scripts refuse to compile accelerator builds with MPI; the verbs interface is suggested instead. You can override this and use MPI instead of the suggested verbs by commenting out the following lines in the config script:

```
if ( $charm_arch_mpi || ! $charm_arch_smp ) then
  echo ''
  echo "ERROR: $ERRTYPE builds require non-MPI SMP or multicore Charm++ arch for reasonable performance."
  echo ''
  echo "Consider ibverbs-smp or verbs-smp (InfiniBand), gni-smp (Cray), or multicore (single node)."
  echo ''
  exit 1
endif
```

You need NAMD version 2.11 or newer.

* Uncompress/untar the source.
* cd NAMD_Source_BASE (the directory name depends on how the source was obtained, typically namd2 or NAMD_2.11_Source).
* Untar the charm-VERSION.tar that is included. If you obtained the NAMD source via CVS, you need to download charm separately.
* cd to the charm-VERSION directory.

#### Configure and compile charm

This step is system dependent. Some examples are:

Linux with Intel compilers:

```
./build charm++ verbs-linux-x86_64 smp icc --with-production -O -DCMK_OPTIMIZE
```

Linux with GNU compilers:

```
./build charm++ verbs-linux-x86_64 smp gcc --with-production -O -DCMK_OPTIMIZE
```

Help: `./build --help` to see all available options.

For special notes on various systems, see [http://www.ks.uiuc.edu/Research/namd/2.11/notes.html](http://www.ks.uiuc.edu/Research/namd/2.11/notes.html).

The syntax is:

```
./build charm++ ARCHITECTURE smp (compilers, optional) --with-production -O -DCMK_OPTIMIZE
```

You can find a list of supported architectures/compilers in `charm-VERSION/src/arch`. The smp option is mandatory to build the hybrid version of NAMD. This builds charm++.

`cd ..`

#### Configure NAMD

This step is system dependent. Some examples are:

Linux x86_64/AVX with Intel compilers:

```
./config Linux-x86_64-icc --charm-base ./charm-6.7.0 --charm-arch verbs-linux-x86_64-smp-icc --with-fftw3 \
  --fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt --with-cuda \
  --cuda-prefix PATH_TO_CUDA_INSTALLATION_ROOT \
  --charm-opts -verbose --cxx-opts "-O3 -xAVX " --cc-opts "-O3 -xAVX" --cxx icpc --cc icc \
  --cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec "
```

Help: `./config --help` to see all available options.

See [http://www.ks.uiuc.edu/Research/namd/2.11/notes.html](http://www.ks.uiuc.edu/Research/namd/2.11/notes.html) for special notes on various systems.

What is absolutely necessary are the `--with-memopt` and `--with-cuda` options and an SMP-enabled charm++ build. It is suggested to disable tcl support with the `--without-tcl` flag, since tcl is not necessary to run the benchmarks.

You need to specify the fftw3 installation directory. On systems that use environment modules you need to load the existing fftw3 module and probably use the provided environment variables. If fftw3 libraries are not installed on your system, download and install fftw-3.3.5.tar.gz from [http://www.fftw.org/](http://www.fftw.org/).
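If you do need to build FFTW yourself, the following is a minimal sketch (the installation prefix and build parallelism are arbitrary; `--enable-float` builds the single-precision library that NAMD's `--with-fftw3` configuration typically links, but check the NAMD notes for your version):

```
# Sketch: build FFTW 3.3.5 into a private prefix when no system/module
# installation is available.
wget http://www.fftw.org/fftw-3.3.5.tar.gz
tar xzf fftw-3.3.5.tar.gz
cd fftw-3.3.5
./configure --prefix=$HOME/fftw-3.3.5 --enable-float
make -j 8 && make install
# then pass $HOME/fftw-3.3.5 as PATH_TO_FFTW3_INSTALLATION to ./config
```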
You may adjust the compilers and compiler flags as in the Linux x86_64/AVX example. A typical flag adjustment is, for example, to add `-xAVX` and keep all the other compiler flags of the architecture the same. Take care with, or simply avoid, the `--cxx` option for the NAMD config unless you have a reason to use it, as in some cases it overrides the compilation flags from the arch files.

When config finishes, it prompts you to change to a directory and run make.

#### cd to the reported directory and run make

If everything is ok you'll find the executable named `namd2` and the parallel wrapper called `charmrun` in this directory.

### KNC/offloading Build instructions

The instructions for building NAMD binaries for offloading on KNC are similar to those for GPUs, with some modifications: the NAMD configure stage uses `--with-mic` instead of `--with-cuda`. The rest of the options are the same. For example:

Linux x86_64/AVX with Intel compilers:

```
./config Linux-x86_64-icc --charm-base ./charm-6.7.0 --charm-arch verbs-linux-x86_64-smp-icc --with-fftw3 \
  --fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt \
  --with-mic \
  --charm-opts -verbose --cxx-opts "-O3 -xAVX " --cc-opts "-O3 -xAVX" --cxx icpc --cc icc \
  --cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec "
```

What is absolutely necessary are the `--with-memopt` and `--with-mic` options and an SMP-enabled charm++ build.

### KNL Build instructions

For KNL, follow the Build and Run instructions of the UEABS suite, replacing the compiler flag `-xAVX` with `-xMIC-AVX512`; a sketch of the adapted config line follows.
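As an illustration, a hypothetical adaptation of the earlier CURIE-style config line for a self-hosted KNL node (no offload options; the charm base, charm arch and FFTW path are simply carried over from that example) could be:

```
# Sketch only: the CPU (non-accelerated) config with -xAVX replaced by
# -xMIC-AVX512 for KNL; charm base/arch and paths are assumptions taken
# from the earlier CURIE example.
./config Linux-x86_64-icc --charm-base ./charm-6.7.0 \
  --charm-arch mpi-linux-x86_64-ifort-smp-mpicxx \
  --with-fftw3 --fftw-prefix PATH_TO_FFTW3_INSTALLATION \
  --without-tcl --with-memopt --charm-opts -verbose \
  --cxx-opts "-O3 -xMIC-AVX512" --cc-opts "-O3 -xMIC-AVX512" \
  --cxx icpc --cc icc \
  --cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec"
```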
### Run Instructions

After the build of NAMD you have the NAMD executable called `namd2` and the parallel wrapper called `charmrun`. The best performance and scaling of NAMD is usually achieved using the hybrid MPI/SMP version. On a system with `NC` cores per node, use 1 MPI task per node and `NC` threads per task; for example, on a 20 cores/node system use 1 MPI process and set `OMP_NUM_THREADS`, or any batch-system-related variable, to 20. Set a variable, for example `PPN`, to `NC-1`, i.e. 19 for a 20 cores/node system. Since charmrun is used as the parallel wrapper, you need to specify the total number of tasks, the threads per task and a hostfile on the charmrun command line. You can also try other combinations of `TASKSPERNODE`/`THREADSPERTASK`.

In order to use accelerators, you need to specify the accelerator devices on the command line. Typically you get this information from the batch system. For example, in the case of the SLURM workload manager, this variable is `$SLURM_JOB_GPUS` for GPUs or `$OFFLOAD_DEVICES` for KNC. Typical values of these variables are 0 or 1 if you request 1 accelerator per node, 0,1 if you request 2 accelerators per node, etc.

The control file is `stmv.8M.memopt.namd` for tier-1 and `stmv.28M.memopt.namd` for tier-0 systems.

The general way to run the accelerated NAMD is:

```
charmrun PATH_TO_namd2 ++p $P ++ppn $PPN ++nodelist ./hostfile +devices AcceleratorsIDS configfile > logfile
```

In the case of the SLURM workload manager:

```
PPN=`expr $SLURM_CPUS_PER_TASK - 1`
P=`expr $SLURM_NNODES \* $PPN`

for n in $(scontrol show hostnames $SLURM_NODELIST); do
  echo "host $n ++cpus $SLURM_NTASKS_PER_NODE" >> hostfile
done
```

for GPUs:

```
charmrun PATH_TO_namd2 ++p $P ++ppn $PPN ++nodelist ./hostfile +devices $SLURM_JOB_GPUS stmv.8M.memopt.namd > logfile
```

for KNC:

```
charmrun PATH_TO_namd2 ++p $P ++ppn $PPN ++nodelist ./hostfile +devices $OFFLOAD_DEVICES configfile > logfile
```

The run walltime is reported at the end of the logfile: `grep WallClock: logfile | awk -F ' ' '{print $2}'`

Since version 2.13 one should also add `++mpiexec` to the charmrun arguments.
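For example, the GPU command above would become the following (a sketch only; whether the `++nodelist` file is still needed when charmrun launches through the system job launcher depends on your site setup):

```
# Sketch: same GPU run as above with ++mpiexec added for NAMD 2.13+,
# so charmrun starts remote processes via the system's mpiexec/job
# launcher instead of ssh.
charmrun PATH_TO_namd2 ++mpiexec ++p $P ++ppn $PPN ++nodelist ./hostfile \
    +devices $SLURM_JOB_GPUS stmv.8M.memopt.namd > logfile
```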