# Unified European Applications Benchmark Suite, version 1.2
The Unified European Application Benchmark Suite (UEABS) is a set of 12 application codes taken from the pre-existing PRACE and DEISA application benchmark suites to form a single suite, with the objective of providing a set of scalable, currently relevant and publicly available codes and datasets, of a size which can realistically be run on large systems, and maintained into the future. This work has been undertaken by Task 7.4 "Unified European Applications Benchmark Suite for Tier-0 and Tier-1" in the PRACE Second Implementation Phase (PRACE-2IP) project and will be updated and maintained by subsequent PRACE Implementation Phase projects.
For more details of the codes and datasets, and sample results, please see http://www.prace-ri.eu/IMG/pdf/d7.4_3ip.pdf
Release notes for version 1.2, released on 31st October 2016 as a result of PRACE-4IP activities.
Changes from version 1.1 are as follows:
GENE: new version of code and additional new dataset.
GPAW: new version of code and new dataset.
GROMACS: new version of code and updated dataset.
NAMD: new version of code and minor build and run instructions updates.
NEMO: new version of code and replaced dataset.
Release notes for version 1.1, released on 31st May 2014 as a result of PRACE-3IP activities.
Changes from version 1.0 are as follows:
...
...
This benchmark set contains a short functional test as well as scaling
tests for the electronic structure simulation software GPAW. More information on
GPAW can be found at wiki.fysik.dtu.dk/gpaw
Functional test: functional.py
==============================
A calculation of the ground state electronic structure of a small Si cluster,
followed by a linear response time-dependent density-functional theory
calculation. This test works with 8-64 CPU cores.
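As an illustration, a typical launch might look like the sketch below (assuming GPAW was built with its gpaw-python parallel interpreter and an mpirun-style launcher; the core count of 32 is only an example within the 8-64 range):

    $ mpirun -np 32 gpaw-python functional.py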
Medium scaling test: Si_gs.py
=============================
A ground state calculation (a few iterations) for a spherical Si cluster.
This test should scale to ~2000 processor cores on x86 architecture.
The total running time with ~2000 cores is ~7 min. In principle, an arbitrary
number of CPU cores can be used, but the recommended values are powers of 2.
This test produces a 47 GB output file Si_gs.hdf5 to be used for the
large scaling test Si_lr1.py.
For scalability testing the relevant timer in the text output
'out_Si_gs_pXXXX.txt' (where XXXX is the CPU core count) is 'SCF-cycle'.
The parallel I/O performance (with HDF5) can be benchmarked with the
'IO' timer.
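A hedged sketch of a run and of extracting the relevant timers (assuming the same gpaw-python/mpirun setup as above; 2048 cores is just an example power of 2):

    $ mpirun -np 2048 gpaw-python Si_gs.py
    $ grep 'SCF-cycle' out_Si_gs_p2048.txt   # scalability timer
    $ grep 'IO' out_Si_gs_p2048.txt          # parallel I/O (HDF5) timer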
Large scaling test: Si_lr1.py
=============================
Linear response TDDFT calculation for a spherical Si cluster.
This benchmark should be scalable to ~20 000 CPU cores
on x86 architecture. The number of CPU cores has to be a multiple of 256.
The estimated total running time with 20 000 CPU cores is 5 min.
The benchmark requires as input the ground state wave functions in
the file Si_gs.hdf5, which can be produced by the ground state benchmark
Si_gs.py.
The relevant timer for this benchmark is 'Calculate K matrix';
timing information is written to a text output file Si_lr1_pxxxx.txt, where
xxxx is the number of CPU cores.
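A hedged sketch of a launch (the gpaw-python interpreter and mpirun launcher are assumptions as above; 4096 is one valid multiple of 256):

    $ ls Si_gs.hdf5                            # input produced by the Si_gs.py benchmark
    $ mpirun -np 4096 gpaw-python Si_lr1.py    # core count must be a multiple of 256
    $ grep 'Calculate K matrix' Si_lr1_p4096.txt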
Optional large scaling test: Au38_lr.py
=======================================
Linear response TDDFT calculation for an Au38 cluster surrounded by CH3
ligands. This benchmark should be scalable to ~20 000 CPU cores
on x86 architecture. The number of CPU cores has to be a multiple of 256.
The estimated running time with 20 000 CPU cores is 5 min.
The benchmark requires as input the ground state wave functions,
which can be produced by the input Au38_gs.py (about a 5 min calculation with
64 cores).
The relevant timer for this benchmark is 'Calculate K matrix';
timing information is written to a text output file Au38_lr_pxxxx.txt where
xxxx is the number of CPU cores.
This benchmark set contains scaling tests for the electronic structure simulation software GPAW.
More information on GPAW can be found at https://wiki.fysik.dtu.dk/gpaw
Small Scaling Test: carbone_nanotube.py
=======================================
A ground state calculation for a (6-6-10) carbon nanotube, requiring 30 SCF iterations.
The calculations under ScaLAPACK are parallelized using a 4/4/64 partitioning scheme.
This system scales reasonably up to 512 cores, running to completion in under two minutes on a 2015-era x86 architecture cluster.
For scalability testing, the relevant timer in the text output 'out_nanotube_hXXX_kYYY_pZZZ' (where XXX denotes grid spacing, YYY denotes Brillouin-zone sampling and ZZZ denotes the number of cores used) is 'Total Time'.
Medium Scaling Test: C60_Pb100.py and C60_Pb100_POSCAR
======================================================
A ground state calculation for fullerene on a Pb(100) surface, requiring ~100 SCF iterations.
In this example, the parameters of the parallelization scheme for ScaLAPACK calculations are chosen automatically (using the keyword 'sl_auto: True').
This system scales reasonably up to 1024 cores, running to completion in under thirteen minutes on a 2015-era x86 architecture cluster.
For scalability testing, the relevant timer in the text output 'out_C60_Pb100_hXXX_kYYY_pZZZ' (where XXX denotes grid spacing, YYY denotes Brillouin-zone sampling and ZZZ denotes the number of cores used) is 'Total Time'.
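A hedged sketch of a launch for this test (gpaw-python and mpirun are assumptions as above; 1024 cores matches the scaling limit quoted above). The C60_Pb100_POSCAR structure file is presumably read by the input script, so keep both files in the run directory:

    $ ls C60_Pb100.py C60_Pb100_POSCAR
    $ mpirun -np 1024 gpaw-python C60_Pb100.py
    $ grep 'Total Time' out_C60_Pb100_h*_k*_p1024*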
How to run
==========
...
...
* Benchmarks do not need any special command line options and can be run
You need to specify the fftw3 installation directory. On systems that
use environment modules you need to load the existing fftw3 module
and probably use the environment variables it provides, as in the CRAY-XE6
example above.
If the fftw3 libraries are not installed on your system,
download and install fftw-3.3.3.tar.gz from http://www.fftw.org/.
You may adjust the compilers and compiler flags as in the CURIE example.
When config finishes, it prompts you to change to a directory and run make.
8. cd to the reported directory and run make.
If everything is OK you will find an executable named namd2 in this
directory.
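As a hedged illustration of the fftw3-related options (the module name, the FFTW_DIR variable and the Linux-x86_64-g++ target are site-dependent placeholders, not prescriptions):

    $ module load fftw                          # on systems with environment modules
    $ ./config Linux-x86_64-g++ --with-fftw3 --fftw-prefix $FFTW_DIR
    $ cd Linux-x86_64-g++                       # the directory reported by config
    $ make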
Build instructions for NAMD.
In order to run the benchmarks, the memopt build with SMP support is mandatory.
NAMD may be compiled in an experimental memory-optimized mode that utilizes a compressed version of the molecular structure and also supports parallel I/O.
In addition to reducing per-node memory requirements, the compressed structure greatly reduces startup times compared to reading a psf file.
In order to build this version, your MPI needs to provide the thread support level MPI_THREAD_FUNNELED.
You need NAMD version 2.11 or newer.
1. Uncompress/untar the source.
2. cd to NAMD_Source_BASE (the directory name depends on how the source was obtained,
typically namd2 or NAMD_2.11_Source).
3. Untar the charm-VERSION.tar that is included. If you obtained the NAMD source via
cvs, you need to download charm separately.
4. cd to the charm-VERSION directory.
5. Configure and compile charm:
This step is system dependent. Some examples are:
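For instance, a hedged sketch of an SMP MPI build of charm++ on a generic x86_64 Linux cluster (the target and options are placeholders; consult the charm documentation for your platform):

    $ ./build charm++ mpi-linux-x86_64 smp --with-production
    # the resulting build directory (charm-VERSION/mpi-linux-x86_64-smp) is what the
    # later NAMD ./config step refers to, e.g. --charm-arch mpi-linux-x86_64-smp --with-memopt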
Written on 2013-09-12, as a product of the NEMO benchmark in PRACE-2IP WP7.4.
Written by Soon-Heum "Jeff" Ko at Linkoping University, Sweden (sko@nsc.liu.se).
0. Before starting
- Download two tarball files (src and input) from the PRACE benchmark site.
- Create a directory 'ORCA12_PRACE' and untar the above-mentioned files under that directory. The directory structure will then be as follows:
----- ORCA12_PRACE
|
|------ DATA_CONFIG_ORCA12/
|
|------ FORCING/
|
|------ NEMOGCM/
|
|------ README
|
|------ Instruction_for_JuBE.tar.gz
1. Build-up of the standalone version
- You can find an easy how-to in the ORCA12_PRACE/README file, an instruction document written after the PRACE-1IP contribution. To repeat the instructions:
1) cd NEMOGCM/ARCH
2) Create an arch-COMPUTER.fcm file in NEMOGCM/ARCH corresponding to your needs. You can refer to 'arch-ifort_linux_curie.fcm', which is tuned for the CURIE x86_64 system.
3) cd NEMOGCM/CONFIG
4) ./makenemo -n ORCA12.L75-PRACE -m COMPUTER
A subdirectory 'ORCA12.L75-PRACE' will then be created. A consolidated example of the whole sequence is given below.
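The whole sequence, as a hedged example (the arch name 'mycluster' is a placeholder; the copied reference file must be edited for your compilers, flags and library paths):

    $ cd NEMOGCM/ARCH
    $ cp arch-ifort_linux_curie.fcm arch-mycluster.fcm   # then edit it for your system
    $ cd ../CONFIG
    $ ./makenemo -n ORCA12.L75-PRACE -m mycluster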
2. Build-up under the JuBE benchmark framework
- You should first download the JuBE benchmark suite and the PRACE benchmark applications from the PRACE SVN. You will then find the 'nemo' benchmark under PABS/applications. Because the old nemo benchmark set was poorly written and the NEMO source has changed, we provide the benchmark setup for the current NEMO version in a separate tarball file (Instruction_for_JuBE.tar.gz). You can follow the instructions specified there for installing and running NEMO v3.4 in the JuBE benchmark suite.
Written on 2013-09-12, as a product of the NEMO benchmark in PRACE-2IP WP7.4.
Written by Soon-Heum "Jeff" Ko at Linkoping University, Sweden (sko@nsc.liu.se).
0. Before starting
- Follow the instructions in 'NEMO_Build_README.txt' so that you have the directory structure specified below, along with the compiled binary:
----- ORCA12_PRACE
|
|------ DATA_CONFIG_ORCA12/
|
|------ FORCING/
|
|------ NEMOGCM/
|
|------ README
|
|------ Instruction_for_JuBE.tar.gz
1. Running the standalone version
- After compilation, you will have the 'ORCA12.L75-PRACE' directory created under NEMOGCM/CONFIG.
1) cd ORCA12.L75-PRACE/EXP00
2) Link the datasets. Perform the following:
$ ln -s ../../../../DATA_CONFIG_ORCA12/* .
$ ln -s ../../../../FORCING/* .
3) Locate the 'namelist' and 'namelist_ice' files in this directory and edit them as needed.
4) Run it. It does not take any special command line arguments, so you can simply type 'mpirun opa'. A consolidated example of the run sequence is given below.
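The run sequence as a hedged example (the 512-core count is only illustrative and must be consistent with the domain decomposition set in the namelist; the launcher syntax depends on your MPI installation):

    $ cd NEMOGCM/CONFIG/ORCA12.L75-PRACE/EXP00
    $ ln -s ../../../../DATA_CONFIG_ORCA12/* .
    $ ln -s ../../../../FORCING/* .
    # edit 'namelist' and 'namelist_ice' as needed
    $ mpirun -np 512 ./opa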
2. Running under the JuBE benchmark framework
- You can prepare your own XML file that covers everything from compiling to running. The file 'ORCA_PRACE_CURIE.xml' under Instruction_for_JuBE.tar.gz can be used as an example. One remark for CURIE users: you must specify your project ID and which type of queue (standard, large, ...) you are going to use. That information can be found with the 'ccc_myproject' command.