In order to build ALYA (Alya.x), please follow these steps:
- Go to: Thirdparties/metis-4.0 and build the Metis library (libmetis.a) using 'make'
- Go to the directory: Executables/unix
- Adapt the file: configure-marenostrum-mpi.txt to your own MPI wrappers and paths
- Execute:
./configure -x -f=configure-marenostrum-mpi.txt nastin parall
make
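For reference, the whole build might look like the following shell session; this is a minimal sketch assuming you start from the top of the ALYA source tree and have already adapted the configure file to your MPI wrappers:
# build the bundled Metis 4.0 library
cd Thirdparties/metis-4.0
make
# configure and build ALYA with the nastin and parall modules
cd ../../Executables/unix
./configure -x -f=configure-marenostrum-mpi.txt nastin parall
make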
In order to run ALYA, you need at least the following input files per execution:
X.dom.dat
X.typ.dat
X.geo.dat
X.bcs.dat
X.inflow_profile.bcs
X.ker.dat
X.nsi.dat
X.dat
In our case, there are 2 different inputs, so X={1_1p1mill,3_27p3mill}
To execute a simulation, you must be inside the input directory and you should submit a job like:
mpirun Alya.x 1_1p1mill
or
mpirun Alya.x 3_27p3mill
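For example, on a Slurm system a job script might look like the following sketch; the node/task counts, the path to the input directory, and the use of mpirun are assumptions to adapt to your machine:
#!/bin/bash
#SBATCH --job-name=alya_1_1p1mill
#SBATCH --nodes=4                # placeholder node count
#SBATCH --ntasks-per-node=16     # placeholder tasks per node
cd /path/to/1_1p1mill            # hypothetical path: run from inside the input directory
mpirun Alya.x 1_1p1mill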
Installation:
-------------
Code_Saturne is open source and the documentation on how to install
it can be found at http://www.code-saturne.org
However, version 3.0.1 has been copied into the current folder.
Running - Test case:
----------------
Running a case is described on the following page: http://www.code-saturne.org
The test case deals with the flow in a bundle of tubes.
A larger mesh (51M cells) is built from an original mesh of 13M cells.
The original mesh_input file (already preprocessed for Code_Saturne)
can be found under MESH.
The user subroutines are under XE6_INTERLAGOS/SRC
The test case has been set up to run for 10 time-steps.
Contact:
--------
If you have any questions, please contact Charles Moulinec (STFC Daresbury Laboratory)
at charles.moulinec@stfc.ac.uk
Build instructions for CP2K.
2014-04-09 : ntell@iasa.gr
CP2K needs a number of external libraries and a thread-enabled MPI implementation.
These are : BLAS/LAPACK, BLACS/SCALAPACK, LIBINT, FFTW3.
It is advised to use the vendor optimized versions of these libraries.
If some of these are not available on your machine,
freely available implementations are listed below.
1. BLAS/LAPACK :
netlib BLAS/LAPACK : http://netlib.org/lapack/
ATLAS : http://math-atlas.sf.net/
GotoBLAS : http://www.tacc.utexas.edu/tacc-projects
MKL : refer to your Intel MKL installation, if available
ACML : refer to your ACML installation if available
2. BLACS/SCALAPACK : http://netlib.org/scalapack/
Intel BLACS/SCALAPACK Implementation
3. LIBINT : http://sourceforge.net/projects/libint/files/v1-releases/
4. FFTW3 : http://www.fftw.org/
The directory cp2k-VERSION/arch contains arch files describing how to build CP2K
for various architecture/compiler combinations.
Select one of the .psmp files that fits your architecture/compiler.
cd to cp2k-VERSION/makefiles
If the arch file for your machine is called SOMEARCH_SOMECOMPILER.psmp,
issue : make ARCH=SOMEARCH_SOMECOMPILER VERSION=psmp
If everything goes fine, you'll find the executable cp2k.psmp in the directory
cp2k-VERSION/exe/SOMEARCH_SOMECOMPILER
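Putting these steps together, a minimal build sketch (SOMEARCH_SOMECOMPILER stands for the .psmp arch file you selected or created):
cd cp2k-VERSION/makefiles
make ARCH=SOMEARCH_SOMECOMPILER VERSION=psmp
# on success the executable appears here:
ls ../exe/SOMEARCH_SOMECOMPILER/cp2k.psmp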
In most cases you need to create a custom arch file that fits your CPU type,
compiler, and the installation paths of the external libraries.
As an example, below is the arch file for a machine with mpif90/gcc/gfortran that supports SSE2, has
all the external libraries installed under /usr/local/, and uses ATLAS with full
LAPACK support for BLAS/LAPACK, ScaLAPACK-2 for BLACS/ScaLAPACK, FFTW3, and libint-1.1.4:
#=======================================================================================================
CC = gcc
CPP =
FC = mpif90 -fopenmp
LD = mpif90 -fopenmp
AR = ar -r
DFLAGS = -D__GFORTRAN -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3 -D__LIBINT -I/usr/local/fftw3/include -I/usr/local/libint-1.1.4/include
CPPFLAGS =
FCFLAGS = $(DFLAGS) -O3 -msse2 -funroll-loops -finline -ffree-form
FCFLAGS2 = $(DFLAGS) -O3 -msse2 -funroll-loops -finline -ffree-form
LDFLAGS = $(FCFLAGS)
LIBS = /usr/local/Scalapack/lib/libscalapack.a \
/usr/local/Atlas/lib/liblapack.a \
/usr/local/Atlas/lib/libf77blas.a \
/usr/local/Atlas/lib/libcblas.a \
/usr/local/Atlas/lib/libatlas.a \
/usr/local/fftw3/lib/libfftw3_threads.a \
/usr/local/fftw3/lib/libfftw3.a \
/usr/local/libint-1.1.4/lib/libderiv.a \
/usr/local/libint-1.1.4/lib/libint.a \
-lstdc++ -lpthread
OBJECTS_ARCHITECTURE = machine_gfortran.o
#=======================================================================================================
CP2K can be downloaded from : http://www.cp2k.org/download
It is free for all users under the GPL license;
see the 'Obtaining CP2K' section on the download page.
In UEABS (2IP) the 2.3 branch was used, which can be downloaded from :
http://sourceforge.net/projects/cp2k/files/cp2k-2.3.tar.bz2
Data files are compatible with at least the 2.4 branch.
The Tier-0 data set requires the libint-1.1.4 library. If libint version 1
is not available on your machine, libint can be downloaded from :
http://sourceforge.net/projects/libint/files/v1-releases/libint-1.1.4.tar.gz
Run instructions for CP2K.
2013-08-13 : ntell@iasa.gr
After building the hybrid MPI/OpenMP CP2K, you have an executable called cp2k.psmp.
You can try any combination of TASKSPERNODE/THREADSPERTASK.
The input file is H2O-1024.inp for tier-1 and input_bulk_HFX_3.inp for tier-0 systems.
For tier-1 systems the best performance is usually obtained with pure MPI,
while for tier-0 systems the best performance is obtained using 1 MPI task per
node with the number of threads/MPI_Task being equal to the number of
cores/node.
The tier-0 case requires a converged wavefunction file, which can be obtained
by running with any number of cores; 1024-2048 cores are suggested :
WRAPPER WRAPPER_OPTIONS PATH_TO_cp2k.psmp -i input_bulk_B88_3.inp -o input_bulk_B88_3.log
When this run finishes, move (mv) the saved restart file LiH_bulk_3-RESTART.wfn to
B88.wfn
The general way to run is :
WRAPPER WRAPPER_OPTIONS PATH_TO_cp2k.psmp -i inputfile -o logfile
WRAPPER and WRAPPER_OPTIONS depend on system, batch system etc.
A few common pairs are :
CRAY : aprun -n TASKS -N NODES -d THREADSPERTASK
Curie : ccc_mrun with no options - obtained from batch system
Juqueen : runjob --np TASKS --ranks-per-node TASKSPERNODE --exp-env OMP_NUM_THREADS
Slurm : srun with no options, obtained from slurm if the variables below are set.
#SBATCH --nodes=NODES
#SBATCH --ntasks-per-node=TASKSPERNODE
#SBATCH --cpus-per-task=THREADSPERTASK
The run walltime is reported near the end of the logfile : grep "^ CP2K " logfile | tail -1 | awk -F ' ' '{print $7}'
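As an illustration, a Slurm job script for the tier-0 hybrid run (1 MPI task per node, all cores as threads) might look like the following sketch; the node count, cores per node and the path to cp2k.psmp are assumptions:
#!/bin/bash
#SBATCH --nodes=64               # placeholder node count
#SBATCH --ntasks-per-node=1      # 1 MPI task per node, as recommended for tier-0
#SBATCH --cpus-per-task=32       # placeholder: cores per node
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun /path/to/cp2k.psmp -i input_bulk_HFX_3.inp -o input_bulk_HFX_3.log
# report the run walltime from the log
grep "^ CP2K " input_bulk_HFX_3.log | tail -1 | awk '{print $7}'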
1. Install FFTW-2, available at http://www.fftw.org
2. Install GSL, available at http://www.gnu.org/software/gsl
3. Install HDF5, available at http://www.hdfgroup.org/HDF5/
4. Go to Gadget3/
5. Edit the Makefile and set the following variables (a sketch is given after this list):
CC
CXX
GSL_INCL
GSL_LIBS
FFTW_INCL
FFTW_LIBS
HDF5INCL
HDF5LIB
6. make CONFIG=Config-Medium.sh
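A sketch of how the Makefile variables from step 5 might be set, assuming MPI compiler wrappers and libraries installed under /usr/local; all paths are assumptions, and the exact -l flags depend on how your Makefile composes the link line:
CC        = mpicc
CXX       = mpicxx
GSL_INCL  = -I/usr/local/gsl/include
GSL_LIBS  = -L/usr/local/gsl/lib
FFTW_INCL = -I/usr/local/fftw2/include
FFTW_LIBS = -L/usr/local/fftw2/lib
HDF5INCL  = -I/usr/local/hdf5/include
HDF5LIB   = -L/usr/local/hdf5/lib -lhdf5 -lz
After editing, step 6 (make CONFIG=Config-Medium.sh) produces the Gadget3 executable.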
1. Create the input (initial conditions):
mpirun -np 128 ./N-GenIC ics_medium.param
ics_medium.param is in the N-GenIC directory.
2. Run the calculation:
mpirun -np 128 ./Gadget3 param-medium.txt
param-medium.txt is in the Gadget3 directory.
This is the README file for the GENE application benchmark,
distributed with the Unified European Application Benchmark Suite.
-----------
GENE readme
-----------
Contents
--------
1. General description
2. Code structure
3. Parallelization
4. Building
5. Execution
6. Data
1. General description
======================
The gyrokinetic plasma turbulence code GENE (this acronym stands for
Gyrokinetic Electromagnetic Numerical Experiment) is a software package
dedicated to solving the nonlinear gyrokinetic integro-differential system
of equations in either a flux-tube domain or a radially nonlocal domain.
GENE has been developed by a team of people (the GENE Development Team,
led by F. Jenko, Max Planck Institute for Plasma Physics) over the last
several years.
For further documentation of the code see: http://www.ipp.mpg.de/~fsj/gene/
2. Code structure
==================
Each particle species is described by a time-dependent distribution function
in a five-dimensional phase space.
This results in six-dimensional arrays, which have the following coordinates:
x y z three space coordinates
v parallel velocity
w perpendicular velocity
spec species of particles
GENE is written entirely in Fortran 90, with some language features
from the Fortran 2003 standard. It also contains preprocessing directives.
3. Parallelization
==================
Parallelization is done by domain decomposition of all 6 coordinates using MPI.
x, y, z 3 space coordinates
v parallel velocity
w perpendicular velocity
spec species of particles
4. Building
===========
The source code (Fortran 90) resides in the directory src.
GENE is compiled by JuBE; compilation is performed automatically
whenever a new executable is needed for the benchmark runs.
5. Running the code
====================
A very brief description of the datasets:
parameters_small
A small data set for test purposes. Needs only 8 cores to run.
parameters_tier1
Global simulation of ion-scale turbulence in ASDEX Upgrade;
needs 200-500 GB total memory, runs on 256 to 4096 cores.
parameters_tier0
Global simulation of ion-scale turbulence in JET;
needs 3.5-7 TB total memory, runs on 4096 to 16384 cores.
For running the benchmark for GENE, please follow the instructions for
using JuBE.
For each benchmark run, JuBE creates a run directory, generates the
input file 'parameters' from a template, and stores it in the run
directory. A job submission script is created and submitted as well.
6. Data
=======
The only input file is 'parameters'. It has the format of an f90 namelist.
The following output files are stored in the run directory:
nrg.dat    The content of this file is used to verify the correctness
           of the benchmark run.
stdout     Redirected by JuBE. It contains logging information,
           especially the result of the time measurement.
--------------------------------------------------------------------------
Instructions for obtaining GPAW and its test set for PRACE benchmarking
GPAW is licensed under the GPL, so there are no license issues.
Software requirements
=====================
* MPI
* BLAS, LAPACK, Scalapack
* HDF5
* Python (2.x series from 2.4 upwards)
* For very large calculations ( > 4000 CPU cores) it is recommended
to use the special Python interpreter, which reduces the
initialization time related to Python's import mechanism:
https://gitorious.org/scalable-python
* NumPy ( > 1.3)
Obtaining the source code
=========================
* This benchmark uses version 3.6.1.3356 of the Atomic Simulation Environment
(ASE), which can be obtained as follows:
svn co -r 3356 https://svn.fysik.dtu.dk/projects/ase/trunk ase
* This benchmark uses version 0.9.10710 of GPAW, which can be
obtained as follows:
svn co -r 10710 https://svn.fysik.dtu.dk/projects/gpaw/trunk gpaw
* Installation instructions for various architectures are given in
https://wiki.fysik.dtu.dk/gpaw/install/platforms_and_architectures.html
Support
=======
* Help regarding the benchmark can be requested from jussi.enkovaara@csc.fi
This benchmark set contains a short functional test as well as scaling
tests for the electronic structure simulation software GPAW. More information on
GPAW can be found at wiki.fysik.dtu.dk/gpaw
Functional test: functional.py
==============================
A calculation of the ground state electronic structure of a small Si cluster,
followed by a linear response time-dependent density-functional theory
calculation. This test works with 8-64 CPU cores.
Medium scaling test: Si_gs.py
=============================
A ground state calculation (a few iterations) for a spherical Si cluster.
This test should scale to ~2000 processor cores on x86 architecture.
Total running time with ~2000 cores is ~7 min. In principle, an arbitrary
number of CPU cores can be used, but powers of 2 are recommended.
This test produces a 47 GB output file Si_gs.hdf5 to be used for the
large scaling test Si_lr1.py.
For scalability testing the relevant timer in the text output
'out_Si_gs_pXXXX.txt' (where XXXX is the CPU core count) is 'SCF-cycle'.
The parallel I/O performance (with HDF5) can be benchmarked with the
'IO' timer.
Large scaling test: Si_lr1.py
=============================
A linear response TDDFT calculation for a spherical Si cluster.
This benchmark should be scalable to ~20 000 CPU cores
on x86 architecture. The number of CPU cores has to be a multiple of 256.
Estimated total running time with 20 000 CPU cores is 5 min.
The benchmark requires as input the ground state wave functions in
the file Si_gs.hdf5 which can be produced by the ground state benchmark
Si_gs.py
The relevant timer for this benchmark is 'Calculate K matrix',
timing information is written to a text output file Si_lr1_pxxxx.txt where
xxxx is the number of CPU cores.
Optional large scaling test: Au38_lr.py
=======================================
A linear response TDDFT calculation for an Au38 cluster surrounded by CH3
ligands. This benchmark should be scalable to ~20 000 CPU cores
on x86 architecture. The number of CPU cores has to be a multiple of 256.
Estimated running time with 20 000 CPU cores is 5 min.
The benchmark requires as input the ground state wave functions
which can be produced by input Au38_gs.py (about 5 min calculation with
64 cores).
The relevant timer for this benchmark is 'Calculate K matrix',
timing information is written to a text output file Au38_lr_pxxxx.txt where
xxxx is the number of CPU cores.
How to run
==========
* Download and build the source code following the instructions in GPAW_Build_README.txt
* The benchmarks do not need any special command line options and can be run
simply as, e.g.:
mpirun -np 64 gpaw-python functional.py
mpirun -np 1024 gpaw-python Si_gs.py
mpirun -np 16384 gpaw-python Si_lr1.py
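For example, the functional test might be submitted on a Slurm system with a script like this sketch (the node counts and the availability of gpaw-python on the PATH are assumptions):
#!/bin/bash
#SBATCH --nodes=4                # placeholder node count
#SBATCH --ntasks-per-node=16     # placeholder: 64 MPI tasks in total
srun gpaw-python functional.py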
Gromacs can be downloaded from : http://www.gromacs.org/Downloads
The UEABS benchmark cases require the 4.6 branch or newer;
the latest 4.6.x version is suggested.
There are two data sets in UEABS for Gromacs.
1. ion_channel, which uses PME for electrostatics, for Tier-1 systems
2. lignocellulose-rf, which uses reaction-field electrostatics, for Tier-0 systems. Reference : http://pubs.acs.org/doi/abs/10.1021/bm400442n
The input data file for each benchmark is the corresponding .tpr file, produced using
tools from a complete Gromacs installation and a series of ASCII data files
(atom coordinates/velocities, force field, run control).
If you run the tier-0 case on BG/Q, use lignocellulose-rf.BGQ.tpr
instead of lignocellulose-rf.tpr. It is the same input as lignocellulose-rf.tpr,
created on a BG/Q system.
The general way to run the Gromacs benchmarks is :
WRAPPER WRAPPER_OPTIONS PATH_TO_MDRUN -s CASENAME.tpr -maxh 0.50 -resethway -noconfout -nsteps 10000 -g logfile
CASENAME is one of ion_channel or lignocellulose-rf
maxh : Terminate after 0.99 times this time (in hours), i.e. gracefully terminate after ~30 min.
resethway : Reset the timers at the halfway point. This means that the reported
walltime and performance refer to the second
half of the simulation.
noconfout : Do not save output coordinates/velocities at the end.
nsteps : Run this number of steps, no matter what is requested in the input file.
logfile : The output log filename. If the extension .log is omitted,
it is automatically appended. Obviously, it should be different
for different runs.
WRAPPER and WRAPPER_OPTIONS depend on system, batch system etc.
A few common pairs are :
CRAY : aprun -n TASKS -N NODES -d THREADSPERTASK
Curie : ccc_mrun with no options - obtained from batch system
Juqueen : runjob --np TASKS --ranks-per-node TASKSPERNODE --exp-env OMP_NUM_THREADS
Slurm : srun with no options, obtained from slurm if the variables below are set.
#SBATCH --nodes=NODES
#SBATCH --ntasks-per-node=TASKSPERNODE
#SBATCH --cpus-per-task=THREADSPERTASK
The best performance is usually obtained using pure MPI i.e. THREADSPERTASK=1.
You can check other hybrid MPI/OMP combinations.
The execution time is reported at the end of the logfile : grep Time: logfile | awk -F ' ' '{print $3}'
NOTE : This is the wall time for the second half of the steps.
For sufficiently large nsteps, this is half of the total wall time.
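For example, a Slurm job script for the Tier-1 case following the command line above might look like this sketch (node counts and the path to mdrun are assumptions):
#!/bin/bash
#SBATCH --nodes=16               # placeholder node count
#SBATCH --ntasks-per-node=16     # placeholder tasks per node
#SBATCH --cpus-per-task=1        # pure MPI, as recommended above
srun /path/to/mdrun -s ion_channel.tpr -maxh 0.50 -resethway -noconfout -nsteps 10000 -g bench
# mdrun appends .log when the extension is omitted
grep "Time:" bench.log | awk '{print $3}'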
Build instructions for NAMD.
In the benchmarks the memopt version with SMP support is used.
In order to build this version, your MPI needs to provide the thread support level MPI_THREAD_FUNNELED.
You need a NAMD CVS 2.9 version from 2013-02-06 or later.
1. Uncompress/untar the source.
2. cd NAMD_Source_BASE (the directory name depends on how the source was obtained,
typically : namd2 or NAMD_CVS_2013-02-06_Source )
3. Untar the charm-VERSION.tar that is included. If you obtained the NAMD source via
CVS, you need to download charm++ separately.
4. cd to charm-VERSION directory
5. configure and compile charm :
This step is system dependent. Some examples are :
CRAY XE6 : ./build charm++ mpi-crayxe smp --with-production -O -DCMK_OPTIMIZE
CURIE : ./build charm++ mpi-linux-x86_64 smp mpicxx ifort --with-production -O -DCMK_OPTIMIZE
JUQUEEN : ./build charm++ mpi-bluegeneq smp xlc --with-production -O -DCMK_OPTIMIZE
The syntax is : ./build charm++ ARCHITECTURE smp (compilers, optional) --with-production -O -DCMK_OPTIMIZE
You can find a list of supported architectures/compilers in charm-VERSION/src/arch
The smp option is mandatory to build the Hybrid version of namd.
This builds charm++.
6. cd ..
7. Configure NAMD.
This step is system dependent. Some examples are :
CRAY-XE6 ./config CRAY-XT-g++ --charm-base ./charm-6.5.0 --charm-arch mpi-crayxe-smp --with-fftw3 --fftw-prefix $CRAY_FFTW_DIR --without-tcl --with-memopt --charm-opts -verbose
CURIE ./config Linux-x86_64-icc --charm-base ./charm-6.5.0 --charm-arch mpi-linux-x86_64-ifort-smp-mpicxx --with-fftw3 --fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt --charm-opts -verbose --cxx-opts "-O3 -xAVX " --cc-opts "-O3 -xAVX" --cxx icpc --cc icc --cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec "
Juqueen: ./config BlueGeneQ-MPI-xlC --charm-base ./charm-6.5.0 --charm-arch mpi-bluegeneq-smp-xlc --with-fftw3 --with-fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --charm-opts -verbose --with-memopt
You need to specify the FFTW3 installation directory. On systems that
use environment modules you need to load the existing fftw3 module
and probably use the provided environment variables, as in the CRAY-XE6
example above.
If FFTW3 libraries are not installed on your system,
download and install fftw-3.3.3.tar.gz from http://www.fftw.org/.
You may adjust the compilers and compiler flags as in the CURIE example.
When config finishes, it prompts you to change to a directory and run make.
8. cd to the reported directory and run make.
If everything is OK, you'll find the executable named namd2 in this
directory.
The official site to download namd is :
http://www.ks.uiuc.edu/Research/namd/
You need to register (free of charge) to get a NAMD copy from :
http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD
In order to get a specific CVS snapshot, you first need to ask for a
username/password : http://www.ks.uiuc.edu/Research/namd/cvsrequest.html
When your CVS access application is approved, you can use your username/password
to download a specific CVS snapshot :
cvs -d :pserver:username@cvs.ks.uiuc.edu:/namd/cvsroot co -D "2013-02-06 23:59:00 GMT" namd2
In this case, charm++ is not included.
You have to download it separately and put it in the namd2 source tree :
http://charm.cs.illinois.edu/distrib/charm-6.5.0.tar.gz
Run instructions for NAMD.
ntell@iasa.gr
After building NAMD you have an executable called namd2.
The best performance and scaling of NAMD are achieved using the
hybrid MPI/multithreaded version. On a system with NC cores per node,
use 1 MPI task per node and NC threads per task;
for example, on a 32-cores/node system use 1 MPI task per node and
set OMP_NUM_THREADS, or the corresponding batch system variable, to 32.
Set a variable, for example MYPPN, to NC-1,
e.g. 31 for a 32-cores/node system.
You can also try other combinations of TASKSPERNODE/THREADSPERTASK.
The control file is stmv.8M.memopt.namd for tier-1 and stmv.28M.memopt.namd
for tier-0 systems.
The general way to run is :
WRAPPER WRAPPER_OPTIONS PATH_TO_namd2 +ppn $MYPPN stmv.8M.memopt.namd > logfile
WRAPPER and WRAPPER_OPTIONS depend on system, batch system etc.
A few common pairs are :
CRAY : aprun -n TASKS -N NODES -d THREADSPERTASK
Curie : ccc_mrun with no options - obtained from batch system
Juqueen : runjob --np TASKS --ranks-per-node TASKSPERNODE --exp-env OMP_NUM_THREADS
Slurm : srun with no options, obtained from slurm if the variables below are set.
#SBATCH --nodes=NODES
#SBATCH --ntasks-per-node=TASKSPERNODE
#SBATCH --cpus-per-task=THREADSPERTASK
The run walltime is reported at the end of the logfile : grep WallClock: logfile | awk -F ' ' '{print $2}'
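For example, on a 32-cores-per-node Slurm system the tier-1 case might be submitted like this sketch (the node count and the path to namd2 are assumptions):
#!/bin/bash
#SBATCH --nodes=32               # placeholder node count
#SBATCH --ntasks-per-node=1      # 1 MPI task per node
#SBATCH --cpus-per-task=32       # all cores of the node
MYPPN=31                         # NC-1 worker threads per task
srun /path/to/namd2 +ppn $MYPPN stmv.8M.memopt.namd > logfile
grep "WallClock:" logfile | awk '{print $2}'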
NEMO_Build_README
Written on 2013-09-12, as a product of the NEMO benchmark in PRACE 2IP-WP7.4.
Written by Soon-Heum "Jeff" Ko at Linkoping University, Sweden (sko@nsc.liu.se).
0. Before start
- Download two tarball files (src and input) from PRACE benchmark site.
- Create a directory 'ORCA12_PRACE' and untar the above-mentioned files under that directory. The directory structure would then be:
----- ORCA12_PRACE
|
|------ DATA_CONFIG_ORCA12/
|
|------ FORCING/
|
|------ NEMOGCM/
|
|------ README
|
|------ Instruction_for_JuBE.tar.gz
1. Build-up of standalone version
- You can find a quick how-to in the ORCA12_PRACE/README file, which is an instruction document written after the PRACE 1IP contribution. To repeat the instructions:
1) cd NEMOGCM/ARCH
2) Create an arch-COMPUTER.fcm file in NEMOGCM/ARCH corresponding to your needs. You can refer to 'arch-ifort_linux_curie.fcm', which is tuned for the CURIE x86_64 system.
3) cd NEMOGCM/CONFIG
4) ./makenemo -n ORCA12.L75-PRACE -m COMPUTER
Then a subdirectory 'ORCA12.L75-PRACE' will be created.
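Putting these steps together, a minimal build sketch starting from the ORCA12_PRACE directory (COMPUTER stands for the suffix of the arch file you created, e.g. ifort_linux_curie):
cd NEMOGCM/ARCH
# create or adapt arch-COMPUTER.fcm for your compilers and library paths
cd ../CONFIG
./makenemo -n ORCA12.L75-PRACE -m COMPUTER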
2. Build-up under JuBE benchmark framework
- You should first download the JuBE benchmark suite and the PRACE benchmark applications from the PRACE SVN. You will then find the 'nemo' benchmark under PABS/applications. Because the old nemo benchmark set was poorly written and the NEMO sources have changed, we provide the benchmark setup for the current NEMO version in a separate tarball (Instruction_for_JuBE.tar.gz). You can follow the instructions given there for installing and running NEMO v3.4 in the JuBE benchmark suite.
NEMO_Run_README
Written on 2013-09-12, as a product of the NEMO benchmark in PRACE 2IP-WP7.4.
Written by Soon-Heum "Jeff" Ko at Linkoping University, Sweden (sko@nsc.liu.se).
0. Before start
- Follow the instructions in 'NEMO_Build_README.txt' so that you have the directory structure as specified, along with the compiled binary:
----- ORCA12_PRACE
|
|------ DATA_CONFIG_ORCA12/
|
|------ FORCING/
|
|------ NEMOGCM/
|
|------ README
|
|------ Instruction_for_JuBE.tar.gz
1. Running standalone version
- After compilation, you will have the 'ORCA12.L75-PRACE' directory created under NEMOGCM/CONFIG.
1) cd ORCA12.L75-PRACE/EXP00
2) Link to the datasets as follows:
$ ln -s ../../../../DATA_CONFIG_ORCA12/* .
$ ln -s ../../../../FORCING/* .
3) Locate the 'namelist' and 'namelist_ice' files in this directory and edit them.
4) Run it. It does not require any special command line arguments, so you can simply type 'mpirun opa'.
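Putting the run steps together, a minimal sketch starting from the ORCA12_PRACE directory (the MPI launcher invocation and task count are assumptions to adapt to your batch system):
cd NEMOGCM/CONFIG/ORCA12.L75-PRACE/EXP00
ln -s ../../../../DATA_CONFIG_ORCA12/* .
ln -s ../../../../FORCING/* .
# edit 'namelist' and 'namelist_ice' as needed, then run:
mpirun -np 512 ./opa             # placeholder task count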
2. Running under JuBE benchmark framework
- You can prepare your own XML file that covers everything from compiling to running. The file 'ORCA_PRACE_CURIE.xml' in Instruction_for_JuBE.tar.gz can be used as an example. One remark for CURIE users: you must specify your project ID and which type of queue (standard, large, ...) you are going to use. That information can be found with the 'ccc_myproject' command.
Description and Building of the QCD Benchmark
=============================================
Description
===========
The QCD benchmark is, unlike the other benchmarks in the PRACE
application benchmark suite, not a full application but a set of 5
kernels which are representative of some of the most compute-intensive
parts of QCD calculations.
Test Cases
----------
Each of the 5 kernels has one test case to be used for Tier-0 and
Tier-1:
Kernel A is derived from BQCD (Berlin Quantum ChromoDynamics program),
a hybrid Monte-Carlo code that simulates Quantum Chromodynamics with
dynamical standard Wilson fermions. The computations take place on a
four-dimensional regular grid with periodic boundary conditions. The
kernel is a standard conjugate gradient solver with even/odd
pre-conditioning. The lattice size is 32^2 x 64^2.
Kernel B is derived from SU3_AHiggs, a lattice quantum chromodynamics
(QCD) code intended for computing the conditions of the Early