
################################################################################

                               QCD SPEED TESTS

################################################################################


This document is a short guide to getting started and running the speed
tests. For more detailed information, see README.extended.


PROGRAMS

The benchmark programs are provided in source form and must be
compiled by the user on the machine that is to be tested.

In addition, the openQCD-1.4 package is needed. A tar-file of the
source code can be obtained from

http://luscher.web.cern.ch/luscher/openQCD/

and should be extracted at the same directory level as this package.


PROGRAM FEATURES

All programs parallelize in 0, 1, 2, 3 or 4 dimensions, depending on what is
specified at compilation time. They are highly optimized for machines with
current Intel or AMD processors, but will run correctly on any system that
complies with the ISO C89 (formerly ANSI C) and the MPI 1.2 standards.

For the purpose of testing and code development, the programs can also
be run on a desktop or laptop computer. All that is needed for this is
a compliant C compiler and a local MPI installation such as Open MPI.


DOCUMENTATION

The simulation program has a modular form, with strict prototyping and a
minimal use of external variables. Each program file contains a small number
of externally accessible functions whose functionality is described at the top
of the file.

The data layout is explained in various README files and detailed instructions
are given on how to run the main programs. A set of further documentation
files is included in the doc directory, where the normalization conventions,
the chosen algorithms and other important program elements are described.


COMPILATION

The compilation of the programs requires an ISO C89 compliant compiler and a
compatible MPI installation that complies with the MPI standard 1.2 (or later).

In the main and devel directories, a GNU-style Makefile is included which
compiles and links the programs (just type "make" to compile everything; "make
clean" removes the files generated by "make"). The compiler options can be set
by editing the CFLAGS line in the Makefiles.

The Makefiles assume that the following environment variables are set:

  GCC             GNU C compiler command [Example: /usr/bin/gcc].

  MPI_HOME        MPI home directory [Example: /usr/lib64/mpi/gcc/openmpi].
                  The mpicc command used is $MPI_HOME/bin/mpicc and
                  the MPI libraries are expected in $MPI_HOME/lib.

  MPI_INCLUDE     Directory where the mpi.h file is to be found.

All programs are then compiled using the $MPI_HOME/bin/mpicc command. The
compiler options that can be set in the CFLAGS line depend on which C compiler
the mpicc command invokes (the GCC compiler command is only used to resolve
the dependencies on the include files).


SSE/AVX ACCELERATION

Current Intel and AMD processors are able to perform arithmetic operations on
short vectors of floating-point numbers in just one or two machine cycles,
using SSE and/or AVX instructions. The arithmetic performed by these
instructions fully complies with the IEEE 754 standard.

Many programs in the module directories include SSE and AVX inline-assembly
code. On 64-bit systems, and if the GNU or Intel C compiler is used, the code
can be activated by setting the compiler flags -Dx64 and -DAVX, respectively.
In addition, SSE prefetch instructions will be used if one of the following
options is specified:

  -DP4     Assume that prefetch instructions fetch 128 bytes at a time
           (Pentium 4 and related Xeons).

  -DPM     Assume that prefetch instructions fetch 64 bytes at a time
           (Athlon, Opteron, Pentium M, Core, Core 2 and related Xeons).

  -DP3     Assume that prefetch instructions fetch 32 bytes at a time
           (Pentium III).

These options have an effect only if -Dx64 or -DAVX is set. The option
-DAVX implies -Dx64.
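
The interplay of these options can be illustrated with a few preprocessor
lines. The following is only a sketch and not the code of the package: the
macro PREFETCH_BYTES and the functions simd_level() and prefetch_bytes() are
hypothetical names introduced for illustration.

```c
/* Assumption of this sketch: AVX switches on the x64 (SSE) paths too */
#if (defined AVX) && !(defined x64)
#define x64
#endif

/* Hypothetical macro: number of bytes assumed to be fetched by a single
   prefetch instruction, 0 if no prefetch option is in effect */
#if (defined x64) && (defined P4)
#define PREFETCH_BYTES 128
#elif (defined x64) && (defined PM)
#define PREFETCH_BYTES 64
#elif (defined x64) && (defined P3)
#define PREFETCH_BYTES 32
#else
#define PREFETCH_BYTES 0
#endif

/* Hypothetical helper: names the acceleration selected at compilation */
static const char *simd_level(void)
{
#if (defined AVX)
   return "AVX";
#elif (defined x64)
   return "SSE";
#else
   return "none";
#endif
}

/* Hypothetical helper: returns the assumed prefetch size in bytes */
static int prefetch_bytes(void)
{
   return PREFETCH_BYTES;
}
```

Compiled with -DAVX -DPM, for example, simd_level() returns "AVX" and
prefetch_bytes() returns 64. Without any of these flags, the functions
return "none" and 0, and a prefetch option given on its own has no effect.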

On recent x86-64 machines with AMD Opteron or Intel Xeon processors, for
example, the recommended compiler flags are

    -std=c89 -O -mno-avx -DAVX -DPM

For older machines that do not support the AVX instruction set, the
recommended flags are

    -std=c89 -O -mno-avx -Dx64 -DPM

More aggressive optimization levels such as -O2 and -O3 tend to have little
effect on the execution speed of the programs, but the risk of generating
wrong code is higher.

AVX instructions and the option -mno-avx may not be known to old versions
of the compilers, in which case one is limited to SSE accelerations with
option string -std=c89 -O -Dx64 -DPM.


DEBUGGING FLAGS

For troubleshooting and parameter tuning, it may be helpful to switch on some
debugging flags at compilation time. The simulation program then prints a
detailed report to the log file on the progress made in the specified
subprogram.

The available flags are:

-DCGNE_DBG         CGNE solver.

-DFGCR_DBG         GCR solver.

-DFGCR4VD_DBG      GCR solver for the little Dirac equation.

-DMSCG_DBG         MSCG solver.

-DDFL_MODES_DBG    Deflation subspace generation.

-DMDINT_DBG        Integration of the molecular-dynamics equations.

-DRWRAT_DBG        Computation of the rational function reweighting
                   factor.


RUNNING A SIMULATION

The simulation programs reside in the directory "main". For each program,
there is a README file in this directory which describes the program
functionality and its parameters.

Running a simulation for the first time requires its parameters to be chosen,
which tends to be a non-trivial task. The syntax of the input parameter files
and the meaning of the various parameters are described in some detail in
main/README.infiles and doc/parms.pdf. Examples of valid parameter files are
contained in the directory main/examples.


EXPORTED FIELD FORMAT

The field configurations generated in the course of a simulation are written
to disk in a machine-independent format (see modules/misc/archive.c).
Independently of the machine endianness, the fields are written in little
endian format. A byte-reordering is therefore not required when machines with
different endianness are used for the simulation and the physics analysis.
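
How data can be brought to little-endian byte order independently of the
machine is sketched below. This is not the code in modules/misc/archive.c
but a minimal illustration of the technique. The functions
is_little_endian() and pack_little_endian() are hypothetical names, and the
buffer is assumed to provide room for n*sizeof(double) bytes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian machine, 0 otherwise */
static int is_little_endian(void)
{
   unsigned int one = 1;
   unsigned char first;

   /* Inspect the byte stored at the lowest address of the integer 1 */
   memcpy(&first, &one, 1);

   return first == 1;
}

/* Hypothetical helper: copies n doubles from x to buf in little-endian
   byte order, whatever the byte order of the machine is */
static void pack_little_endian(int n, const double *x, unsigned char *buf)
{
   int i, j, size;
   const unsigned char *p;

   assert(n >= 0);
   size = (int)sizeof(double);

   if (is_little_endian())
      memcpy(buf, x, (size_t)n * sizeof(double));
   else
   {
      for (i = 0; i < n; i++)
      {
         p = (const unsigned char*)(x + i);

         /* Reverse the bytes of the i'th number while copying */
         for (j = 0; j < size; j++)
            buf[i * size + j] = p[size - 1 - j];
      }
   }
}
```

On a machine with 8-byte IEEE 754 double-precision numbers, packing the
number 1.0 yields the bytes 00 00 00 00 00 00 f0 3f (in hexadecimal
notation) on both little- and big-endian machines. The buffer can then be
written to disk with fwrite().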


AUTHORS

The initial release of the openQCD package was written by Martin Lüscher and
Stefan Schaefer. Support for Schrödinger functional boundary conditions was
added by John Bulava. Several modules were taken over from the DD-HMC program
tree, which includes contributions from Luigi Del Debbio, Leonardo Giusti,
Björn Leder and Filippo Palombi.


ACKNOWLEDGEMENTS

In the course of the development of the openQCD code, many people suggested
corrections and improvements or tested preliminary versions of the programs.
The authors are particularly grateful to Isabel Campos, Dalibor Djukanovic,
Georg Engel, Leonardo Giusti, Björn Leder, Carlos Pena and Hubert Simma for
their communications and help.


LICENSE

The software may be used under the terms of the GNU General Public Licence
(GPL).


BUG REPORTS

If a bug is discovered, please send a report to <j.finkenrath@cyi.ac.cy>.


ALTERNATIVE PACKAGES AND COMPLEMENTARY PROGRAMS

There is a publicly available BG/Q version of openQCD that takes advantage of
the machine-specific features of IBM BlueGene/Q computers. The version is
available at <http://hpc.desy.de/simlab/codes/>.

The openQCD programs currently do not support reweighting in the quark
masses, but a module providing this functionality can be downloaded from
<http://www-ai.math.uni-wuppertal.de/~leder/mrw/>.

Previously generated gauge-field configurations are often used as the initial
configuration for a new run. If the old and new lattices or boundary
conditions are not the same, the old configuration may however need to be
adapted, using a field conversion tool such as the one available at
<http://hpc.desy.de/simlab/codes/>, before the new run is started.
