ueabs-r1.0/README.md

# Unified European Applications Benchmark Suite, version 1.0
The Unified European Application Benchmark Suite (UEABS) is a set of 12 application codes taken from the pre-existing PRACE and DEISA application benchmark suites to form a single suite, with the objective of providing a set of scalable, currently relevant and publicly available codes and datasets, of a size which can realistically be run on large systems, and maintained into the future. This work has been undertaken by Task 7.4 "Unified European Applications Benchmark Suite for Tier-0 and Tier-1" in the PRACE Second Implementation Phase (PRACE-2IP) project and will be updated and maintained by subsequent PRACE Implementation Phase projects.
For more details of the codes and datasets, and sample results, please see http://www.prace-ri.eu/IMG/pdf/d7.4_3ip.pdf
The codes composing the UEABS are:
- [ALYA](#alya)
- [Code_Saturne](#saturne)
- [CP2K](#cp2k)
- [GADGET](#gadget)
- [GENE](#gene)
- [GPAW](#gpaw)
- [GROMACS](#gromacs)
- [NAMD](#namd)
- [NEMO](#nemo)
- [QCD](#qcd)
- [Quantum Espresso](#espresso)
- [SPECFEM3D](#specfem3d)
# ALYA
The Alya System is a Computational Mechanics code capable of solving different physics, each one with its own modelization characteristics, in a coupled way. Among the problems it solves are: convection-diffusion reactions, incompressible flows, compressible flows, turbulence, bi-phasic flows and free surface, excitable media, acoustics, thermal flow, quantum mechanics (DFT) and solid mechanics (large strain). ALYA is written in Fortran 90/95 and parallelized using MPI and OpenMP.
- Web site: https://www.bsc.es/computer-applications/alya-system
- Code download: http://www.prace-ri.eu/UEABS/ALYA/Alya_20131014.tar.gz
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/alya/ALYA_Build_README.txt
- Test Case A: http://www.prace-ri.eu/UEABS/ALYA/ALYA_TestCaseA.tar.gz
- Test Case B: http://www.prace-ri.eu/UEABS/ALYA/ALYA_TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/alya/ALYA_Run_README.txt
# Code_Saturne
Code_Saturne® is a multipurpose Computational Fluid Dynamics (CFD) software package, which has been developed by EDF (France) since 1997. The code was originally designed for industrial applications and research activities in several fields related to energy production; typical examples include nuclear power thermal-hydraulics, gas and coal combustion, turbo-machinery, heating, ventilation, and air conditioning. In 2007, EDF released the code as open source, allowing both industry and academia to benefit from its extensive pedigree. Code_Saturne®’s open-source status allows for answers to specific needs that cannot easily be made available in commercial “black box” packages. It also makes it possible for industrial users and for their subcontractors to develop and maintain their own independent expertise and to fully control the software they use.
Code_Saturne® is based on a co-located finite volume approach that can handle three-dimensional meshes built with any type of cell (tetrahedral, hexahedral, prismatic, pyramidal, polyhedral) and with any type of grid structure (unstructured, block structured, hybrid). The code is able to simulate either incompressible or compressible flows, with or without heat transfer, and has a variety of models to account for turbulence. Dedicated modules are available for specific physics such as radiative heat transfer, combustion (e.g. with gas, coal and heavy fuel oil), magnetohydrodynamics, compressible flows, and two-phase flows. The software comprises around 350 000 lines of source code, with about 37% written in Fortran90, 50% in C and 15% in Python. The code is parallelised using MPI with some OpenMP.
- Web site: http://code-saturne.org
- Code download: http://code-saturne.org/cms/download or http://www.prace-ri.eu/UEABS/Code_Saturne/Code_Saturne_Source_3.0.1.tar.gz
- Disclaimer: please note that by downloading the code from this website, you agree to be bound by the terms of the GPL license.
- Build instructions: http://code-saturne.org/cms/documentation/guides/installation
- Test Case A: http://www.prace-ri.eu/UEABS/Code_Saturne/Code_Saturne_TestCaseA.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/code_saturne/Code_Saturne_README.txt
# CP2K
CP2K is a freely available (GPL) program to perform atomistic and molecular simulations of solid state, liquid, molecular and biological systems. It provides a general framework for different methods, such as density functional theory (DFT) using a mixed Gaussian and plane waves approach (GPW), and classical pair and many-body potentials. It is written in well-structured, standards-conforming Fortran 95, parallelized with MPI and, in some parts, with hybrid OpenMP+MPI as an option.
CP2K provides state-of-the-art methods for efficient and accurate atomistic simulations; its sources are freely available and actively improved. It has an active international development team, with the unofficial headquarters at the University of Zürich.
- Web site: https://www.cp2k.org/
- Code download: https://www.cp2k.org/download
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/cp2k/CP2K_Build_README.txt
- Test Case A: http://www.prace-ri.eu/UEABS/CP2K/CP2K_TestCaseA.tar.gz
- Test Case B: http://www.prace-ri.eu/UEABS/CP2K/CP2K_TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/cp2k/CP2K_Run_README.txt
# GADGET
GADGET is a freely available code for cosmological N-body/SPH simulations on massively parallel computers with distributed memory, written by Volker Springel, Max Planck Institute for Astrophysics, Garching, Germany. GADGET is written in C and uses an explicit communication model that is implemented with the standardized MPI communication interface. The code can be run on essentially all supercomputer systems presently in use, including clusters of workstations or individual PCs. GADGET computes gravitational forces with a hierarchical tree algorithm (optionally in combination with a particle-mesh scheme for long-range gravitational forces) and represents fluids by means of smoothed particle hydrodynamics (SPH). The code can be used for studies of isolated systems, or for simulations that include the cosmological expansion of space, either with or without periodic boundary conditions. In all these types of simulations, GADGET follows the evolution of a self-gravitating collisionless N-body system, and allows gas dynamics to be optionally included. Both the force computation and the time stepping of GADGET are fully adaptive, with a dynamic range that is, in principle, unlimited. GADGET can therefore be used to address a wide array of interesting astrophysical problems, ranging from colliding and merging galaxies, to the formation of large-scale structure in the Universe. With the inclusion of additional physical processes such as radiative cooling and heating, GADGET can also be used to study the dynamics of the gaseous intergalactic medium, or to address star formation and its regulation by feedback processes.
- Web site: http://www.mpa-garching.mpg.de/gadget/
- Code download: http://www.prace-ri.eu/UEABS/GADGET/gadget3_Source.tar.gz
- Disclaimer: please note that by downloading the code from this website, you agree to be bound by the terms of the GPL license.
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/gadget/gadget3_Build_README.txt
- Test Case A: http://www.prace-ri.eu/UEABS/GADGET/gadget3_TestCaseA.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/gadget/gadget3_Run_README.txt
# GENE
GENE is a gyrokinetic plasma turbulence code which has been developed since the late 1990s and is physically very comprehensive and flexible as well as computationally very efficient and highly scalable. Originally used for flux-tube simulations, today GENE also operates as a global code, either gradient- or flux-driven. An arbitrary number of gyrokinetic particle species can be taken into account, including electromagnetic effects and collisions. GENE is, in principle, able to cover the widest possible range of scales, all the way from the system size (where nonlocal effects or avalanches can play a role) down to sub-ion-gyroradius scales (where ETG or microtearing modes may contribute to the transport), depending on the available computer resources. Moreover, there exist interfaces to various MHD equilibrium codes. GENE has been carefully benchmarked against theoretical results and other codes.
The GENE code is written in Fortran 90 and C and is parallelized with pure MPI. It strongly relies on a Fast Fourier Transform library and has built-in support for FFTW, MKL or ESSL. It also uses LAPACK and ScaLAPACK routines for LU decomposition and the solution of linear systems of equations of moderate size (up to 1000 unknowns).
- Web site: http://gene.rzg.mpg.de/
- Code download: http://www.prace-ri.eu/UEABS/GENE/GENE_Source.tar.gz
- Disclaimer: please note that by downloading the code from this website, you agree to be bound by the terms of the GPL license.
- Build instructions: included with code download
- Test Case A: included with code download
- Test Case B: included with code download
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/gene/GENE_Run_README.txt
# GPAW
GPAW is an efficient program package for electronic structure calculations based on the density functional theory (DFT) and the time-dependent density functional theory (TD-DFT). The density-functional theory allows studies of ground state properties such as energetics and equilibrium geometries, while the time-dependent density functional theory can be used for calculating excited state properties such as optical spectra. The program package includes two complementary implementations of time-dependent density functional theory: a linear response formalism and a time-propagation in real time.
The program uses the projector augmented wave (PAW) method that allows one to get rid of the core electrons and work with soft pseudo valence wave functions. The PAW method can be applied on the same footing to all elements; for example, it provides a reliable description of the transition metal elements and the first row elements with open p-shells that are often problematic for standard pseudopotentials. A further advantage of the PAW method is that it is an all-electron method (frozen core approximation) and there is a one-to-one transformation between the pseudo and all-electron quantities.
The equations of the (time-dependent) density functional theory within the PAW method are discretized using finite differences and uniform real-space grids. The real-space representation allows flexible boundary conditions, as the system can be finite or periodic in one, two or three dimensions (e.g. cluster, slab, bulk). The accuracy of the discretization is controlled essentially by a single parameter, the grid spacing. The real-space representation also allows efficient parallelization with domain decomposition.
The program offers several parallelization levels. The most basic parallelization strategy is domain decomposition over the real-space grid. In magnetic systems it is possible to parallelize over spin, and in systems that have k-points (surfaces or bulk systems) parallelization over k-points is also possible. Furthermore, parallelization over electronic states is possible in DFT and in real-time TD-DFT calculations. GPAW is written in Python and C and parallelized with MPI.
- Web site: https://wiki.fysik.dtu.dk/gpaw/
- Code download: https://gitlab.com/gpaw/gpaw
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/gpaw/GPAW_Build_README.txt
- Test Case A: http://www.prace-ri.eu/UEABS/GPAW/GPAW_benchmark.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/gpaw/GPAW_Run_README.txt
# GROMACS
GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.
GROMACS supports all the usual algorithms you expect from a modern molecular dynamics implementation (check the online reference or manual for details), but there are also quite a few features that make it stand out from the competition:
- GROMACS provides extremely high performance compared to all other programs. A lot of algorithmic optimizations have been introduced in the code; we have for instance extracted the calculation of the virial from the innermost loops over pairwise interactions, and we use our own software routines to calculate the inverse square root. In GROMACS 4.6 and up, on almost all common computing platforms, the innermost loops are written in C using intrinsic functions that the compiler transforms to SIMD machine instructions, to utilize the available instruction-level parallelism. These kernels are available in both single and double precision, and support all the different kinds of SIMD instructions found in x86-family (and other) processors.
- Also since GROMACS 4.6, we have excellent CUDA-based GPU acceleration on GPUs that have Nvidia compute capability >= 2.0 (e.g. Fermi or later).
- GROMACS is user-friendly, with topologies and parameter files written in clear text format. There is a lot of consistency checking, and clear error messages are issued when something is wrong. Since a C preprocessor is used, you can have conditional parts in your topologies and include other files. You can even compress most files and GROMACS will automatically pipe them through gzip upon reading.
- There is no scripting language – all programs use a simple interface with command line options for input and output files. You can always get help on the options by using the -h option, or use the extensive manuals provided free of charge in electronic or paper format.
- As the simulation is proceeding, GROMACS will continuously tell you how far it has come, and what time and date it expects to be finished.
- Both run input files and trajectories are independent of hardware endian-ness, and can thus be read by any version of GROMACS, even if it was compiled using a different floating-point precision.
- GROMACS can write coordinates using lossy compression, which provides a very compact way of storing trajectory data. The accuracy can be selected by the user.
- GROMACS comes with a large selection of flexible tools for trajectory analysis – you won’t have to write any code to perform routine analyses. The output is further provided in the form of finished Xmgr/Grace graphs, with axis labels, legends, etc. already in place!
- A basic trajectory viewer that only requires standard X libraries is included, and several external visualization tools can read the GROMACS file formats.
- GROMACS can be run in parallel, using either the standard MPI communication protocol, or via our own “Thread MPI” library for single-node workstations.
- GROMACS contains several state-of-the-art algorithms that make it possible to extend the time steps in simulations significantly, and thereby further enhance performance without sacrificing accuracy or detail.
- The package includes a fully automated topology builder for proteins, even multimeric structures. Building blocks are available for the 20 standard amino acid residues as well as some modified ones, the 4 nucleotide and 4 deoxynucleotide residues, several sugars and lipids, and some special groups like hemes and several small molecules.
- There is ongoing development to extend GROMACS with interfaces both to Quantum Chemistry and Bioinformatics/databases.
- GROMACS is Free Software, available under the GNU Lesser General Public License (LGPL), version 2.1. You can redistribute it and/or modify it under the terms of the LGPL as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
Instructions:
- Web site: http://www.gromacs.org/
- Code download: http://www.gromacs.org/Downloads The UEABS benchmark cases require the use of the 5.1.x or newer branch; the latest 2016 version is suggested.
- Test Case A: http://www.prace-ri.eu/UEABS/GROMACS/GROMACS_TestCaseA.tar.gz
- Test Case B: http://www.prace-ri.eu/UEABS/GROMACS/GROMACS_TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/gromacs/GROMACS_Run_README.txt
# NAMD
NAMD is a widely used molecular dynamics application designed to simulate bio-molecular systems on a wide variety of compute platforms. NAMD is developed by the “Theoretical and Computational Biophysics Group” at the University of Illinois at Urbana-Champaign. In the design of NAMD particular emphasis has been placed on scalability when utilizing a large number of processors. The application can read a wide variety of different file formats, for example force fields and protein structures, which are commonly used in bio-molecular science.
A NAMD license can be applied for on the developer’s website free of charge. Once the license has been obtained, binaries for a number of platforms and the source can be downloaded from the website.
Deployment areas of NAMD include pharmaceutical research by academic and industrial users. NAMD is particularly suitable when the interaction between a number of proteins or between proteins and other chemical substances is of interest. Typical examples are vaccine research and transport processes through cell membrane proteins.
NAMD is written in C++ and parallelised using Charm++ parallel objects, which are implemented on top of MPI.
- Web site: http://www.ks.uiuc.edu/Research/namd/
- Code download: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/namd/NAMD_Download_README.txt
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/namd/NAMD_Build_README.txt
- Test Case A: http://www.prace-ri.eu/UEABS/NAMD/NAMD_TestCaseA.tar.gz
- Test Case B: http://www.prace-ri.eu/UEABS/NAMD/NAMD_TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/namd/NAMD_Run_README.txt
# NEMO
NEMO (Nucleus for European Modelling of the Ocean) is a state-of-the-art modelling framework for oceanographic research, operational oceanography, seasonal forecasting and climate studies. Prognostic variables are the three-dimensional velocity field, a linear or non-linear sea surface height, the temperature and the salinity. In the horizontal direction, the model uses a curvilinear orthogonal grid and in the vertical direction, a full or partial step z-coordinate, or s-coordinate, or a mixture of the two. Variables are distributed on a three-dimensional Arakawa C-type grid. Within NEMO, the ocean is interfaced with a sea-ice model (LIM v2 and v3), passive tracer and biogeochemical models (TOP) and, via the OASIS coupler, with several atmospheric general circulation models. It also supports two-way grid embedding via the AGRIF software.
The framework includes five major components:
- the blue ocean (ocean dynamics, NEMO-OPA)
- the white ocean (sea-ice, NEMO-LIM)
- the green ocean (biogeochemistry, NEMO-TOP)
- the adaptive mesh refinement software (AGRIF)
- the assimilation component (NEMO_TAM)
NEMO is used by a large community: 240 projects in 27 countries (14 in Europe, 13 elsewhere) and 350 registered users (numbers for year 2008). The code is available under the CeCILL license (public license). The latest stable version is 3.6. NEMO is written in Fortran90 and parallelized with MPI.
- Web site: http://www.nemo-ocean.eu/
- Code download: http://www.prace-ri.eu/UEABS/NEMO/NEMO_Source.tar.gz
- Disclaimer: please note that by downloading the code from this website, you agree to be bound by the terms of the GPL license.
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/nemo/NEMO_Build_README.txt
- Test Case A: http://www.prace-ri.eu/UEABS/NEMO/NEMO_TestCaseA.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/nemo/NEMO_Run_README.txt
# QCD
The QCD benchmark is, unlike the other benchmarks in the PRACE application benchmark suite, not a full application but a set of 5 kernels which are representative of some of the most compute-intensive parts of QCD calculations.
Each of the 5 kernels has one test case:
Kernel A is derived from BQCD (Berlin Quantum ChromoDynamics program), a hybrid Monte-Carlo code that simulates Quantum Chromodynamics with dynamical standard Wilson fermions. The computations take place on a four-dimensional regular grid with periodic boundary conditions. The kernel is a standard conjugate gradient solver with even/odd pre-conditioning. Lattice size is 32^2 x 64^2.
Kernel B is derived from SU3_AHiggs, a lattice quantum chromodynamics (QCD) code intended for computing the conditions of the Early Universe. Instead of “full QCD”, the code applies an effective field theory, which is valid at high temperatures. In the effective theory, the lattice is 3D. Lattice size is 256^3.
Kernel C: Lattice size is 8^4. Note that Kernel C can only be run in a weak scaling mode, where each CPU stores the same local lattice size, regardless of the number of CPUs. Ideal scaling for this kernel therefore corresponds to constant execution time, and performance is simply the reciprocal of the execution time.
Kernel D consists of the core matrix-vector multiplication routine for standard Wilson fermions. The lattice size is 64^4.
Kernel E consists of a full conjugate gradient solution using Wilson fermions. Lattice size is 64^3 x 3.
- Code download: http://www.prace-ri.eu/UEABS/QCD/QCD_Source_TestCaseA.tar.gz
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/qcd/QCD_Build_README.txt
- Test Case A: included with source download
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/qcd/QCD_Run_README.txt
# Quantum Espresso
QUANTUM ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling, based on density-functional theory, plane waves, and pseudopotentials (norm-conserving, ultrasoft, and projector-augmented wave). QUANTUM ESPRESSO stands for opEn Source Package for Research in Electronic Structure, Simulation, and Optimization. It is freely available to researchers around the world under the terms of the GNU General Public License. QUANTUM ESPRESSO builds upon newly restructured electronic-structure codes that have been developed and tested by some of the original authors of novel electronic-structure algorithms and applied in the last twenty years by some of the leading materials modeling groups worldwide. Innovation and efficiency are still its main focus, with special attention paid to massively parallel architectures, and a great effort being devoted to user friendliness. QUANTUM ESPRESSO is evolving towards a distribution of independent and inter-operable codes in the spirit of an open-source project, where researchers active in the field of electronic-structure calculations are encouraged to participate in the project by contributing their own codes or by implementing their own ideas into existing codes.
QUANTUM ESPRESSO is written mostly in Fortran90, and parallelised using MPI and OpenMP.
- Web site: http://www.quantum-espresso.org/
- Code download: http://www.quantum-espresso.org/download/
- Build instructions: http://www.quantum-espresso.org/wp-content/uploads/Doc/user_guide/
- Test Case A: http://www.prace-ri.eu/UEABS/Quantum_Espresso/QuantumEspresso_TestCaseA.tar.gz
- Test Case B: http://www.prace-ri.eu/UEABS/Quantum_Espresso/QuantumEspresso_TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/quantum_espresso/QuantumEspresso_Run_README.txt
# SPECFEM3D
The software package SPECFEM3D simulates three-dimensional global and regional seismic wave propagation based upon the spectral-element method (SEM). All SPECFEM3D_GLOBE software is written in Fortran90 with full portability in mind, and conforms strictly to the Fortran95 standard. It uses no obsolete or obsolescent features of Fortran77. The package uses parallel programming based upon the Message Passing Interface (MPI).
The SEM was originally developed in computational fluid dynamics and has been successfully adapted to address problems in seismic wave propagation. It is a continuous Galerkin technique, which can easily be made discontinuous; it is then close to a particular case of the discontinuous Galerkin technique, with optimized efficiency because of its tensorized basis functions. In particular, it can accurately handle very distorted mesh elements. It has very good accuracy and convergence properties. The spectral element approach admits spectral rates of convergence and allows exploiting hp-convergence schemes. It is also very well suited to parallel implementation on very large supercomputers as well as on clusters of GPU accelerating graphics cards. Tensor products inside each element can be optimized to reach very high efficiency, and mesh point and element numbering can be optimized to reduce processor cache misses and improve cache reuse. The SEM can also handle triangular (in 2D) or tetrahedral (3D) elements as well as mixed meshes, although with increased cost and reduced accuracy in these elements, as in the discontinuous Galerkin method.
In many geological models in the context of seismic wave propagation studies (except for instance for fault dynamic rupture studies, in which very high frequencies of supershear rupture need to be modeled near the fault), a continuous formulation is sufficient because material property contrasts are not drastic and thus conforming mesh doubling bricks can efficiently handle mesh size variations. This is particularly true at the scale of the full Earth. Effects due to lateral variations in compressional-wave speed, shear-wave speed, density, a 3D crustal model, ellipticity, topography and bathymetry, the oceans, rotation, and self-gravitation are included. The package can accommodate full 21-parameter anisotropy as well as lateral variations in attenuation. Adjoint capabilities and finite-frequency kernel simulations are also included.
- Web site: http://geodynamics.org/cig/software/specfem3d_globe/
- Code download: http://geodynamics.org/cig/software/specfem3d_globe/
- Build instructions: http://www.geodynamics.org/wsvn/cig/seismo/3D/SPECFEM3D_GLOBE/trunk/doc/USER_MANUAL/manual_SPECFEM3D_GLOBE.pdf?op=file&rev=0&sc=0
- Test Case A: http://www.prace-ri.eu/UEABS/SPECFEM3D/SPECFEM3D_TestCaseA.tar.gz
- Test Case B: http://www.prace-ri.eu/UEABS/SPECFEM3D/SPECFEM3D_TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.0/specfem3d/SPECFEM3D_Run_README.txt
ueabs-r1.0/alya/ALYA_Build_README.txt

In order to build ALYA (Alya.x), please follow these steps:
- Go to: Thirdparties/metis-4.0 and build the Metis library (libmetis.a) using 'make'
- Go to the directory: Executables/unix
- Adapt the file: configure-marenostrum-mpi.txt to your own MPI wrappers and paths
- Execute:
./configure -x -f=configure-marenostrum-mpi.txt nastin parall
make
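Put together, a minimal build script might look like the following sketch (the unpacked directory name is an assumption; the configure file must be adapted to your MPI wrappers first, as described above):

    #!/bin/bash
    # Sketch: build Metis, then configure and build ALYA with the nastin
    # and parall modules. Directory names are placeholders.
    set -e
    cd Alya/Thirdparties/metis-4.0
    make                    # produces libmetis.a
    cd ../../Executables/unix
    ./configure -x -f=configure-marenostrum-mpi.txt nastin parall
    make                    # produces Alya.x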
ueabs-r1.0/alya/ALYA_Run_README.txt

In order to run ALYA, you need at least the following input files per execution:
X.dom.dat
X.typ.dat
X.geo.dat
X.bcs.dat
X.inflow_profile.bcs
X.ker.dat
X.nsi.dat
X.dat
In our case, there are 2 different inputs, so X={1_1p1mill,3_27p3mill}
To execute a simulation, you must be inside the input directory and you should submit a job like:
mpirun Alya.x 1_1p1mill
or
mpirun Alya.x 3_27p3mill
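For illustration, a minimal Slurm batch script for the smaller case might look like this sketch (node count, walltime and paths are placeholders; any equivalent batch system works):

    #!/bin/bash
    #SBATCH --job-name=alya_caseA
    #SBATCH --nodes=16                 # placeholder, size to your machine
    #SBATCH --ntasks-per-node=16
    #SBATCH --time=01:00:00
    cd /path/to/ALYA_TestCaseA         # directory containing the X.* input files
    mpirun Alya.x 1_1p1mill            # argument is the problem name X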
ueabs-r1.0/code_saturne/Code_Saturne_README.txt

Installation:
-------------
Code_Saturne is open source and the documentation about how to install
it is to be found under http://www.code-saturne.org
However, the version 3.0.1 has been copied to the current folder.
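For reference, version 3.0.x uses a standard GNU autotools installation, so a minimal build sketch looks like the following (prefix, compiler choices and the unpacked directory name are assumptions; see the web site above for the authoritative options):

    tar xzf Code_Saturne_Source_3.0.1.tar.gz
    cd code_saturne-3.0.1              # placeholder: name of the unpacked tree
    ./configure --prefix=$HOME/opt/code_saturne-3.0.1 CC=mpicc FC=mpif90
    make
    make install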
Running - Test case:
----------------
Running a case is described in the following page: http://www.code-saturne.org
The test case deals with the flow in a bundle of tubes.
A larger mesh (51M cells) is built from an original mesh of 13M cells.
The original mesh_input file (already preprocessed for Code_Saturne)
is to be found under MESH.
The user subroutines are under XE6_INTERLAGOS/SRC
The test case has been set up to run for 10 time-steps.
Contact:
--------
If you have any question, please contact Charles Moulinec (STFC Daresbury Laboratory)
at charles.moulinec@stfc.ac.uk
ueabs-r1.0/cp2k/CP2K_Build_README.txt

Build instructions for CP2K.
2014-04-09 : ntell@iasa.gr
CP2K needs a number of external libraries and a thread-enabled MPI implementation.
These are : BLAS/LAPACK, BLACS/SCALAPACK, LIBINT, FFTW3.
It is advised to use the vendor optimized versions of these libraries.
If some of these are not available on your machine,
there are freely available implementations of these libraries; some of them are listed below.
1. BLAS/LAPACK :
netlib BLAS/LAPACK : http://netlib.org/lapack/
ATLAS : http://math-atlas.sf.net/
GotoBLAS : http://www.tacc.utexas.edu/tacc-projects
MKL : refer to your Intel MKL installation, if available
ACML : refer to your ACML installation if available
2. BLACS/SCALAPACK : http://netlib.org/scalapack/
Intel BLACS/SCALAPACK Implementation
3. LIBINT : http://sourceforge.net/projects/libint/files/v1-releases/
4. FFTW3 : http://www.fftw.org/
In the directory cp2k-VERSION/arch there are arch files with instructions on how
to build CP2K; for each architecture/compiler there are a few arch files describing how to build cp2k.
Select one of the .psmp files that fits your architecture/compiler.
cd to cp2k-VERSION/makefiles
If the arch file for your machine is called SOMEARCH_SOMECOMPILER.psmp,
issue : make ARCH=SOMEARCH_SOMECOMPILER VERSION=psmp
If everything goes fine, you'll find the executable cp2k.psmp in the directory
cp2k-VERSION/exe/SOMEARCH_SOMECOMPILER
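For example, on a generic x86-64 Linux cluster the whole sequence might look like the following sketch (the arch name Linux-x86-64-gfortran is an assumption; pick whichever .psmp arch file actually matches your machine):

    tar xjf cp2k-2.3.tar.bz2
    cd cp2k-2.3/makefiles
    make ARCH=Linux-x86-64-gfortran VERSION=psmp
    # on success the executable is:
    ls ../exe/Linux-x86-64-gfortran/cp2k.psmp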
In most cases you need to create a custom arch file that fits your CPU type,
compiler, and the installation paths of the external libraries.
As an example, below is the arch file for a machine with mpif90/gcc/gfortran that supports SSE2, has
all the external libraries installed under /usr/local/, and uses ATLAS with full
LAPACK support for BLAS/LAPACK, ScaLAPACK-2 for BLACS/ScaLAPACK, FFTW3, and libint-1.1.4:
#=======================================================================================================
CC = gcc
CPP =
FC = mpif90 -fopenmp
LD = mpif90 -fopenmp
AR = ar -r
DFLAGS = -D__GFORTRAN -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3 -D__LIBINT -I/usr/local/fftw3/include -I/usr/local/libint-1.1.4/include
CPPFLAGS =
FCFLAGS = $(DFLAGS) -O3 -msse2 -funroll-loops -finline -ffree-form
FCFLAGS2 = $(DFLAGS) -O3 -msse2 -funroll-loops -finline -ffree-form
LDFLAGS = $(FCFLAGS)
LIBS = /usr/local/Scalapack/lib/libscalapack.a \
/usr/local/Atlas/lib/liblapack.a \
/usr/local/Atlas/lib/libf77blas.a \
/usr/local/Atlas/lib/libcblas.a \
/usr/local/Atlas/lib/libatlas.a \
/usr/local/fftw3/lib/libfftw3_threads.a \
/usr/local/fftw3/lib/libfftw3.a \
/usr/local/libint-1.1.4/lib/libderiv.a \
/usr/local/libint-1.1.4/lib/libint.a \
-lstdc++ -lpthread
OBJECTS_ARCHITECTURE = machine_gfortran.o
#=======================================================================================================
ueabs-r1.0/cp2k/CP2K_Download_README.txt

CP2K can be downloaded from : http://www.cp2k.org/download
It is free for all users under GPL license,
see Obtaining CP2K section in the download page.
In UEABS (2IP) the 2.3 branch was used, which can be downloaded from :
http://sourceforge.net/projects/cp2k/files/cp2k-2.3.tar.bz2
Data files are compatible with at least the 2.4 branch.
Tier-0 data set requires the libint-1.1.4 library. If libint version 1
is not available on your machine, libint can be downloaded from :
http://sourceforge.net/projects/libint/files/v1-releases/libint-1.1.4.tar.gz
ueabs-r1.0/cp2k/CP2K_Run_README.txt

Run instructions for CP2K.
2013-08-13 : ntell@iasa.gr
After building the hybrid MPI/OpenMP CP2K you have an executable called cp2k.psmp.
You can try any combination of TASKSPERNODE/THREADSPERTASK.
The input file is H2O-1024.inp for tier-1 and input_bulk_HFX_3.inp for tier-0 systems.
For tier-1 systems the best performance is usually obtained with pure MPI,
while for tier-0 systems the best performance is obtained using 1 MPI task per
node with the number of threads/MPI_Task being equal to the number of
cores/node.
The tier-0 case requires a converged wavefunction file, which can be obtained
by running with any number of cores (1024-2048 cores are suggested):
WRAPPER WRAPPER_OPTIONS PATH_TO_cp2k.psmp -i input_bulk_B88_3.inp -o input_bulk_B88_3.log
When this run finishes, move (mv) the saved restart file LiH_bulk_3-RESTART.wfn to
B88.wfn.
The general way to run is :
WRAPPER WRAPPER_OPTIONS PATH_TO_cp2k.psmp -i inputfile -o logfile
WRAPPER and WRAPPER_OPTIONS depend on system, batch system etc.
A few common pairs are :
CRAY : aprun -n TASKS -N NODES -d THREADSPERTASK
Curie : ccc_mrun with no options - obtained from batch system
Juqueen : runjob --np TASKS --ranks-per-node TASKSPERNODE --exp-env OMP_NUM_THREADS
Slurm : srun with no options, obtained from slurm if the variables below are set.
#SBATCH --nodes=NODES
#SBATCH --ntasks-per-node=TASKSPERNODE
#SBATCH --cpus-per-task=THREADSPERTASK
The run walltime is reported near the end of the logfile : grep ^\ CP2K\^ logfile | tail -1 | awk -F ' ' '{print $7}'
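As a concrete illustration, a Slurm script for the tier-0 case chaining both steps might look like the following sketch (node count, cores/node and the cp2k.psmp path are placeholders):

    #!/bin/bash
    #SBATCH --nodes=64                  # placeholder
    #SBATCH --ntasks-per-node=1         # 1 MPI task per node for tier-0
    #SBATCH --cpus-per-task=32          # = cores per node
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    # Step 1: produce the converged wavefunction and rename the restart file.
    srun /path/to/cp2k.psmp -i input_bulk_B88_3.inp -o input_bulk_B88_3.log
    mv LiH_bulk_3-RESTART.wfn B88.wfn
    # Step 2: the actual benchmark run.
    srun /path/to/cp2k.psmp -i input_bulk_HFX_3.inp -o input_bulk_HFX_3.log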
ueabs-r1.0/gadget/gadget3_Build_README.txt

1. Install FFTW-2, available at http://www.fftw.org
2. Install GSL, available at http://www.gnu.org/software/gsl
3. Install HDF5, available at http://www.hdfgroup.org/HDF5/
4. Go to Gadget3/
5. Edit Makefile, set:
CC
CXX
GSL_INCL
GSL_LIBS
FFTW_INCL
FFTW_LIBS
HDF5INCL
HDF5LIB
6. make CONFIG=Config-Medium.sh
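As an illustration, on a system with the libraries under /usr/local the variables of step 5 might end up as in the sketch below (all paths are placeholders; editing the Makefile directly is the documented route, and the command-line overrides shown here only take effect if the Makefile does not force its own values):

    make CONFIG=Config-Medium.sh CC=mpicc CXX=mpicxx \
         GSL_INCL=-I/usr/local/gsl/include \
         GSL_LIBS=-L/usr/local/gsl/lib \
         FFTW_INCL=-I/usr/local/fftw2/include \
         FFTW_LIBS=-L/usr/local/fftw2/lib \
         HDF5INCL=-I/usr/local/hdf5/include \
         HDF5LIB="-L/usr/local/hdf5/lib -lhdf5"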
ueabs-r1.0/gadget/gadget3_Run_README.txt

1. Creation of input
mpirun -np 128 ./N-GenIC ics_medium.param
ics_medium.param is in N-GenIC directory
2. Run calculation
mpirun -np 128 ./Gadget3 param-medium.txt
param-medium.txt is in Gadget3 directory
ueabs-r1.0/gene/GENE_Run_README.txt

This is the README file for the GENE application benchmark,
distributed with the Unified European Application Benchmark Suite.
-----------
GENE readme
-----------
Contents
--------
1. General description
2. Code structure
3. Parallelization
4. Building
5. Execution
6. Data
1. General description
======================
The gyrokinetic plasma turbulence code GENE (this acronym stands for
Gyrokinetic Electromagnetic Numerical Experiment) is a software package
dedicated to solving the nonlinear gyrokinetic integro-differential system
of equations in either a flux-tube domain or a radially nonlocal domain.
GENE has been developed by a team of people (the Gene Development Team,
led by F. Jenko, Max Planck Institute for Plasma Physics) over the last
several years.
For further documentation of the code see: http://www.ipp.mpg.de/~fsj/gene/
2. Code structure
==================
Each particle species is described by a time-dependent distribution function
in a five-dimensional phase space.
This results in 6 dimensional arrays, which have the following coordinates:
x y z three space coordinates
v parallel velocity
w perpendicular velocity
spec species of particles
GENE is written completely in FORTRAN90, with some language structures
from the Fortran 2003 standard. It also contains preprocessing directives.
3. Parallelization
==================
Parallelization is done by domain decomposition of all 6 coordinates using MPI.
x, y, z 3 space coordinates
v parallel velocity
w perpendicular velocity
spec species of particles
4. Building
===========
The source code (Fortran 90) resides in the directory src.
The compilation of GENE is handled by JuBE; it is done automatically
if a new executable for the benchmark runs is needed.
5. Running the code
====================
A very brief description of the datasets:
parameters_small
A small data set for test purposes. Needs only 8 cores to run.
parameters_tier1
Global simulation of ion-scale turbulence in ASDEX Upgrade,
needs 200-500GB total memory, runs from 256 to 4096 cores
parameters_tier0
Global simulation of ion-scale turbulence in JET,
needs 3.5-7TB total memory, runs from 4096 to 16384 cores
For running the benchmark for GENE, please follow the instructions for
using JuBE.
For each benchmark run, JuBE creates a run directory, generates the input
file 'parameters' from a template, and stores it in the run directory.
A job submission script is created and submitted as well.
6. Data
=======
The only input file is 'parameters'. It has the format of an f90 namelist.
The following output files are stored in the run directory.
nrg.dat The content of this file is used to verify the correctness
of the benchmark run.
stdout is redirected by JuBE.
It contains logging information,
especially the result of the time measurement.
--------------------------------------------------------------------------
ueabs-r1.0/gpaw/GPAW_Build_README.txt

Instructions for obtaining GPAW and its test set for PRACE benchmarking
GPAW is licensed under the GPL, so there are no license issues.
Software requirements
=====================
* MPI
* BLAS, LAPACK, Scalapack
* HDF5
* Python (2.x series from 2.4 upwards)
* For very large calculations ( > 4000 CPU cores) it is recommended
to use the special Python interpreter which reduces the
initialization time related to Python's import mechanism:
https://gitorious.org/scalable-python
* NumPy ( > 1.3)
Obtaining the source code
=========================
* This benchmark uses the 3.6.1.3356 version of Atomic Simulation Environment
(ASE), which can be obtained as follows:
svn co -r 3356 https://svn.fysik.dtu.dk/projects/ase/trunk ase
* This benchmark uses the 0.9.10710 version of GPAW, which can be
obtained as follows:
svn co -r 10710 https://svn.fysik.dtu.dk/projects/gpaw/trunk gpaw
* Installation instructions for various architectures are given in
https://wiki.fysik.dtu.dk/gpaw/install/platforms_and_architectures.html
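A minimal installation sketch, assuming a standard distutils setup.py in both packages and Python 2.x (the prefix and the python2.7 path component are placeholders; machine-specific settings belong in customize.py, see the platform pages above):

    svn co -r 3356 https://svn.fysik.dtu.dk/projects/ase/trunk ase
    svn co -r 10710 https://svn.fysik.dtu.dk/projects/gpaw/trunk gpaw
    (cd ase  && python setup.py install --prefix=$HOME/gpaw-bench)
    (cd gpaw && python setup.py install --prefix=$HOME/gpaw-bench)
    # make the installation visible before running gpaw-python
    export PYTHONPATH=$HOME/gpaw-bench/lib/python2.7/site-packages:$PYTHONPATH
    export PATH=$HOME/gpaw-bench/bin:$PATH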
Support
=======
* Help regarding the benchmark can be requested from jussi.enkovaara@csc.fi
ueabs-r1.0/gpaw/GPAW_Run_README.txt

This benchmark set contains a short functional test as well as scaling
tests for electronic structure simulation software GPAW. More information on
GPAW can be found at wiki.fysik.dtu.dk/gpaw
Functional test: functional.py
==============================
A calculation of the ground state electronic structure of a small Si cluster,
followed by a linear response time-dependent density-functional theory
calculation. This test works with 8-64 CPU cores.
Medium scaling test: Si_gs.py
=============================
A ground state calculation (a few iterations) for a spherical Si cluster.
This test should scale to ~2000 processor cores on x86 architecture.
Total running time with ~2000 cores is ~7 min. In principle, an arbitrary
number of CPU cores can be used, but recommended values are powers of 2.
This test produces a 47 GB output file Si_gs.hdf5 to be used for the
large scaling test Si_lr1.py.
For scalability testing the relevant timer in the text output
'out_Si_gs_pXXXX.txt' (where XXXX is the CPU core count) is 'SCF-cycle'.
The parallel I/O performance (with HDF5) can be benchmarked with the
'IO' timer.
Large scaling test: Si_lr1.py
=============================
Linear response TDDFT calculation for a spherical Si cluster.
This benchmark should be scalable to ~20 000 CPU cores
on x86 architecture. The number of CPU cores has to be a multiple of 256.
Estimated total running time with 20 000 CPU cores is 5 min.
The benchmark requires as input the ground state wave functions in
the file Si_gs.hdf5 which can be produced by the ground state benchmark
Si_gs.py
The relevant timer for this benchmark is 'Calculate K matrix',
timing information is written to a text output file Si_lr1_pxxxx.txt where
xxxx is the number of CPU cores.
Optional large scaling test: Au38_lr.py
=======================================
Linear response TDDFT calculation for an Au38 cluster surrounded by CH3
ligands. This benchmark should be scalable to ~20 000 CPU cores
on x86 architecture. The number of CPU cores has to be a multiple of 256.
Estimated total running time with 20 000 CPU cores is 5 min.
The benchmark requires as input the ground state wave functions
which can be produced by input Au38_gs.py (about 5 min calculation with
64 cores).
The relevant timer for this benchmark is 'Calculate K matrix',
timing information is written to a text output file Au38_lr_pxxxx.txt where
xxxx is the number of CPU cores.
How to run
==========
* Download and build the source code following the instructions in GPAW_Build_README.txt
* Benchmarks do not need any special command line options and can be run
just as e.g. :
mpirun -np 64 gpaw-python functional.py
mpirun -np 1024 gpaw-python Si_gs.py
mpirun -np 16384 gpaw-python Si_lr1.py
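For example, a Slurm script for the medium scaling test on 16-core nodes might look like this sketch (node count and walltime are placeholders):

    #!/bin/bash
    #SBATCH --nodes=128                # placeholder: 2048 cores at 16 cores/node
    #SBATCH --ntasks-per-node=16
    #SBATCH --time=00:30:00
    srun gpaw-python Si_gs.py          # writes Si_gs.hdf5 and out_Si_gs_p2048.txt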
ueabs-r1.0/gromacs/GROMACS_Download_README.txt

Gromacs can be downloaded from : http://www.gromacs.org/Downloads
The UEABS benchmark cases require the use of the 4.6 or newer branch;
the latest 4.6.x version is suggested.
ueabs-r1.0/gromacs/GROMACS_Run_README.txt

There are two data sets in UEABS for Gromacs.
1. ion_channel, which uses PME for electrostatics, for Tier-1 systems
2. lignocellulose-rf, which uses a reaction field for electrostatics, for Tier-0 systems. Reference : http://pubs.acs.org/doi/abs/10.1021/bm400442n
The input data file for each benchmark is the corresponding .tpr file produced using
tools from a complete gromacs installation and a series of ascii data files
(atom coords/velocities, forcefield, run control).
If you happen to run the tier-0 case on BG/Q, use lignocellulose-rf.BGQ.tpr
instead of lignocellulose-rf.tpr. It is the same as lignocellulose-rf.tpr,
but created on a BG/Q system.
The general way to run gromacs benchmarks is :
WRAPPER WRAPPER_OPTIONS PATH_TO_MDRUN -s CASENAME.tpr -maxh 0.50 -resethway -noconfout -nsteps 10000 -g logile
CASENAME is one of ion_channel or lignocellulose-rf
maxh : Terminate after 0.99 times this time (hours) i.e. gracefully terminate after ~30 min
resethway : Reset timer counters at the halfway point. This means that the reported
walltime and performance refer to the last
half of the steps of the simulation.
noconfout : Do not save output coordinates/velocities at the end.
nsteps : Run this number of steps, no matter what is requested in the input file
logfile : The output filename. If the extension .log is omitted
it is automatically appended. Obviously, it should be different
for different runs.
WRAPPER and WRAPPER_OPTIONS depend on system, batch system etc.
A few common pairs are :
CRAY : aprun -n TASKS -N NODES -d THREADSPERTASK
Curie : ccc_mrun with no options - obtained from batch system
Juqueen : runjob --np TASKS --ranks-per-node TASKSPERNODE --exp-env OMP_NUM_THREADS
Slurm : srun with no options, obtained from slurm if the variables below are set.
#SBATCH --nodes=NODES
#SBATCH --ntasks-per-node=TASKSPERNODE
#SBATCH --cpus-per-task=THREADSPERTASK
The best performance is usually obtained using pure MPI i.e. THREADSPERTASK=1.
You can check other hybrid MPI/OMP combinations.
The execution time is reported at the end of logfile : grep Time: logfile | awk -F ' ' '{print $3}'
NOTE : This is the wall time for the last half number of steps.
For sufficiently large nsteps, this is half of the total wall time.
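Putting it together for Slurm, a tier-1 run might look like the following sketch (node count is a placeholder, and the MPI-enabled mdrun binary may be named differently on your installation, e.g. mdrun_mpi or gmx_mpi mdrun):

    #!/bin/bash
    #SBATCH --nodes=32                 # placeholder
    #SBATCH --ntasks-per-node=16       # pure MPI: one rank per core
    #SBATCH --cpus-per-task=1
    srun /path/to/mdrun_mpi -s ion_channel.tpr -maxh 0.50 -resethway \
         -noconfout -nsteps 10000 -g ion_channel_p512
    grep Time: ion_channel_p512.log | awk -F ' ' '{print $3}'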
ueabs-r1.0/namd/NAMD_Build_README.txt

Build instructions for namd.
In benchmarks the memopt version is used with SMP support.
In order to build this version, your MPI needs to provide the thread support level MPI_THREAD_FUNNELED.
You need a NAMD 2.9 CVS snapshot of 2013-02-06 or later.
1. Uncompress/tar the source.
2. cd NAMD_Source_BASE (the directory name depends on how the source was obtained,
typically : namd2 or NAMD_CVS_2013-02-06_Source )
3. untar the charm-VERSION.tar that exists there. If you obtained the namd source via
cvs, you need to download charm separately.
4. cd to the charm-VERSION directory
5. configure and compile charm :
This step is system dependent. Some examples are :
CRAY XE6 : ./build charm++ mpi-crayxe smp --with-production -O -DCMK_OPTIMIZE
CURIE : ./build charm++ mpi-linux-x86_64 smp mpicxx ifort --with-production -O -DCMK_OPTIMIZE
JUQUEEN : ./build charm++ mpi-bluegeneq smp xlc --with-production -O -DCMK_OPTIMIZE
The syntax is : ./build charm++ ARCHITECTURE smp (compilers, optional) --with-production -O -DCMK_OPTIMIZE
You can find a list of supported architectures/compilers in charm-VERSION/src/arch
The smp option is mandatory to build the Hybrid version of namd.
This builds charm++.
6. cd ..
7. Configure NAMD.
This step is system dependent. Some examples are :
CRAY-XE6 ./config CRAY-XT-g++ --charm-base ./charm-6.5.0 --charm-arch mpi-crayxe-smp --with-fftw3 --fftw-prefix $CRAY_FFTW_DIR --without-tcl --with-memopt --charm-opts -verbose
CURIE ./config Linux-x86_64-icc --charm-base ./charm-6.5.0 --charm-arch mpi-linux-x86_64-ifort-smp-mpicxx --with-fftw3 --fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt --charm-opts -verbose --cxx-opts "-O3 -xAVX " --cc-opts "-O3 -xAVX" --cxx icpc --cc icc --cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec "
Juqueen: ./config BlueGeneQ-MPI-xlC --charm-base ./charm-6.5.0 --charm-arch mpi-bluegeneq-smp-xlc --with-fftw3 --with-fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --charm-opts -verbose --with-memopt
You need to specify the fftw3 installation directory. On systems that
use environment modules you need to load the existing fftw3 module
and probably use the provided environment variables - like in CRAY-XE6
example above.
If fftw3 libraries are not installed on your system,
download and install fftw-3.3.3.tar.gz from http://www.fftw.org/.
You may adjust the compilers and compiler flags as the CURIE example.
When config finishes, it prompts you to change to a directory and run make.
8. cd to the reported directory and run make
If everything is ok you'll find the executable with name namd2 in this
directory.
ueabs-r1.0/namd/NAMD_Download_README.txt

The official site to download namd is :
http://www.ks.uiuc.edu/Research/namd/
You need to register for free to get a namd copy from here :
http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD
In order to get a specific CVS snapshot, you first need to ask for a
username/password : http://www.ks.uiuc.edu/Research/namd/cvsrequest.html
When your cvs access application is approved, you can use your username/password
to download a specific cvs snapshot :
cvs -d :pserver:username@cvs.ks.uiuc.edu:/namd/cvsroot co -D "2013-02-06 23:59:00 GMT" namd2
In this case, charm++ is not included.
You have to download it separately and put it in the namd2 source tree :
http://charm.cs.illinois.edu/distrib/charm-6.5.0.tar.gz
ueabs-r1.0/namd/NAMD_Run_README.txt

Run instructions for NAMD.
ntell@iasa.gr
After building NAMD you have an executable called namd2.
The best performance and scaling of namd are achieved using the
hybrid MPI/multi-threaded version. On a system with NC cores per node,
use 1 MPI task per node and NC threads per task;
for example, on a 32 cores/node system use 1 MPI process and
set OMP_NUM_THREADS (or the corresponding batch system variable) to 32.
Set a variable, for example MYPPN, to NC-1 (this is the number of worker
threads passed to namd2 below), e.g. to 31 for a 32 cores/node system.
You can also try other combinations of TASKSPERNODE/THREADSPERTASK to check.
The control file is stmv.8M.memopt.namd for tier-1 and stmv.28M.memopt.namd
for tier-0 systems.
The general way to run is :
WRAPPER WRAPPER_OPTIONS PATH_TO_namd2 +ppn $MYPPN stmv.8M.memopt.namd > logfile
WRAPPER and WRAPPER_OPTIONS depend on system, batch system etc.
A few common pairs are :
CRAY : aprun -n TASKS -N NODES -d THREADSPERTASK
Curie : ccc_mrun with no options - obtained from batch system
Juqueen : runjob --np TASKS --ranks-per-node TASKSPERNODE --exp-env OMP_NUM_THREADS
Slurm : srun with no options, obtained from slurm if the variables below are set.
#SBATCH --nodes=NODES
#SBATCH --ntasks-per-node=TASKSPERNODE
#SBATCH --cpus-per-task=THREADSPERTASK
The run walltime is reported at the end of logfile : grep WallClock: logfile | awk -F ' ' '{print $2}'
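For example, on a Slurm system with 32-core nodes the whole recipe reduces to the following sketch (node count and the namd2 path are placeholders):

    #!/bin/bash
    #SBATCH --nodes=64                 # placeholder
    #SBATCH --ntasks-per-node=1        # 1 SMP task per node
    #SBATCH --cpus-per-task=32         # = cores per node
    MYPPN=$((SLURM_CPUS_PER_TASK - 1)) # NC-1 worker threads per task
    srun /path/to/namd2 +ppn $MYPPN stmv.8M.memopt.namd > logfile
    grep WallClock: logfile | awk -F ' ' '{print $2}'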
ueabs-r1.0/nemo/NEMO_Build_README.txt

NEMO_Build_README
Written on 2013-09-12, as a product of the NEMO benchmark in PRACE 2IP-WP7.4.
Written by Soon-Heum "Jeff" Ko at Linkoping University, Sweden (sko@nsc.liu.se).
0. Before start
- Download two tarball files (src and input) from PRACE benchmark site.
- Create a directory, 'ORCA12_PRACE', and untar the above-mentioned files under that directory. The directory structure would then be:
----- ORCA12_PRACE
|
|------ DATA_CONFIG_ORCA12/
|
|------ FORCING/
|
|------ NEMOGCM/
|
|------ README
|
|------ Instruction_for_JuBE.tar.gz
1. Build-up of standalone version
- You can find an easy how-to in the ORCA12_PRACE/README file, an instruction document written after the PRACE 1IP contribution. To repeat the instructions:
1) cd NEMOGCM/ARCH
2) create an arch-COMPUTER.fcm file in NEMOGCM/ARCH corresponding to your needs. You can refer to 'arch-ifort_linux_curie.fcm', which is tuned for the CURIE x86_64 system.
3) cd NEMOGCM/CONFIG
4) ./makenemo -n ORCA12.L75-PRACE -m COMPUTER
A subdirectory 'ORCA12.L75-PRACE' will then be created.
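Condensed into commands, the standalone build is the following sketch ('mymachine' is a placeholder for your arch file name):

    cd ORCA12_PRACE/NEMOGCM/ARCH
    cp arch-ifort_linux_curie.fcm arch-mymachine.fcm   # then edit compilers/paths
    cd ../CONFIG
    ./makenemo -n ORCA12.L75-PRACE -m mymachine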
2. Build-up under JuBE benchmark framework
- You shall first download the JuBE benchmark suite and the PRACE benchmark applications from the PRACE SVN. You will then find the 'nemo' benchmark under PABS/applications. Because the old nemo benchmark set was ill-written and there have been changes to the NEMO source, we provide the benchmark setup for the current NEMO version in a separate tarball file (Instruction_for_JuBE.tar.gz). You can follow the instructions specified there for installing and running NEMO v3.4 in the JuBE benchmark suite.

ueabs-r1.0/nemo/NEMO_Run_README.txt

NEMO_Run_README
Written on 2013-09-12, as a product of the NEMO benchmark in PRACE 2IP-WP7.4.
Written by Soon-Heum "Jeff" Ko at Linkoping University, Sweden (sko@nsc.liu.se).
0. Before start
- Follow the instructions in 'NEMO_Build_README.txt' so that you have the directory structure as specified, along with the compiled binary:
----- ORCA12_PRACE
|
|------ DATA_CONFIG_ORCA12/
|
|------ FORCING/
|
|------ NEMOGCM/
|
|------ README
|
|------ Instruction_for_JuBE.tar.gz
1. Running standalone version
- After compilation, you will have the 'ORCA12.L75-PRACE' directory created under NEMOGCM/CONFIG.
1) cd ORCA12.L75-PRACE/EXP00
2) Link to the datasets. Perform the following:
$ ln -s ../../../../DATA_CONFIG_ORCA12/* .
$ ln -s ../../../../FORCING/* .
3) Locate the 'namelist' and 'namelist_ice' files in this directory and edit them
4) Run it. It does not have any special command line arguments, thus you can simply type 'mpirun opa'.
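Condensed into commands, the steps above read as follows (the MPI core count is a placeholder):

    cd ORCA12_PRACE/NEMOGCM/CONFIG/ORCA12.L75-PRACE/EXP00
    ln -s ../../../../DATA_CONFIG_ORCA12/* .
    ln -s ../../../../FORCING/* .
    # edit 'namelist' and 'namelist_ice' here, then:
    mpirun -np 512 ./opa               # placeholder core count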
2. Running under JuBE benchmark framework
- You can prepare your own XML file to go from compiling to running in one pass. The file 'ORCA_PRACE_CURIE.xml' under Instruction_for_JuBE.tar.gz could be used as an example. One remark for CURIE users: you must specify your project ID and which type of queue (standard, large, ...) you are to use. That information can be found with the 'ccc_myproject' command.

ueabs-r1.0/qcd/QCD_Build_README.txt

Description and Building of the QCD Benchmark
=============================================
Description
===========
The QCD benchmark is, unlike the other benchmarks in the PRACE
application benchmark suite, not a full application but a set of 5
kernels which are representative of some of the most compute-intensive
parts of QCD calculations.
Test Cases
----------
Each of the 5 kernels has one test case to be used for Tier-0 and
Tier-1:
Kernel A is derived from BQCD (Berlin Quantum ChromoDynamics program),
a hybrid Monte-Carlo code that simulates Quantum Chromodynamics with
dynamical standard Wilson fermions. The computations take place on a
four-dimensional regular grid with periodic boundary conditions. The
kernel is a standard conjugate gradient solver with even/odd
pre-conditioning. Lattice size is 32^2 x 64^2.
Kernel B is derived from SU3_AHiggs, a lattice quantum chromodynamics
(QCD) code intended for computing the conditions of the Early
Universe. Instead of "full QCD", the code applies an effective field
theory, which is valid at high temperatures. In the effective theory,
the lattice is 3D. Lattice size is 256^3.
Kernel C: Lattice size is 8^4. Note that Kernel C can only be run in a
weak scaling mode, where each CPU stores the same local lattice size,
regardless of the number of CPUs. Ideal scaling for this kernel
therefore corresponds to constant execution time, and performance per
peak TFlop/s is simply the reciprocal of the execution time.
Kernel D consists of the core matrix-vector multiplication routine for
standard Wilson fermions. The lattice size is 64^4.
Kernel E consists of a full conjugate gradient solution using Wilson
fermions. Lattice size is 64^3 x 3.
Building the QCD Benchmark in the JuBE Framework
================================================
The QCD benchmark is integrated in the JuBE Benchmarking Environment
(www.fz-juelich.de/jsc/jube).
JuBE also includes all steps to build the application.
Unpack the QCD_Source_TestCaseA.tar.gz into a directory of your
choice.
After unpacking the Benchmark the following directory structure is available:
PABS/
applications/
bench/
doc/
platform/
skel/
LICENCE
The applications/ subdirectory contains the QCD benchmark
applications.
The bench/ subdirectory contains the benchmark environment scripts.
The doc/ subdirectory contains the overall documentation of the
framework and a tutorial.
The platform/ subdirectory holds the platform definitions as well as
job submission script templates for each defined platform.
The skel/ subdirectory contains templates for analysis patterns for
text output of different measurement tools.
Configuration
-------------
Definition files are already prepared for many platforms. If you are
running on a defined platform, just skip this part and go forward to
QCD_Run_README.txt ("Execution").
The platform
------------
A platform is defined through a set of variables in the platform.xml
file, which can be found in the platform/ directory. To create a new
platform entry, copy an existing platform description and modify it to
fit your local setup. The variables defined here will be used by the
individual applications in the later process. Best practice for the
platform nomenclature would be a dash-separated name such as
<vendor>-<system>-<site>. Additionally, you have to create a template
batch submission script, which should be placed in a subdirectory of the
platform/ directory with the same name as the platform itself. Although
this nomenclature is not required by the benchmarking environment, it
helps keep track of your templates, and minimises the amount of
adaptation necessary for the individual application configurations.
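Sketched as shell commands, the procedure looks as follows; apart from platform.xml and the platform/ directory, every name here is a hypothetical placeholder:
>> cd PABS/platform
>> $EDITOR platform.xml                       # copy an existing platform entry and adapt it
>> mkdir <vendor>-<system>-<site>             # batch-template subdirectory named after the platform
>> cp <existing-platform>/<template> <vendor>-<system>-<site>/   # start from an existing submission template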
The applications
----------------
Once a platform is defined, each individual application that should be
used in the benchmark (in this case the QCD application) needs to be
configured for this platform. In order to configure an individual
application, copy an existing top-level configuration file
(e.g. prace-scaling-juqueen.xml) to the file prace-<platform>.xml.
Then open an editor of your choice, to adapt the file to your
needs. Change the settings of the platform parameter to the name of
your defined platform. The platform name can then be referenced
throughout the benchmarking environment by the $platform variable.
Do the same for compile.xml, execute.xml, analyse.xml.
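In summary, assuming your new platform is called <platform> (a placeholder), the configuration amounts to:
>> cd PABS/applications/QCD
>> cp prace-scaling-juqueen.xml prace-<platform>.xml
>> $EDITOR prace-<platform>.xml                  # set the platform parameter to <platform>
>> $EDITOR compile.xml execute.xml analyse.xml   # adapt the platform-specific entries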
A step-by-step tutorial can also be found in doc/JuBETutorial.pdf.
The compilation is part of the run of the application. Please continue
with the QCD_Run_README.txt to finalize the build and to run the
benchmark.
Running the QCD Benchmarks in the JuBE Framework
================================================
Unpack the QCD_Source_TestCaseA.tar.gz into a directory of your
choice.
After unpacking the benchmark, the following directory structure is available:
PABS/
applications/
bench/
doc/
platform/
skel/
LICENCE
The applications/ subdirectory contains the QCD benchmark
applications.
The bench/ subdirectory contains the benchmark environment scripts.
The doc/ subdirectory contains the overall documentation of the
framework and a tutorial.
The platform/ subdirectory holds the platform definitions as well as
job submission script templates for each defined platform.
The skel/ subdirectory contains templates for analysis patterns for
text output of different measurement tools.
Configuration
=============
Definition files are already prepared for many platforms. If you are
running on a defined platform, just go forward; otherwise, please have a
look at QCD_Build_README.txt.
Execution
=========
Assuming the Benchmark Suite is installed in a directory that can be
used during execution, a typical run of a benchmark application
consists of two steps.
1. Compiling and submitting the benchmark to the system scheduler.
2. Verifying, analysing and reporting the performance data.
Compiling and submitting
------------------------
If configured correctly, the application benchmark can be compiled and
submitted on the system (e.g. the IBM BlueGene/Q at Jülich) with
the commands:
>> cd PABS/applications/QCD
>> perl ../../bench/jube prace-scaling-juqueen.xml
The benchmarking environment will then compile the binary for all
node/task/thread combinations defined, if those parameters need to be
compiled into the binary. It creates a so-called sandbox subdirectory
for each job, ensuring conflict-free operation of the individual
applications at runtime. If any input files are needed, those are
prepared automatically as defined.
Each active benchmark in the application’s top-level configuration
file will receive an ID, which is used as a reference by JUBE later
on.
Verifying, analysing and reporting
----------------------------------
After the benchmark jobs have run, an additional call to jube will
gather the performance data. For this, the options -update and -result
are used.
>> cd PABS/applications/QCD
>> perl ../../bench/jube -update -result <ID>
The ID is the reference number the benchmarking environment has
assigned to this run. The performance data will then be output to
stdout, and can be post-processed from there.
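For example, assuming a benchmark run was assigned the ID 42 (an invented value), the result pass could be captured for later processing with:
>> perl ../../bench/jube -update -result 42 | tee qcd_result_42.txt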
Downloading Quantum Espresso
----------------------------
The Quantum Espresso package can be freely downloaded from the following URL:
http://www.quantum-espresso.org/download/
13/08/2013
Running The Quantum Espresso Test Cases
---------------------------------------
1. Unpack the tar file containing the input files (command file .in and
pseudopotentials .UPF) in the directory where you want to run the program. For example,
tar zxvf QuantumEspresso_TestCaseA.tar.gz
2. Find the command file cp.in (test case A) or pw.in (test case B) and check
that the variable pseudo_dir is set to the location of the UPF files, for
example the current directory
pseudo_dir = './'
3. Create a batch file that includes the command to launch MPI jobs
(this is system dependent). The benchmark data have been collected by varying
the number of MPI tasks only, so if your Quantum Espresso version has been
compiled with OpenMP support you should set the number of OpenMP threads
to 1. Most batch scripts will thus contain lines such as:
export OMP_NUM_THREADS=1
mpirun path-to-executable/cp.x < cp.in   (or pw.x < pw.in for test case B)
but check your local documentation. A complete example batch script is sketched after this list.
4. The output including timing information will be sent to standard output.
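As referenced in step 3, here is a minimal batch script sketch, assuming a SLURM scheduler and test case A; all resource values are placeholders to adapt to your site:
#!/bin/bash
#SBATCH --job-name=qe_cp_testA
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16
#SBATCH --time=01:00:00
export OMP_NUM_THREADS=1
mpirun path-to-executable/cp.x < cp.in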
Cineca 13/08/2013
Cineca 13/08/2013
Building SPECFEM3D
------------------
On CURIE:
flags.guess:
DEF_FFLAGS="-O3 -DFORCE_VECTORIZATION -check nobounds -xHost -ftz
-assume buffered_io -assume byterecl -align sequence -vec-report0 -std03
-diag-disable 6477 -implicitnone -warn truncated_source -warn
argument_checking -warn unused -warn declarations -warn alignments -warn
ignore_loc -warn usage -mcmodel=medium -shared-intel"
configure command:
./configure MPIFC=mpif90 FC=ifort CC=icc CFLAGS="-mcmodel=medium
-shared-intel" CPP=cpp
In order to compile:
make clean
make all
Running SPECFEM3D
-----------------
To run the test cases, copy the Par_file, STATIONS and CMTSOLUTION files into the SPECFEM3D_GLOBE/DATA directory.
Recompile the mesher and the solver.
Run the mesher and the solver.
On Curie, the commands to put in the submission file are:
ccc_mprun bin/xmeshfem3D
ccc_mprun bin/xspecfem3D
SPECFEM3D_TestCaseA runs on 864 cores and SPECFEM3D_TestCaseB runs on 11616 cores.
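For reference, a minimal CURIE submission script for SPECFEM3D_TestCaseA might look like the sketch below; the job name, wall time, queue and project ID are placeholders, and the exact directives should be checked against the CURIE documentation (jobs are submitted with 'ccc_msub'):
#!/bin/bash
#MSUB -r specfem3d_testA
#MSUB -n 864
#MSUB -T 7200
#MSUB -q standard
#MSUB -A <project_id>
ccc_mprun bin/xmeshfem3D
ccc_mprun bin/xspecfem3D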