# Unified European Applications Benchmark Suite, version 1.2
The Unified European Application Benchmark Suite (UEABS) is a set of 12 application codes taken from the pre-existing PRACE and DEISA application benchmark suites to form a single suite. The objective is to provide a set of scalable, currently relevant and publicly available codes and datasets, of a size which can realistically be run on large systems, and maintained into the future. This work has been undertaken by Task 7.4 "Unified European Applications Benchmark Suite for Tier-0 and Tier-1" in the PRACE Second Implementation Phase (PRACE-2IP) project and will be updated and maintained by subsequent PRACE Implementation Phase projects.
For more details of the codes and datasets, and sample results, please see http://www.prace-ri.eu/IMG/pdf/d7.4_3ip.pdf
Release notes for version 1.2, released on 31st October 2016 as a result of PRACE-4IP activities.
Changes from version 1.1 are as follows:
GENE: new version of code and additional new dataset.
GPAW: new version of code and new dataset.
GROMACS: new version of code and updated dataset.
NAMD: new version of code and minor build and run instructions updates.
NEMO: new version of code and replaced dataset.
Release notes for version 1.1, released on 31st May 2014 as a result of PRACE-3IP activities.
Changes from version 1.0 are as follows:
ALYA: new version of code and new datasets.
Code_Saturne: additional large dataset, using tetrahedral elements.
CP2K: new build instructions.
GPAW: new dataset with reduced runtime.
The codes composing the UEABS are:
- [ALYA](#alya)
- [Code_Saturne](#saturne)
- [CP2K](#cp2k)
- [GADGET](#gadget)
- [GENE](#gene)
- [GPAW](#gpaw)
- [GROMACS](#gromacs)
- [NAMD](#namd)
- [NEMO](#nemo)
- [QCD](#qcd)
- [Quantum Espresso](#espresso)
- [SPECFEM3D](#specfem3d)
# ALYA
The Alya System is a Computational Mechanics code capable of solving different physics, each one with its own modelization characteristics, in a coupled way. Among the problems it solves are: convection-diffusion reactions, incompressible flows, compressible flows, turbulence, bi-phasic flows and free surface, excitable media, acoustics, thermal flow, quantum mechanics (DFT) and solid mechanics (large strain). ALYA is written in Fortran 90/95 and parallelized using MPI and OpenMP.
- Web site: https://www.bsc.es/computer-applications/alya-system
- Code download: http://www.prace-ri.eu/UEABS/ALYA/1.1/alya3226.tar.gz
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/alya/ALYA_Build_README.txt
- Test Case A: http://www.prace-ri.eu/UEABS/ALYA/ALYA_TestCaseA.tar.gz
- Test Case B: http://www.prace-ri.eu/UEABS/ALYA/ALYA_TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/alya/ALYA_Run_README.txt
# Code_Saturne
Code_Saturne® is a multipurpose Computational Fluid Dynamics (CFD) software package, which has been developed by EDF (France) since 1997. The code was originally designed for industrial applications and research activities in several fields related to energy production; typical examples include nuclear power thermal-hydraulics, gas and coal combustion, turbo-machinery, heating, ventilation, and air conditioning. In 2007, EDF released the code as open source, allowing both industry and academia to benefit from its extensive pedigree. Code_Saturne®'s open-source status allows for answers to specific needs that cannot easily be made available in commercial "black box" packages. It also makes it possible for industrial users and for their subcontractors to develop and maintain their own independent expertise and to fully control the software they use.
Code_Saturne® is based on a co-located finite volume approach that can handle three-dimensional meshes built with any type of cell (tetrahedral, hexahedral, prismatic, pyramidal, polyhedral) and with any type of grid structure (unstructured, block structured, hybrid). The code is able to simulate either incompressible or compressible flows, with or without heat transfer, and has a variety of models to account for turbulence. Dedicated modules are available for specific physics such as radiative heat transfer, combustion (e.g. with gas, coal and heavy fuel oil), magneto-hydrodynamics, compressible flows and two-phase flows. The software comprises around 350,000 lines of source code, with about 37% written in Fortran 90, 50% in C and 15% in Python. The code is parallelised using MPI with some OpenMP.
- Web site: http://code-saturne.org
- Code download: http://code-saturne.org/cms/download or http://www.prace-ri.eu/UEABS/Code_Saturne/Code_Saturne_Source_3.0.1.tar.gz
- Disclaimer: please note that by downloading the code from this website, you agree to be bound by the terms of the GPL license.
- Build instructions: http://code-saturne.org/cms/documentation/guides/installation
- Test Case A: http://www.prace-ri.eu/UEABS/Code_Saturne/Code_Saturne_TestCaseA.tar.gz
- Test Case B: http://www.prace-ri.eu/UEABS/Code_Saturne/1.1/Code_Saturne_TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/code_saturne/Code_Saturne_README.txt
# CP2K
CP2K is a freely available (GPL) program to perform atomistic and molecular simulations of solid state, liquid, molecular and biological systems. It provides a general framework for different methods, such as density functional theory (DFT) using a mixed Gaussian and plane waves approach (GPW), and classical pair and many-body potentials. It is written in well-structured, standards-conforming Fortran 95, parallelized with MPI and, in some parts, with hybrid MPI+OpenMP as an option.
CP2K provides state-of-the-art methods for efficient and accurate atomistic simulations; its sources are freely available and actively improved. It has an active international development team, with unofficial headquarters at the University of Zürich.
- Web site: https://www.cp2k.org/
- Code download: https://www.cp2k.org/download
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/cp2k/CP2K_Build_README.txt
- Test Case A: http://www.prace-ri.eu/UEABS/CP2K/CP2K_TestCaseA.tar.gz
- Test Case B: http://www.prace-ri.eu/UEABS/CP2K/CP2K_TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/cp2k/CP2K_Run_README.txt
# GADGET
GADGET is a freely available code for cosmological N-body/SPH simulations on massively parallel computers with distributed memory, written by Volker Springel, Max Planck Institute for Astrophysics, Garching, Germany. GADGET is written in C and uses an explicit communication model that is implemented with the standardized MPI communication interface. The code can be run on essentially all supercomputer systems presently in use, including clusters of workstations or individual PCs. GADGET computes gravitational forces with a hierarchical tree algorithm (optionally in combination with a particle-mesh scheme for long-range gravitational forces) and represents fluids by means of smoothed particle hydrodynamics (SPH). The code can be used for studies of isolated systems, or for simulations that include the cosmological expansion of space, either with or without periodic boundary conditions. In all these types of simulations, GADGET follows the evolution of a self-gravitating collisionless N-body system, and allows gas dynamics to be optionally included. Both the force computation and the time stepping of GADGET are fully adaptive, with a dynamic range that is, in principle, unlimited. GADGET can therefore be used to address a wide array of astrophysically interesting problems, ranging from colliding and merging galaxies, to the formation of large-scale structure in the Universe. With the inclusion of additional physical processes such as radiative cooling and heating, GADGET can also be used to study the dynamics of the gaseous intergalactic medium, or to address star formation and its regulation by feedback processes.
- Web site: http://www.mpa-garching.mpg.de/gadget/
- Code download: http://www.prace-ri.eu/UEABS/GADGET/gadget3_Source.tar.gz
- Disclaimer: please note that by downloading the code from this website, you agree to be bound by the terms of the GPL license.
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/gadget/gadget3_Build_README.txt
- Test Case A: http://www.prace-ri.eu/UEABS/GADGET/gadget3_TestCaseA.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/gadget/gadget3_Run_README.txt
# GENE
GENE is a gyrokinetic plasma turbulence code which has been developed since the late 1990s and is physically very comprehensive and flexible as well as computationally very efficient and highly scalable. Originally used for flux-tube simulations, today GENE also operates as a global code, either gradient- or flux-driven. An arbitrary number of gyrokinetic particle species can be taken into account, including electromagnetic effects and collisions. GENE is, in principle, able to cover the widest possible range of scales, all the way from the system size (where nonlocal effects or avalanches can play a role) down to sub-ion-gyroradius scales (where ETG or microtearing modes may contribute to the transport), depending on the available computer resources. Moreover, there exist interfaces to various MHD equilibrium codes. GENE has been carefully benchmarked against theoretical results and other codes.
The GENE code is written in Fortran 90 and C and is parallelized with pure MPI. It strongly relies on a Fast Fourier Transform library and has built-in support for FFTW, MKL or ESSL. It also uses LAPACK and ScaLAPACK routines for LU decomposition and solution of a linear system of equations of moderate size (up to 1000 unknowns).
- Web site: http://gene.rzg.mpg.de/
- Code download: http://www.prace-ri.eu/UEABS/GENE/1.2/GENE.tar.gz
- Disclaimer: please note that by downloading the code from this website, you agree to be bound by the terms of the GPL license.
- Build instructions: included with code download
- Test Case A: included with code download
- Test Case B: included with code download
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/gene/GENE_Run_README.txt
# GPAW
GPAW is an efficient program package for electronic structure calculations based on the density functional theory (DFT) and the time-dependent density functional theory (TD-DFT). The density-functional theory allows studies of ground state properties such as energetics and equilibrium geometries, while the time-dependent density functional theory can be used for calculating excited state properties such as optical spectra. The program package includes two complementary implementations of time-dependent density functional theory: a linear response formalism and a time-propagation in real time.
The program uses the projector augmented wave (PAW) method that allows one to get rid of the core electrons and work with soft pseudo valence wave functions. The PAW method can be applied on the same footing to all elements, for example, it provides a reliable description of the transition metal elements and the first row elements with open p-shells that are often problematic for standard pseudopotentials. A further advantage of the PAW method is that it is an all-electron method (frozen core approximation) and there is a one to one transformation between the pseudo and all-electron quantities.
The equations of the (time-dependent) density functional theory within the PAW method are discretized using finite differences and uniform real-space grids. The real-space representation allows flexible boundary conditions, as the system can be finite or periodic in one, two or three dimensions (e.g. cluster, slab, bulk). The accuracy of the discretization is basically controlled by a single parameter, the grid spacing. The real-space representation also allows efficient parallelization with domain decomposition.
The program offers several parallelization levels. The most basic parallelization strategy is domain decomposition over the real-space grid. In magnetic systems it is possible to parallelize over spin, and in systems that have k-points (surfaces or bulk systems) parallelization over k-points is also possible. Furthermore, parallelization over electronic states is possible in DFT and in real-time TD-DFT calculations. GPAW is written in Python and C and parallelized with MPI.
- Web site: https://wiki.fysik.dtu.dk/gpaw/
- Code download: https://gitlab.com/gpaw/gpaw
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/gpaw/GPAW_Build_README.txt
- Test Case A: http://www.prace-ri.eu/UEABS/GPAW/GPAW_benchmark.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/gpaw/GPAW_Run_README.txt
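Since GPAW calculations are driven by Python scripts run under MPI, a benchmark launch can be sketched as below. The input script name is an assumption, and older GPAW releases ship the MPI-enabled interpreter wrapper `gpaw-python` used here; consult the run instructions above for the actual procedure.

```shell
# Hypothetical GPAW benchmark launch: replace input.py with the script
# shipped in the GPAW_benchmark archive and size -np to your machine.
mpirun -np 256 gpaw-python input.py
```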
# GROMACS
GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.
GROMACS supports all the usual algorithms you expect from a modern molecular dynamics implementation (check the online reference or manual for details), but there are also quite a few features that make it stand out from the competition:
- GROMACS provides extremely high performance compared to all other programs. A lot of algorithmic optimizations have been introduced in the code; we have for instance extracted the calculation of the virial from the innermost loops over pairwise interactions, and we use our own software routines to calculate the inverse square root. In GROMACS 4.6 and up, on almost all common computing platforms, the innermost loops are written in C using intrinsic functions that the compiler transforms to SIMD machine instructions, to utilize the available instruction-level parallelism. These kernels are available in both single and double precision, and support all the different kinds of SIMD found in x86-family (and other) processors.
- Also since GROMACS 4.6, we have excellent CUDA-based GPU acceleration on GPUs that have Nvidia compute capability >= 2.0 (e.g. Fermi or later).
- GROMACS is user-friendly, with topologies and parameter files written in clear text format. There is a lot of consistency checking, and clear error messages are issued when something is wrong. Since a C preprocessor is used, you can have conditional parts in your topologies and include other files. You can even compress most files and GROMACS will automatically pipe them through gzip upon reading.
- There is no scripting language – all programs use a simple interface with command line options for input and output files. You can always get help on the options by using the -h option, or use the extensive manuals provided free of charge in electronic or paper format.
- As the simulation is proceeding, GROMACS will continuously tell you how far it has come, and what time and date it expects to be finished.
- Both run input files and trajectories are independent of hardware endianness, and can thus be read by any version of GROMACS, even one compiled using a different floating-point precision.
- GROMACS can write coordinates using lossy compression, which provides a very compact way of storing trajectory data. The accuracy can be selected by the user.
- GROMACS comes with a large selection of flexible tools for trajectory analysis – you won’t have to write any code to perform routine analyses. The output is further provided in the form of finished Xmgr/Grace graphs, with axis labels, legends, etc. already in place!
- A basic trajectory viewer that only requires standard X libraries is included, and several external visualization tools can read the GROMACS file formats.
- GROMACS can be run in parallel, using either the standard MPI communication protocol, or via our own “Thread MPI” library for single-node workstations.
- GROMACS contains several state-of-the-art algorithms that make it possible to extend the time steps in simulations significantly, and thereby further enhance performance without sacrificing accuracy or detail.
- The package includes a fully automated topology builder for proteins, even multimeric structures. Building blocks are available for the 20 standard amino acid residues as well as some modified ones, the 4 nucleotide and 4 deoxynucleotide residues, several sugars and lipids, and some special groups like hemes and several small molecules.
- There is ongoing development to extend GROMACS with interfaces both to Quantum Chemistry and Bioinformatics/databases.
- GROMACS is Free Software, available under the GNU Lesser General Public License (LGPL), version 2.1. You can redistribute it and/or modify it under the terms of the LGPL as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
Instructions:
- Web site: http://www.gromacs.org/
- Code download: http://www.gromacs.org/Downloads (the UEABS benchmark cases require the 5.1.x or newer branch; the latest 2016 release is suggested)
- Test Case A: http://www.prace-ri.eu/UEABS/GROMACS/1.2/GROMACS_TestCaseA.tar.gz
- Test Case B: http://www.prace-ri.eu/UEABS/GROMACS/1.2/GROMACS_TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/gromacs/GROMACS_Run_README.txt
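As a sketch of how a test case might be launched with an MPI-enabled GROMACS build (5.1 or newer), where the input file name and rank count are assumptions; the run instructions above give the actual procedure.

```shell
# Hypothetical launch of a UEABS GROMACS case.
#   -s         : portable binary run input (.tpr) from the test-case archive
#   -maxh 0.5  : stop cleanly after at most half an hour of wall time
#   -resethway : reset performance counters halfway through, a common
#                benchmarking aid to exclude start-up costs
mpirun -np 64 gmx_mpi mdrun -s benchmark.tpr -maxh 0.5 -resethway
```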
# NAMD
NAMD is a widely used molecular dynamics application designed to simulate bio-molecular systems on a wide variety of compute platforms. NAMD is developed by the "Theoretical and Computational Biophysics Group" at the University of Illinois at Urbana-Champaign. In the design of NAMD particular emphasis has been placed on scalability when utilizing a large number of processors. The application can read a wide variety of file formats commonly used in bio-molecular science, for example force fields and protein structures.
A NAMD license can be applied for on the developer’s website free of charge. Once the license has been obtained, binaries for a number of platforms and the source can be downloaded from the website.
Deployment areas of NAMD include pharmaceutical research by academic and industrial users. NAMD is particularly suitable when the interaction between a number of proteins or between proteins and other chemical substances is of interest. Typical examples are vaccine research and transport processes through cell membrane proteins.
NAMD is written in C++ and parallelised using Charm++ parallel objects, which are implemented on top of MPI.
- Web site: http://www.ks.uiuc.edu/Research/namd/
- Code download: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/namd/NAMD_Download_README.txt
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/namd/NAMD_Build_README.txt
- Test Case A: http://www.prace-ri.eu/UEABS/NAMD/NAMD_TestCaseA.tar.gz
- Test Case B: http://www.prace-ri.eu/UEABS/NAMD/NAMD_TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/namd/NAMD_Run_README.txt
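Because NAMD is built on Charm++, standalone binaries are typically started through `charmrun`; an MPI build can be launched with `mpirun` instead. The input file name below is a placeholder, and the run instructions above describe the actual benchmark procedure.

```shell
# Hypothetical NAMD launch on 64 processors; replace input.namd with the
# configuration file from the test-case archive.
charmrun +p64 namd2 input.namd > output.log
```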
# NEMO
NEMO (Nucleus for European Modeling of the Ocean) is a state-of-the-art modeling framework for oceanographic research, operational oceanography, seasonal forecasting and climate studies. Prognostic variables are the three-dimensional velocity field, a linear or non-linear sea surface height, the temperature and the salinity. In the horizontal direction, the model uses a curvilinear orthogonal grid and in the vertical direction, a full or partial step z-coordinate, or s-coordinate, or a mixture of the two. The distribution of variables is a three-dimensional Arakawa C-type grid. Within NEMO, the ocean is interfaced with a sea-ice model (LIM v2 and v3), passive tracer and biogeochemical models (TOP) and, via the OASIS coupler, with several atmospheric general circulation models. It also supports two-way grid embedding via the AGRIF software.
The framework includes five major components:
- the blue ocean (ocean dynamics, NEMO-OPA)
- the white ocean (sea-ice, NEMO-LIM)
- the green ocean (biogeochemistry, NEMO-TOP)
- the adaptive mesh refinement software (AGRIF)
- the assimilation component (NEMO_TAM)
NEMO is used by a large community: 240 projects in 27 countries (14 in Europe, 13 elsewhere) and 350 registered users (numbers for year 2008). The code is available under the CeCILL license (public license). The latest stable version is 3.6. NEMO is written in Fortran90 and parallelized with MPI.
- Web site: http://www.nemo-ocean.eu/
- Code download: http://www.prace-ri.eu/UEABS/NEMO/NEMO_Source.tar.gz
- Disclaimer: please note that by downloading the code from this website, you agree to be bound by the terms of the GPL license.
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/nemo/NEMO_Build_README.txt
- Test Case A: http://www.prace-ri.eu/UEABS/NEMO/NEMO_TestCaseA.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/nemo/NEMO_Run_README.txt
# QCD
The QCD benchmark is, unlike the other benchmarks in the PRACE application benchmark suite, not a full application but a set of 5 kernels which are representative of some of the most compute-intensive parts of QCD calculations.
Each of the 5 kernels has one test case:
Kernel A is derived from BQCD (Berlin Quantum ChromoDynamics program), a hybrid Monte-Carlo code that simulates Quantum Chromodynamics with dynamical standard Wilson fermions. The computations take place on a four-dimensional regular grid with periodic boundary conditions. The kernel is a standard conjugate gradient solver with even/odd pre-conditioning. Lattice size is 32² x 64².
Kernel B is derived from SU3_AHiggs, a lattice quantum chromodynamics (QCD) code intended for computing the conditions of the Early Universe. Instead of "full QCD", the code applies an effective field theory, which is valid at high temperatures. In the effective theory, the lattice is 3D. Lattice size is 256³.
Kernel C has a lattice size of 8⁴. Note that Kernel C can only be run in a weak scaling mode, where each CPU stores the same local lattice size, regardless of the number of CPUs. Ideal scaling for this kernel therefore corresponds to constant execution time, and performance is simply the reciprocal of the execution time.
Kernel D consists of the core matrix-vector multiplication routine for standard Wilson fermions. The lattice size is 64⁴.
Kernel E consists of a full conjugate gradient solution using Wilson fermions. Lattice size is 64³ x 3.
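Because Kernel C can only be run weakly scaled, performance is the reciprocal of the execution time, and scaling efficiency reduces to a simple ratio of runtimes. A minimal sketch, using made-up timings:

```shell
# Hypothetical wall-clock times (seconds) for Kernel C at two CPU
# counts; under ideal weak scaling these would be identical.
t_base=120.0    # runtime on the baseline CPU count
t_large=132.0   # runtime on a larger CPU count (same local lattice each)

# Performance = 1 / execution time, so weak-scaling efficiency is
# simply t_base / t_large.
eff=$(awk -v a="$t_base" -v b="$t_large" 'BEGIN { printf "%.3f", a/b }')
echo "weak-scaling efficiency: $eff"
```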
- Code download: http://www.prace-ri.eu/UEABS/QCD/QCD_Source_TestCaseA.tar.gz
- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/qcd/QCD_Build_README.txt
- Test Case A: included with source download
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/qcd/QCD_Run_README.txt
# Quantum Espresso
QUANTUM ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling, based on density-functional theory, plane waves, and pseudopotentials (norm-conserving, ultrasoft, and projector-augmented wave). QUANTUM ESPRESSO stands for opEn Source Package for Research in Electronic Structure, Simulation, and Optimization. It is freely available to researchers around the world under the terms of the GNU General Public License. QUANTUM ESPRESSO builds upon newly restructured electronic-structure codes that have been developed and tested by some of the original authors of novel electronic-structure algorithms and applied in the last twenty years by some of the leading materials modeling groups worldwide. Innovation and efficiency are still its main focus, with special attention paid to massively parallel architectures, and a great effort being devoted to user friendliness. QUANTUM ESPRESSO is evolving towards a distribution of independent and inter-operable codes in the spirit of an open-source project, where researchers active in the field of electronic-structure calculations are encouraged to participate in the project by contributing their own codes or by implementing their own ideas into existing codes.
QUANTUM ESPRESSO is written mostly in Fortran90, and parallelised using MPI and OpenMP.
- Web site: http://www.quantum-espresso.org/
- Code download: http://www.quantum-espresso.org/download/
- Build instructions: http://www.quantum-espresso.org/wp-content/uploads/Doc/user_guide/
- Test Case A: http://www.prace-ri.eu/UEABS/Quantum_Espresso/QuantumEspresso_TestCaseA.tar.gz
- Test Case B: http://www.prace-ri.eu/UEABS/Quantum_Espresso/QuantumEspresso_TestCaseB.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/quantum_espresso/QuantumEspresso_Run_README.txt
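A typical benchmark invocation of the plane-wave SCF code `pw.x` might look like the sketch below; the rank count, pool count and file names are assumptions, and the run instructions above give the actual procedure. The `-npool` option distributes MPI ranks over k-points and is a common tuning knob on massively parallel machines.

```shell
# Hypothetical PWscf run; tune -np and -npool to the machine and case.
mpirun -np 128 pw.x -npool 4 -input benchmark.in > benchmark.out
```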
# SPECFEM3D
The software package SPECFEM3D simulates three-dimensional global and regional seismic wave propagation based upon the spectral-element method (SEM). All SPECFEM3D_GLOBE software is written in Fortran90 with full portability in mind, and conforms strictly to the Fortran95 standard. It uses no obsolete or obsolescent features of Fortran77. The package uses parallel programming based upon the Message Passing Interface (MPI).
The SEM was originally developed in computational fluid dynamics and has been successfully adapted to address problems in seismic wave propagation. It is a continuous Galerkin technique, which can easily be made discontinuous; it is then close to a particular case of the discontinuous Galerkin technique, with optimized efficiency because of its tensorized basis functions. In particular, it can accurately handle very distorted mesh elements. It has very good accuracy and convergence properties. The spectral element approach admits spectral rates of convergence and allows exploiting hp-convergence schemes. It is also very well suited to parallel implementation on very large supercomputers as well as on clusters of GPU accelerating graphics cards. Tensor products inside each element can be optimized to reach very high efficiency, and mesh point and element numbering can be optimized to reduce processor cache misses and improve cache reuse. The SEM can also handle triangular (in 2D) or tetrahedral (3D) elements as well as mixed meshes, although with increased cost and reduced accuracy in these elements, as in the discontinuous Galerkin method.
In many geological models in the context of seismic wave propagation studies (except, for instance, fault dynamic rupture studies, in which very high frequencies of supershear rupture need to be modeled near the fault), a continuous formulation is sufficient because material property contrasts are not drastic and thus conforming mesh doubling bricks can efficiently handle mesh size variations. This is particularly true at the scale of the full Earth. Effects due to lateral variations in compressional-wave speed, shear-wave speed, density, a 3D crustal model, ellipticity, topography and bathymetry, the oceans, rotation, and self-gravitation are included. The package can accommodate full 21-parameter anisotropy as well as lateral variations in attenuation. Adjoint capabilities and finite-frequency kernel simulations are also included.
- Web site: http://geodynamics.org/cig/software/specfem3d_globe/
- Code download: http://geodynamics.org/cig/software/specfem3d_globe/
- Build instructions: http://www.geodynamics.org/wsvn/cig/seismo/3D/SPECFEM3D_GLOBE/trunk/doc/USER_MANUAL/manual_SPECFEM3D_GLOBE.pdf?op=file&rev=0&sc=0
- Test Case A: http://www.prace-ri.eu/UEABS/SPECFEM3D/SPECFEM3D_TestCaseA.tar.gz
- Test Case B: http://www.prace-ri.eu/UEABS/SPECFEM3D/SPECFEM3D_TestCaseA.tar.gz
- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/v1.0/specfem3d/SPECFEM3D_Run_README.txt
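A SPECFEM3D_GLOBE run proceeds in two stages, a mesher followed by the solver; the binary names come from the SPECFEM3D_GLOBE build tree, while the rank count below is a placeholder and must match the `NPROC_XI`/`NPROC_ETA` settings in `DATA/Par_file`. The run instructions above describe the actual benchmark procedure.

```shell
# Hypothetical SPECFEM3D_GLOBE run: mesh first, then solve, with the
# same number of MPI ranks (dictated by DATA/Par_file).
mpirun -np 96 ./bin/xmeshfem3D
mpirun -np 96 ./bin/xspecfem3D
```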
alya/ALYA_Build_README.txt

In order to build ALYA (Alya.x), please follow these steps:
- Go to: Thirdparties/metis-4.0 and build the Metis library (libmetis.a) using 'make'
- Go to the directory: Executables/unix
- Adapt the file: configure-marenostrum-mpi.txt to your own MPI wrappers and paths
- Execute:
./configure -x -f=configure-marenostrum-mpi.txt nastin parall
make
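Collected into a single script, the steps above read roughly as follows (the paths are those of the alya3226 source tree described in this README; edit the configure file for your MPI wrappers before running):

```shell
#!/bin/sh
set -e

# 1. Build the bundled Metis 4.0 library (libmetis.a).
cd Thirdparties/metis-4.0
make

# 2. Configure and build Alya.x with the 'nastin' and 'parall' modules.
cd ../../Executables/unix
./configure -x -f=configure-marenostrum-mpi.txt nastin parall
make
```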
alya/ALYA_Run_README.txt

Data sets
---------
The parameters used in the datasets aim to represent typical industrial runs as closely as possible, in order to obtain representative speedups. For example, the iterative solvers
are never converged to machine accuracy, as the system solution sits inside a non-linear loop.
The datasets represent the solution of the cavity flow at Re=100. A small mesh of 10M elements should be used for Tier-1 supercomputers while a 30M element mesh
is specifically designed to run on Tier-0 supercomputers.
However, the number of elements can be multiplied by using the mesh multiplication option in the file *.ker.dat (DIVISION=0,2,3...). The mesh multiplication is
carried out in parallel and the number of elements is multiplied by 8 at each level; for example, DIVISION=2 turns the 10M element mesh into 640M elements. "0" means no mesh multiplication.
The different datasets are:
cavity10_tetra ... 10M tetrahedra mesh
cavity30_tetra ... 30M tetrahedra mesh
How to execute Alya with a given dataset
----------------------------------------
In order to run ALYA, you need at least the following input files per execution:
X.dom.dat
X.typ.dat
X.geo.dat
X.bcs.dat
X.inflow_profile.bcs
X.ker.dat
X.nsi.dat
X.dat
In our case, there are 2 different inputs, so X={cavity10_tetra,cavity30_tetra}
To execute a simulation, you must be inside the input directory and you should submit a job like:
mpirun Alya.x cavity10_tetra
or
mpirun Alya.x cavity30_tetra
How to measure the speedup
--------------------------
1. Edit the fensap.nsi.cvg file
2. You will see ten rows, each corresponding to one simulation timestep
3. Go to the second row; it starts with the number 2
4. Get the last number of this row, which is the elapsed CPU time of this timestep
5. Use this value to measure the speedup
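The steps above can be scripted. The sketch below fabricates a two-column stand-in for fensap.nsi.cvg (real files contain more columns, but only the leading timestep number and the trailing elapsed time matter here), and the baseline time is a made-up value taken from a hypothetical reference run measured the same way.

```shell
# Hypothetical two-column excerpt of fensap.nsi.cvg: timestep number
# first, elapsed CPU time of that timestep last.
printf '1 900.0\n2 422.5\n' > fensap.nsi.cvg

# Steps 3-4: take the row starting with "2" and keep its last field.
t2=$(awk '$1 == 2 { t = $NF } END { print t }' fensap.nsi.cvg)

# Step 5: speedup relative to a baseline run measured the same way.
t_ref=845.0   # made-up elapsed time from the baseline run, in seconds
speedup=$(awk -v ref="$t_ref" -v t="$t2" 'BEGIN { printf "%.2f", ref/t }')
echo "speedup vs baseline: $speedup"
```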
Contact
-------
If you have any question regarding the runs, please feel free to contact Guillaume Houzeaux: guillaume.houzeaux@bsc.es
code_saturne/Code_Saturne_README.txt

Running a case is described in the following page:
http://www.code-saturne.org
Two test cases are available, the first using a hexahedra-based grid
and the second a tetrahedra-based grid.
Test Case A deals with the flow in a bundle of tubes.
A larger mesh (51M cells) is built from an original mesh of 13M cells.
The original mesh_input file (already preprocessed for Code_Saturne)
is to be found under MESH.
The user subroutines are under XE6_INTERLAGOS/SRC.
The test case has been set up to run for 10 time-steps.
Test Case B models a lid-driven cavity and the cells are all tetras.
The total number of cells is about 110M.
The mesh is called mesh_input and the user subroutines are available under SRC_UEABS
10 time-steps are run.
==== cp2k/CP2K_Build_README.txt ====
Build instructions for CP2K.
2014-04-09 : ntell@iasa.gr
CP2K needs a number of external libraries and a thread-enabled MPI implementation.
These are : BLAS/LAPACK, BLACS/SCALAPACK, LIBINT, FFTW3.
It is advised to use the vendor optimized versions of these libraries.
If some of these are not available on your machine,
freely available implementations exist; some are listed below.
1. BLAS/LAPACK :
netlib BLAS/LAPACK : http://netlib.org/lapack/
ATLAS : http://math-atlas.sf.net/
GotoBLAS : http://www.tacc.utexas.edu/tacc-projects
MKL : refer to your Intel MKL installation, if available
ACML : refer to your ACML installation if available
2. BLACS/SCALAPACK : http://netlib.org/scalapack/
Intel BLACS/SCALAPACK Implementation
3. LIBINT : http://sourceforge.net/projects/libint/files/v1-releases/
4. FFTW3 : http://www.fftw.org/
In the directory cp2k-VERSION/arch there are arch files describing how
to build CP2K for various architecture/compiler combinations.
Select one of the .psmp files that fits your architecture/compiler.
cd to cp2k-VERSION/makefiles
If the arch file for your machine is called SOMEARCH_SOMECOMPILER.psmp,
issue : make ARCH=SOMEARCH_SOMECOMPILER VERSION=psmp
If everything goes fine, you'll find the executable cp2k.psmp in the directory
cp2k-VERSION/exe/SOMEARCH_SOMECOMPILER
In most cases you need to create a custom arch file that fits cpu type,
compiler, and the installation path of external libraries.
As an example, below is the arch file for a machine with mpif90/gcc/gfortran that supports SSE2, has
all the external libraries installed under /usr/local/, and uses ATLAS with full
LAPACK support for BLAS/LAPACK, ScaLAPACK-2 for BLACS/ScaLAPACK, FFTW3, and libint-1.1.4:
#=======================================================================================================
CC = gcc
CPP =
FC = mpif90 -fopenmp
LD = mpif90 -fopenmp
AR = ar -r
DFLAGS = -D__GFORTRAN -D__FFTSG -D__parallel -D__BLACS -D__SCALAPACK -D__FFTW3 -D__LIBINT -I/usr/local/fftw3/include -I/usr/local/libint-1.1.4/include
CPPFLAGS =
FCFLAGS = $(DFLAGS) -O3 -msse2 -funroll-loops -finline -ffree-form
FCFLAGS2 = $(DFLAGS) -O3 -msse2 -funroll-loops -finline -ffree-form
LDFLAGS = $(FCFLAGS)
LIBS = /usr/local/Scalapack/lib/libscalapack.a \
/usr/local/Atlas/lib/liblapack.a \
/usr/local/Atlas/lib/libf77blas.a \
/usr/local/Atlas/lib/libcblas.a \
/usr/local/Atlas/lib/libatlas.a \
/usr/local/fftw3/lib/libfftw3_threads.a \
/usr/local/fftw3/lib/libfftw3.a \
/usr/local/libint-1.1.4/lib/libderiv.a \
/usr/local/libint-1.1.4/lib/libint.a \
-lstdc++ -lpthread
OBJECTS_ARCHITECTURE = machine_gfortran.o
#=======================================================================================================
==== cp2k/CP2K_Download_README.txt ====
CP2K can be downloaded from : http://www.cp2k.org/download
It is free for all users under GPL license,
see Obtaining CP2K section in the download page.
In UEABS (2IP) the 2.3 branch was used, which can be downloaded from :
http://sourceforge.net/projects/cp2k/files/cp2k-2.3.tar.bz2
Data files are compatible with at least 2.4 branch.
Tier-0 data set requires the libint-1.1.4 library. If libint version 1
is not available on your machine, libint can be downloaded from :
http://sourceforge.net/projects/libint/files/v1-releases/libint-1.1.4.tar.gz
==== cp2k/CP2K_Run_README.txt ====
Run instructions for CP2K.
2013-08-13 : ntell@iasa.gr
After build of hybrid MPI/OMP CP2K you have an executable called cp2k.psmp.
You can try any combination of TASKSPERNODE/THREADSPERTASK.
The input file is H2O-1024.inp for tier-1 and input_bulk_HFX_3.inp for tier-0 systems.
For tier-1 systems the best performance is usually obtained with pure MPI,
while for tier-0 systems the best performance is obtained using 1 MPI task per
node with the number of threads/MPI_Task being equal to the number of
cores/node.
The tier-0 case requires a converged wavefunction file, which can be obtained by
running with any number of cores; 1024-2048 cores are suggested :
WRAPPER WRAPPER_OPTIONS PATH_TO_cp2k.psmp -i input_bulk_B88_3.inp -o input_bulk_B88_3.log
When this run finishes, move the saved restart file LiH_bulk_3-RESTART.wfn to
B88.wfn
The general way to run is :
WRAPPER WRAPPER_OPTIONS PATH_TO_cp2k.psmp -i inputfile -o logfile
WRAPPER and WRAPPER_OPTIONS depend on system, batch system etc.
A few common pairs are :
CRAY : aprun -n TASKS -N NODES -d THREADSPERTASK
Curie : ccc_mrun with no options - obtained from batch system
Juqueen : runjob --np TASKS --ranks-per-node TASKSPERNODE --exp-env OMP_NUM_THREADS
Slurm : srun with no options, obtained from slurm if the variables below are set.
#SBATCH --nodes=NODES
#SBATCH --ntasks-per-node=TASKSPERNODE
#SBATCH --cpus-per-task=THREADSPERTASK
The run walltime is reported near the end of the logfile : grep '^ CP2K ' logfile | tail -1 | awk -F ' ' '{print $7}'
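A minimal, self-contained sketch of that extraction, run against a made-up log line (the real ' CP2K' summary line appears near the end of a cp2k.psmp log; the field positions are an assumption carried over from the command above):

```shell
# Fake one-line stand-in for the end of a cp2k.psmp logfile.
printf ' CP2K                1  1.0  0.1  0.1  512.3  512.4\n' > logfile.sample
# Same pipeline as above: last ' CP2K' line, 7th whitespace field.
grep '^ CP2K' logfile.sample | tail -1 | awk '{print $7}'
```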
==== gadget/gadget3_Build_README.txt ====
1. Install FFTW-2, available at http://www.fftw.org
2. Install GSL, available at http://www.gnu.org/software/gsl
3. Install HDF5, available at http://www.hdfgroup.org/HDF5/
4. Go to Gadget3/
5. Edit Makefile, set:
CC
CXX
GSL_INCL
GSL_LIBS
FFTW_INCL
FFTW_LIBS
HDF5INCL
HDF5LIB
6. make CONFIG=Config-Medium.sh
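Step 5 can also be done non-interactively. The sketch below works on a tiny made-up stand-in for the Makefile, and the compiler and include path are hypothetical examples, not values from the real Gadget3 Makefile:

```shell
# Create a minimal fake Makefile with two of the variables listed above.
printf 'CC = cc\nGSL_INCL =\n' > Makefile.sample
# Rewrite the variable assignments in place (GNU sed).
sed -i -e 's|^CC *=.*|CC = mpicc|' \
       -e 's|^GSL_INCL *=.*|GSL_INCL = -I/opt/gsl/include|' Makefile.sample
cat Makefile.sample
```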
==== gadget/gadget3_Run_README.txt ====
1. Creation of input
mpirun -np 128 ./N-GenIC ics_medium.param
ics_medium.param is in N-GenIC directory
2. Run calculation
mpirun -np 128 ./Gadget3 param-medium.txt
param-medium.txt is in Gadget3 directory
==== gene/GENE_Run_README.txt ====
This is the README file for the GENE application benchmark,
distributed with the Unified European Application Benchmark Suite.
-----------
GENE readme
-----------
Contents
--------
1. General description
2. Code structure
3. Parallelization
4. Building
5. Execution
6. Data
1. General description
======================
The gyrokinetic plasma turbulence code GENE (this acronym stands for
Gyrokinetic Electromagnetic Numerical Experiment) is a software package
dedicated to solving the nonlinear gyrokinetic integro-differential system
of equations in either a flux-tube domain or a radially nonlocal domain.
GENE has been developed by a team of people (the Gene Development Team,
led by F. Jenko, Max-Planck-Institut for Plasma Physics) over the last
several years.
For further documentation of the code see: http://www.ipp.mpg.de/~fsj/gene/
2. Code structure
==================
Each particle species is described by a time-dependent distribution function
in a five-dimensional phase space.
This results in 6-dimensional arrays, which have the following coordinates:
x y z three space coordinates
v parallel velocity
w perpendicular velocity
spec species of particles
GENE is written completely in FORTRAN90, with some language structures
from Fortran 2003 standard. It also contains preprocessing directives.
3. Parallelization
==================
Parallelization is done by domain decomposition of all 6 coordinates using MPI.
x, y, z 3 space coordinates
v parallel velocity
w perpendicular velocity
spec species of particles
4. Building
===========
The source code (fortran-90) resides in directory src.
The compilation of GENE will be done by JuBE.
Compilation will be done automatically if a new executable for the
benchmark runs is needed.
5. Running the code
====================
A very brief description of the datasets:
parameters_small
A small data set for test purposes. Needs only 8 cores to run.
parameters_tier1
Global simulation of ion-scale turbulence in ASDEX Upgrade,
needs 200-500GB total memory, runs from 256 to 4096 cores
parameters_tier0
Global simulation of ion-scale turbulence in JET,
needs 3.5-7TB total memory, runs from 4096 to 16384 cores
For running the benchmark for GENE, please follow the instructions for
using JuBE.
JuBE generates for each benchmark run a run directory and generates from
a template input file the input file 'parameters' and stores it in the
run directory.
A job submit script is created as well and is submitted.
6. Data
=======
The only input file is 'parameters'. It has the format of a Fortran 90 namelist.
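For readers unfamiliar with the format, an f90 namelist groups variables like this (the group and variable names below are generic placeholders, not actual GENE parameters; the real values are supplied with the benchmark datasets):

```fortran
! Generic illustration of the Fortran namelist format.
&some_group
  some_switch = .true.
  some_size   = 64
/
```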
The following output files are stored in the run directory.
nrg.dat The content of this file is used to verify the correctness
of the benchmark run.
stdout is redirected by JuBE.
It contains logging information,
especially the result of the time measurement.
--------------------------------------------------------------------------
==== gpaw/GPAW_Build_README.txt ====
Instructions for obtaining GPAW and its test set for PRACE benchmarking
GPAW is licensed under GPL, so there are no license issues
NOTE: This benchmark uses version 0.11 of GPAW. For instructions on installing the
latest version, please visit:
https://wiki.fysik.dtu.dk/gpaw/install.html
Software requirements
=====================
* Python
* version 2.6-3.5 required
* this benchmark uses version 2.7.9
* NumPy
* this benchmark uses version 1.11.0
* ASE (Atomic Simulation Environment)
* this benchmark uses 3.9.0
* LibXC
* this benchmark uses version 2.0.1
* BLAS and LAPACK libraries
* this benchmark uses Intel MKL from Intel Composer Studio 2015
* MPI library (optional, for increased performance using parallel processes)
* this benchmark uses Intel MPI from Intel Composer Studio 2015
* FFTW (optional, for increased performance)
* this benchmark uses Intel MKL from Intel Composer Studio 2015
* BLACS and ScaLAPACK (optional, for increased performance)
* this benchmark uses Intel MKL from Intel Composer Studio 2015
* HDF5 (optional, library for parallel I/O and for saving files in HDF5 format)
* this benchmark uses 1.8.14
Obtaining the source code
=========================
* The specific version of GPAW used in this benchmark can be obtained from:
https://gitlab.com/gpaw/gpaw/tags/0.11.0
* Installation instructions can be found at:
https://wiki.fysik.dtu.dk/gpaw/install.html
* For platform specific instructions, please refer to:
https://wiki.fysik.dtu.dk/gpaw/platforms/platforms.html
Support
=======
* Help regarding the benchmark can be requested from adem.tekin@be.itu.edu.tr
==== gpaw/GPAW_Run_README.txt ====
This benchmark set contains scaling tests for electronic structure simulation software GPAW.
More information on GPAW can be found at https://wiki.fysik.dtu.dk/gpaw
Small Scaling Test: carbone_nanotube.py
=======================================
A ground state calculation for a (6-6-10) carbon nanotube, requiring 30 SCF iterations.
The calculations under ScaLAPACK are parallelized using a 4/4/64 partitioning scheme.
This system scales reasonably up to 512 cores, running to completion in under two minutes on a 2015-era x86 architecture cluster.
For scalability testing, the relevant timer in the text output 'out_nanotube_hXXX_kYYY_pZZZ' (where XXX denotes grid spacing, YYY denotes Brillouin-zone sampling and ZZZ denotes number of cores utilized) is 'Total Time'.
Medium Scaling Test: C60_Pb100.py and C60_Pb100_POSCAR
======================================================
A ground state calculation for fullerene on a Pb(100) surface, requiring ~100 SCF iterations.
In this example, the parameters of the parallelization scheme for ScaLAPACK calculations are chosen automatically (using the keyword 'sl_auto: True').
This system scales reasonably up to 1024 cores, running to completion in under thirteen minutes on a 2015-era x86 architecture cluster.
For scalability testing, the relevant timer in the text output 'out_C60_Pb100_hXXX_kYYY_pZZZ' (where XXX denotes grid spacing, YYY denotes Brillouin-zone sampling and ZZZ denotes number of cores utilized) is 'Total Time'.
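A self-contained sketch of pulling that timer out of an output file. The two-line file created here is a made-up stand-in for an out_C60_Pb100_hXXX_kYYY_pZZZ output, and the exact line format is an assumption; adapt the field index if your output differs:

```shell
# Fake output file containing a 'Total Time' timer line.
printf 'SCF iterations: 100\nTotal Time: 712.4 s\n' > out_sample
# Print the third whitespace field of the timer line (the seconds value).
awk '/Total Time/ { print $3 }' out_sample
```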
How to run
==========
* Download and build the source code along the instructions in GPAW_Build_README.txt
* Benchmarks do not need any special command line options and can be run
simply as, e.g.:
mpirun -np 256 gpaw-python carbone_nanotube.py
mpirun -np 512 gpaw-python C60_Pb100.py
==== gromacs/GROMACS_Build_README.txt ====
Complete Build instructions: http://manual.gromacs.org/documentation/
A typical build procedure looks like :
tar -zxf gromacs-2016.tar.gz
cd gromacs-2016
mkdir build
cd build
cmake \
-DCMAKE_INSTALL_PREFIX=$HOME/Packages/gromacs/2016 \
-DBUILD_SHARED_LIBS=off \
-DBUILD_TESTING=off \
-DREGRESSIONTEST_DOWNLOAD=OFF \
-DCMAKE_C_COMPILER=`which mpicc` \
-DCMAKE_CXX_COMPILER=`which mpicxx` \
-DGMX_BUILD_OWN_FFTW=on \
-DGMX_SIMD=AVX2_256 \
-DGMX_DOUBLE=off \
-DGMX_EXTERNAL_BLAS=off \
-DGMX_EXTERNAL_LAPACK=off \
-DGMX_FFT_LIBRARY=fftw3 \
-DGMX_GPU=off \
-DGMX_MPI=on \
-DGMX_OPENMP=on \
-DGMX_X11=off \
..
make (or make -j ##)
make install
You probably need to adjust
1. The CMAKE_INSTALL_PREFIX to point to a different path
2. GMX_SIMD : You may completely omit this if your compile and compute nodes are of the same architecture (for example Haswell).
If they are different you should specify what fits to your compute nodes.
For a complete and up to date list of possible choices refer to the gromacs official build instructions.
==== gromacs/GROMACS_Download_README.txt ====
Gromacs can be downloaded from : http://www.gromacs.org/Downloads
The UEABS benchmark cases require the 4.6 branch or newer;
the latest 4.6.x version is suggested.
==== gromacs/GROMACS_Run_README.txt ====
There are two data sets in UEABS for Gromacs.
1. ion_channel, which uses PME for electrostatics, for Tier-1 systems
2. lignocellulose-rf, which uses a reaction field for electrostatics, for Tier-0 systems. Reference : http://pubs.acs.org/doi/abs/10.1021/bm400442n
The input data file for each benchmark is the corresponding .tpr file produced using
tools from a complete gromacs installation and a series of ascii data files
(atom coords/velocities, forcefield, run control).
If you happen to run the tier-0 case on a BG/Q, use lignucellulose-rf.BGQ.tpr
instead of lignocellulose-rf.tpr. It is the same as lignocellulose-rf.tpr,
but created on a BG/Q system.
The general way to run gromacs benchmarks is :
WRAPPER WRAPPER_OPTIONS PATH_TO_GMX mdrun -s CASENAME.tpr -maxh 0.50 -resethway -noconfout -nsteps 10000 -g logile
CASENAME is one of ion_channel or lignocellulose-rf
maxh : Terminate gracefully after 0.99 times this time (in hours), i.e. after ~30 min here
resethway : Reset timer counters at the halfway step. This means that the reported
walltime and performance refer to the last
half of the simulation steps.
noconfout : Do not save output coordinates/velocities at the end.
nsteps : Run this number of steps, no matter what is requested in the input file
logfile : The output filename. If the extension .log is omitted,
it is appended automatically. Obviously, it should be different
for different runs.
WRAPPER and WRAPPER_OPTIONS depend on system, batch system etc.
A few common pairs are :
CRAY : aprun -n TASKS -N NODES -d THREADSPERTASK
Curie : ccc_mrun with no options - obtained from batch system
Juqueen : runjob --np TASKS --ranks-per-node TASKSPERNODE --exp-env OMP_NUM_THREADS
Slurm : srun with no options, obtained from slurm if the variables below are set.
#SBATCH --nodes=NODES
#SBATCH --ntasks-per-node=TASKSPERNODE
#SBATCH --cpus-per-task=THREADSPERTASK
The best performance is usually obtained using pure MPI i.e. THREADSPERTASK=1.
You can check other hybrid MPI/OMP combinations.
The execution time is reported at the end of the logfile : grep Time: logfile | awk -F ' ' '{print $3}'
NOTE : This is the wall time for the last half of the steps.
For sufficiently large nsteps, this is about half of the total wall time.
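Since the reported value covers only the last half of the steps, the full-run wall time can be approximated by doubling it. A tiny sketch, with 215.3 as a made-up example of the number extracted by the grep above:

```shell
# Hypothetical wall time (seconds) reported for the last half of the steps.
half_walltime=215.3
# The full run took roughly twice that.
echo "$half_walltime" | awk '{ printf "%.1f\n", 2 * $1 }'
```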
==== namd/NAMD_Build_README.txt ====
Build instructions for namd.
In order to run benchmarks the memopt build with SMP support is mandatory.
NAMD may be compiled in an experimental memory-optimized mode that utilizes a compressed version of the molecular structure and also supports parallel I/O.
In addition to reducing per-node memory requirements, the compressed structure greatly reduces startup times compared to reading a psf file.
In order to build this version, your MPI needs to provide the thread support level MPI_THREAD_FUNNELED.
You need a NAMD 2.11 version or newer.
1. Uncompress/tar the source.
2. cd NAMD_Source_BASE (the directory name depends on how the source obtained,
typically : namd2 or NAMD_2.11_Source )
3. Untar the charm-VERSION.tar that is included. If you obtained the namd source via
CVS, you need to download charm++ separately.
4. cd to charm-VERSION directory
5. configure and compile charm :
This step is system dependent. Some examples are :
CRAY XE6 : ./build charm++ mpi-crayxe smp --with-production -O -DCMK_OPTIMIZE
CURIE : ./build charm++ mpi-linux-x86_64 smp mpicxx ifort --with-production -O -DCMK_OPTIMIZE
JUQUEEN : ./build charm++ mpi-bluegeneq smp xlc --with-production -O -DCMK_OPTIMIZE
Help : ./build --help to see all available options.
For special notes on various systems, you should look in http://www.ks.uiuc.edu/Research/namd/2.11/notes.html.
The syntax is : ./build charm++ ARCHITECTURE smp (compilers, optional) --with-production -O -DCMK_OPTIMIZE
You can find a list of supported architectures/compilers in charm-VERSION/src/arch
The smp option is mandatory to build the Hybrid version of namd.
This builds charm++.
6. cd ..
7. Configure NAMD.
This step is system dependent. Some examples are :
CRAY-XE6 ./config CRAY-XT-g++ --charm-base ./charm-6.7.0 --charm-arch mpi-crayxe-smp --with-fftw3 --fftw-prefix $CRAY_FFTW_DIR --without-tcl --with-memopt --charm-opts -verbose
CURIE ./config Linux-x86_64-icc --charm-base ./charm-6.7.0 --charm-arch mpi-linux-x86_64-ifort-smp-mpicxx --with-fftw3 --fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --with-memopt --charm-opts -verbose --cxx-opts "-O3 -xAVX " --cc-opts "-O3 -xAVX" --cxx icpc --cc icc --cxx-noalias-opts "-fno-alias -ip -fno-rtti -no-vec "
Juqueen: ./config BlueGeneQ-MPI-xlC --charm-base ./charm-6.7.0 --charm-arch mpi-bluegeneq-smp-xlc --with-fftw3 --with-fftw-prefix PATH_TO_FFTW3_INSTALLATION --without-tcl --charm-opts -verbose --with-memopt
Help : ./config --help to see all available options.
See in http://www.ks.uiuc.edu/Research/namd/2.11/notes.html for special notes on various systems.
What is absolutely necessary is the option : --with-memopt and an SMP enabled charm++ build.
It is suggested to disable tcl support as it is indicated by the --without-tcl flags, since tcl is not necessary
to run the benchmarks.
You need to specify the fftw3 installation directory. On systems that
use environment modules you need to load the existing fftw3 module
and probably use the provided environment variables - like in CRAY-XE6
example above.
If fftw3 libraries are not installed on your system,
download and install fftw-3.3.5.tar.gz from http://www.fftw.org/.
You may adjust the compilers and compiler flags as the CURIE example.
A typical use of compiler/flag adjustment is, for example,
to add -xAVX as in the CURIE case while keeping all the other compiler flags of the architecture the same.
Take care with, or simply avoid, the --cxx option of the NAMD config unless you have a reason to use it,
as this will override the compilation flags from the arch file.
When config finishes, it prompts you to change to a directory and run make.
8. cd to the reported directory and run make
If everything is ok you'll find the executable with name namd2 in this
directory.
==== namd/NAMD_Download_README.txt ====
The official site to download namd is :
http://www.ks.uiuc.edu/Research/namd/
You need to register (for free) to get a NAMD copy from :
http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD
In order to get a specific CVS snapshot, you need first to ask for
username/password : http://www.ks.uiuc.edu/Research/namd/cvsrequest.html
When your cvs access application is approved, you can use your username/password
to download a specific cvs snapshot :
cvs -d :pserver:username@cvs.ks.uiuc.edu:/namd/cvsroot co -D "2013-02-06 23:59:00 GMT" namd2
In this case, the charm++ is not included.
You have to download it separately and put it in the namd2 source tree :
http://charm.cs.illinois.edu/distrib/charm-6.5.0.tar.gz
==== namd/NAMD_Run_README.txt ====
Run instructions for NAMD.
ntell@grnet.gr
After build of NAMD you have an executable called namd2.
The best performance and scaling of NAMD are achieved using the
hybrid MPI/multithreaded version. On a system with NC cores per node,
use 1 MPI task per node and NC threads per task;
for example, on a 20-core/node system use 1 MPI process and
set OMP_NUM_THREADS, or the corresponding batch system variable, to 20.
Set a variable, for example MYPPN, to NC-1,
for example to 19 for a 20-core/node system.
You can also try other combinations of TASKSPERNODE/THREADSPERTASK to check.
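The MYPPN bookkeeping above can be sketched in a couple of lines (NC=20 is the example system from the text; reserving one core per node is commonly done to leave room for the communication thread of the SMP build):

```shell
NC=20                 # cores per node on the example system above
MYPPN=$(( NC - 1 ))   # value handed to +ppn, one core kept free
echo $MYPPN
```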
The control file is stmv.8M.memopt.namd for tier-1 and stmv.28M.memopt.namd
for tier-0 systems.
The general way to run is :
WRAPPER WRAPPER_OPTIONS PATH_TO_namd2 +ppn $MYPPN stmv.8M.memopt.namd > logfile
WRAPPER and WRAPPER_OPTIONS depend on system, batch system etc.
A few common pairs are :
CRAY : aprun -n TASKS -N NODES -d THREADSPERTASK
Curie : ccc_mrun with no options - obtained from batch system
Juqueen : runjob --np TASKS --ranks-per-node TASKSPERNODE --exp-env OMP_NUM_THREADS
Slurm : srun with no options, obtained from slurm if the variables below are set.
#SBATCH --nodes=NODES
#SBATCH --ntasks-per-node=TASKSPERNODE
#SBATCH --cpus-per-task=THREADSPERTASK
The run walltime is reported at the end of the logfile : grep WallClock: logfile | awk -F ' ' '{print $2}'
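A self-contained sketch of that extraction, run against a made-up final log line (NAMD prints a 'WallClock:' summary at the end of its log; the surrounding fields here are invented for illustration):

```shell
# Fake one-line stand-in for the end of a NAMD logfile.
printf 'WallClock: 1234.56  CPUTime: 1230.00  Memory: 2048 MB\n' > logfile.sample
# Same pipeline as above: second whitespace field of the WallClock line.
grep 'WallClock:' logfile.sample | awk '{print $2}'
```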
==== nemo/README.md ====
NEMO 3.6, GYRE configuration
============================
juha.lento@csc.fi, 2016-05-16
Build and test documentation for NEMO 3.6 in GYRE
configuration. Example commands are tested in CSC's Cray XC40,
`sisu.csc.fi`.
Download NEMO and XIOS sources
------------------------------
### Register
http://www.nemo-ocean.eu
### Check out NEMO sources
```
svn --username USERNAME --password PASSWORD --no-auth-cache co http://forge.ipsl.jussieu.fr/nemo/svn/branches/2015/nemo_v3_6_STABLE/NEMOGCM
...
Checked out revision 6542.
```
### Check out XIOS2 sources
http://www.nemo-ocean.eu/Using-NEMO/User-Guides/Basics/XIOS-IO-server-installation-and-use
```
svn co -r819 http://forge.ipsl.jussieu.fr/ioserver/svn/XIOS/trunk xios-2.0
```
Build XIOS
----------
### Build environment
Xios requires Netcdf4.
```
module load cray-hdf5-parallel cray-netcdf-hdf5parallel
```
### Build command
http://forge.ipsl.jussieu.fr/ioserver/wiki/documentation
```
cd xios-2.0
./make_xios --job 8 --arch XC30_Cray
```
...the build may need to be rerun without `--job 8`, and the test suite is broken, but the library does get built.
Build NEMO 3.6 in GYRE configuration
------------------------------------
### Get a bash helper for editing configuration files
```
source <(curl -s https://raw.githubusercontent.com/jlento/nemo/master/fixfcm.bash)
```
...or if you have a buggy bash 3.2...
```
wget https://raw.githubusercontent.com/jlento/nemo/master/fixfcm.bash; source fixfcm.bash
```
### Edit (create) configuration files
```
cd ../NEMOGCM/CONFIG
fixfcm < ../ARCH/arch-XC40_METO.fcm > ../ARCH/arch-MY_CONFIG.fcm \
NCDF_HOME="$NETCDF_DIR" \
HDF5_HOME="$HDF5_DIR" \
XIOS_HOME="$(readlink -f ../../xios-2.0)"
```
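To see what `fixfcm` actually does to a line, here is a self-contained demonstration: the function body is reproduced from `fixfcm.bash` so the snippet runs on its own, and the `%FCFLAGS` input line is a made-up sample in FCM's `%VARIABLE value` format:

```shell
# fixfcm rewrites "%NAME value" lines read from stdin, one sed rule
# per NAME=value argument (body reproduced from fixfcm.bash).
fixfcm() {
  local name value prog=""
  for arg in "$@"; do
    name="${arg%%=*}"
    value=$(printf %q "${arg#*=}")
    value="${value//\//\/}"
    prog="s/(^%${name} )(.*)/\\1 ${value}/"$'\n'"$prog"
  done
  sed -r -e "$prog"
}
# Replace the value of %FCFLAGS in a one-line sample config.
echo '%FCFLAGS -O2' | fixfcm FCFLAGS=-O3
```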
### Build
```
./makenemo -m MY_CONFIG -r GYRE_XIOS -n MY_GYRE add_key "key_nosignedzero"
```
Run first GYRE test
-------------------
### Prepare input files
```
cd MY_GYRE/EXP00
sed -i '/using_server/s/false/true/' iodef.xml
sed -i '/&nameos/a ln_useCT = .false.' namelist_cfg
sed -i '/&namctl/a nn_bench = 1' namelist_cfg
```
### Run the experiment interactively
```
aprun -n 4 ../BLD/bin/nemo.exe : -n 2 ../../../../xios-2.0/bin/xios_server.exe
```
GYRE configuration with higher resolution
-----------------------------------------
### Modify configuration
Parameter `jp_cfg` controls the resolution.
```
rm -f time.step solver.stat output.namelist.dyn ocean.output slurm-* GYRE_* mesh_mask_00*
jp_cfg=4
sed -i -r \
-e 's/^( *nn_itend *=).*/\1 21600/' \
-e 's/^( *nn_stock *=).*/\1 21600/' \
-e 's/^( *nn_write *=).*/\1 1000/' \
-e 's/^( *jp_cfg *=).*/\1 '"$jp_cfg"'/' \
-e 's/^( *jpidta *=).*/\1 '"$(( 30 * jp_cfg +2))"'/' \
-e 's/^( *jpjdta *=).*/\1 '"$(( 20 * jp_cfg +2))"'/' \
-e 's/^( *jpiglo *=).*/\1 '"$(( 30 * jp_cfg +2))"'/' \
-e 's/^( *jpjglo *=).*/\1 '"$(( 20 * jp_cfg +2))"'/' \
namelist_cfg
```
### Run the experiment as a SLURM batch job
```
sbatch -N 3 -p test -t 30 << EOF
#!/bin/bash
aprun -n 48 ../BLD/bin/nemo.exe : -n 8 ../../../../xios-2.0/bin/xios_server.exe
EOF
```
==== nemo/fixfcm.bash ====
#!/usr/bin/env bash
# A tool to modify configuration files used by FCM.
# This is just a regexp search and replace, not a proper
# parser. Use at your own risk.
fixfcm() {
  local name value prog=""
  for arg in "$@"; do
    name="${arg%%=*}"
    value=$(printf %q "${arg#*=}")
    value="${value//\//\/}"
    prog="s/(^%${name} )(.*)/\\1 ${value}/"$'\n'"$prog"
  done
  sed -r -e "$prog"
}
==== qcd/QCD_Build_README.txt ====
Description and Building of the QCD Benchmark
=============================================
Description
===========
The QCD benchmark is, unlike the other benchmarks in the PRACE
application benchmark suite, not a full application but a set of 5
kernels which are representative of some of the most compute-intensive
parts of QCD calculations.
Test Cases
----------
Each of the 5 kernels has one test case to be used for Tier-0 and
Tier-1:
Kernel A is derived from BQCD (Berlin Quantum ChromoDynamics program),
a hybrid Monte-Carlo code that simulates Quantum Chromodynamics with
dynamical standard Wilson fermions. The computations take place on a
four-dimensional regular grid with periodic boundary conditions. The
kernel is a standard conjugate gradient solver with even/odd
pre-conditioning. Lattice size is 32^2 x 64^2.
Kernel B is derived from SU3_AHiggs, a lattice quantum chromodynamics
(QCD) code intended for computing the conditions of the Early
Universe. Instead of "full QCD", the code applies an effective field
theory, which is valid at high temperatures. In the effective theory,
the lattice is 3D. Lattice size is 256^3.
Kernel C has a lattice size of 8^4. Note that Kernel C can only be run in a
weak scaling mode, where each CPU stores the same local lattice size,
regardless of the number of CPUs. Ideal scaling for this kernel
therefore corresponds to constant execution time, and performance per
peak TFlop/s is simply the reciprocal of the execution time.
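That reciprocal relation is easy to illustrate: under ideal weak scaling the execution time t is constant, so performance per peak TFlop/s is just 1/t. The value t = 12.5 s below is made up for the sake of the example:

```shell
# Performance per peak TFlop/s for a hypothetical constant execution time.
awk 'BEGIN { t = 12.5; printf "%.3f\n", 1 / t }'
```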
Kernel D consists of the core matrix-vector multiplication routine for
standard Wilson fermions. The lattice size is 64^4.
Kernel E consists of a full conjugate gradient solution using Wilson
fermions. Lattice size is 64^3 x 3.
Building the QCD Benchmark in the JuBE Framework
================================================
The QCD benchmark is integrated in the JuBE Benchmarking Environment
(www.fz-juelich.de/jsc/jube).
JuBE also includes all steps to build the application.
Unpack the QCD_Source_TestCaseA.tar.gz into a directory of your
choice.
After unpacking the Benchmark the following directory structure is available:
PABS/
applications/
bench/
doc/
platform/
skel/
LICENCE
The applications/ subdirectory contains the QCD benchmark
applications.
The bench/ subdirectory contains the benchmark environment scripts.
The doc/ subdirectory contains the overall documentation of the
framework and a tutorial.
The platform/ subdirectory holds the platform definitions as well as
job submission script templates for each defined platform.
The skel/ subdirectory contains templates for analysis patterns for
text output of different measurement tools.
Configuration
-------------
Definition files are already prepared for many platforms. If you are
running on a defined platform just skip this part and go forward to
QCD_Run_README.txt ("Execution").
The platform
------------
A platform is defined through a set of variables in the platform.xml
file, which can be found in the platform/ directory. To create a new
platform entry, copy an existing platform description and modify it to
fit your local setup. The variables defined here will be used by the
individual applications in the later process. Best practice for the
platform nomenclature would be: --. Additionally, you have to create a template batch
submission script, which should be placed in a subdirectory of the
platform/ directory of the same name as the platform itself. Although
this nomenclature is not required by the benchmarking environment, it
helps keeping track of you templates, and minimises the amount of
adaptation necessary for the individual application configurations.
The applications
----------------
Once a platform is defined, each individual application that should be
used in the benchmark (in this case the QCD application) needs to be
configured for this platform. In order to configure an individual
application, copy an existing top-level configuration file
(e.g. prace-scaling-juqueen.xml) to the file prace-.xml.
Then open an editor of your choice to adapt the file to your needs.
Change the setting of the platform parameter to the name of your
defined platform. The platform name can then be referenced throughout
the benchmarking environment via the $platform variable.
Do the same for compile.xml, execute.xml and analyse.xml.
A step-by-step tutorial can also be found in doc/JuBETutorial.pdf.
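The copy-and-adapt step can be sketched as follows. Everything here is a stand-in: the scratch directory, the one-line file content and the platform name "myplatform" are hypothetical (the real top-level configuration files are of course larger):

```shell
# Stand-in demo of the copy-and-adapt pattern in a scratch directory.
# The file content and the name "myplatform" are hypothetical.
mkdir -p /tmp/jube-demo && cd /tmp/jube-demo
printf '<benchmark platform="juqueen"/>\n' > prace-scaling-juqueen.xml
cp prace-scaling-juqueen.xml prace-myplatform.xml
sed -i 's/juqueen/myplatform/' prace-myplatform.xml
cat prace-myplatform.xml   # -> <benchmark platform="myplatform"/>
```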
Compilation is part of running the application. Please continue with
QCD_Run_README.txt to finalize the build and run the benchmark.
qcd/QCD_Run_README.txt

Running the QCD Benchmarks in the JuBE Framework
================================================
Unpack QCD_Source_TestCaseA.tar.gz into a directory of your choice.
After unpacking the benchmark, the following directory structure is
available:
PABS/
applications/
bench/
doc/
platform/
skel/
LICENCE
The applications/ subdirectory contains the QCD benchmark
applications.
The bench/ subdirectory contains the benchmark environment scripts.
The doc/ subdirectory contains the overall documentation of the
framework and a tutorial.
The platform/ subdirectory holds the platform definitions as well as
job submission script templates for each defined platform.
The skel/ subdirectory contains templates for analysis patterns for
text output of different measurement tools.
Configuration
=============
Definition files are already prepared for many platforms. If you are
running on a defined platform, just go forward; otherwise, please have
a look at QCD_Build_README.txt.
Execution
=========
Assuming the Benchmark Suite is installed in a directory that can be
used during execution, a typical run of a benchmark application
consists of two steps:
1. Compiling and submitting the benchmark to the system scheduler.
2. Verifying, analysing and reporting the performance data.
Compiling and submitting
------------------------
If configured correctly, the application benchmark can be compiled and
submitted on the system (e.g. the IBM BlueGene/Q at Jülich) with
the commands:
>> cd PABS/applications/QCD
>> perl ../../bench/jube prace-scaling-juqueen.xml
The benchmarking environment will then compile the binary for all
node/task/thread combinations defined, if those parameters need to be
compiled into the binary. It creates a so-called sandbox subdirectory
for each job, ensuring conflict-free operation of the individual
applications at runtime. If any input files are needed, they are
prepared automatically as defined.
Each active benchmark in the application’s top-level configuration
file will receive an ID, which is used as a reference by JUBE later
on.
Verifying, analysing and reporting
----------------------------------
After the benchmark jobs have run, an additional call to jube will
gather the performance data. For this, the options -update and -result
are used.
>> cd PABS/applications/QCD
>> perl ../../bench/jube -update -result
The ID is the reference number the benchmarking environment has
assigned to this run. The performance data will then be output to
stdout, and can be post-processed from there.
quantum_espresso/QuantumEspresso_Download.txt

The Quantum Espresso package can be freely downloaded from the following URL:
http://www.quantum-espresso.org/download/
13/08/2013
quantum_espresso/QuantumEspresso_Run_README.txt

Running The Quantum Espresso Test Cases
---------------------------------------
1. Unpack the tar file containing the input files (command file .in and
pseudopotentials .UPF) in the directory where you want to run the program. For example,
tar zxvf QuantumEspresso_TestCaseA.tar.gz
2. Find the command file cp.in (test case A) or pw.in (test case B) and check
that the variable pseudo_dir is set to the location of the UPF files, for
example the current directory
pseudo_dir = './'
3. Create a batch file and include in the file the command to launch MPI jobs
(this is system dependent). The benchmark data have been collected varying the
number of MPI tasks only so if the Quantum Espresso version has been compiled
with OpenMP support you should set the number of OpenMP threads to 1. Most
batch scripts will thus contain lines such as:
export OMP_NUM_THREADS=1
mpirun path-to-executable/cp.x < cp.in
(or pw.x < pw.in for test case B), but check your local documentation.
4. The output including timing information will be sent to standard output.
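For illustration only, a batch file for a SLURM-based system might look like the sketch below; the scheduler directives, node counts and the mpirun invocation are assumptions to be adapted to the local machine and scheduler:

```shell
#!/bin/bash
#SBATCH --job-name=qe-testA
#SBATCH --nodes=4               # example size only
#SBATCH --ntasks-per-node=16
#SBATCH --time=01:00:00

export OMP_NUM_THREADS=1        # benchmark data were collected varying MPI tasks only
# Test case A uses cp.x with cp.in; test case B uses pw.x with pw.in.
mpirun ./cp.x < cp.in > cp.out
```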
Cineca 13/08/2013
specfem3d/SPECFEM3D_Build_README.txt

On Curie:
flags.guess:
DEF_FFLAGS="-O3 -DFORCE_VECTORIZATION -check nobounds -xHost -ftz
-assume buffered_io -assume byterecl -align sequence -vec-report0 -std03
-diag-disable 6477 -implicitnone -warn truncated_source -warn
argument_checking -warn unused -warn declarations -warn alignments -warn
ignore_loc -warn usage -mcmodel=medium -shared-intel"
configure command:
./configure MPIFC=mpif90 FC=ifort CC=icc CFLAGS="-mcmodel=medium
-shared-intel" CPP=cpp
To compile:
make clean
make all
specfem3d/SPECFEM3D_Run_README.txt

To run the test cases, copy the Par_file, STATIONS and CMTSOLUTION files into the SPECFEM3D_GLOBE/DATA directory.
Recompile the mesher and the solver.
Run the mesher and the solver.
On Curie, the commands to put in the submission file are:
ccc_mprun bin/xmeshfem3D
ccc_mprun bin/xspecfem3D
SPECFEM3D_TestCaseA runs on 864 cores and SPECFEM3D_TestCaseB runs on 11616 cores.
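As a sketch only, a Curie submission file might look like the following; the #MSUB directives and their values are assumptions (check the local documentation for the exact options), and the core count shown matches test case A:

```shell
#!/bin/bash
#MSUB -r specfem3d_testA    # job name (assumed option)
#MSUB -n 864                # MPI tasks: 864 for TestCaseA, 11616 for TestCaseB
#MSUB -T 3600               # wall-clock limit in seconds (assumed)

ccc_mprun bin/xmeshfem3D
ccc_mprun bin/xspecfem3D
```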