diff --git a/README.md b/README.md
index 2e7abbc903a69ae02987a9448faee06ce51760c4..4f5f572d8936d463edfd4936702cd70808eef45f 100644
--- a/README.md
+++ b/README.md
@@ -8,12 +8,12 @@ The UEABS has been and will be actively updated and maintained by the subsequent
 Each application code has either one or two input datasets. If there are two datasets, Test Case A is designed to run on Tier-1 sized systems (up to around 1,000 x86 cores, or equivalent) and Test Case B is designed to run on Tier-0 sized systems (up to around 10,000 x86 cores, or equivalent). If there is only one dataset (Test Case A), it is suitable for both sizes of system.
 
-Contacts: Valeriu Codreanu or Walter Lioen
+Contacts: (Ok to mention all BCOs here?, ask PMO for a UEABS contact mailing list address?), Walter Lioen
 
 Current Release
 ---------------
 
-The current release is Version 2.1 (April 30, 2019).
+The current release is Version 2.2 (December 31, 2021).
 
 See also the [release notes and history](RELEASES.md).
 
 Running the suite
 -----------------
@@ -21,10 +21,196 @@ Running the suite
 Instructions to run each test case of each code can be found in the subdirectories of this repository.
 
-For more details of the codes and datasets, and sample results, please see the PRACE-5IP benchmarking deliverable D7.5 "Evaluation of Accelerated and Non-accelerated Benchmarks" (April 18, 2019) at http://www.prace-ri.eu/public-deliverables/ .
+For more details of the codes and datasets, and sample results, please see the PRACE-6IP benchmarking deliverable D7.5 "Evaluation of Benchmark Performance" (November 30, 2021) at http://www.prace-ri.eu/public-deliverables/ .
 
 The application codes that constitute the UEABS are:
 ---------------------------------------------------
+
+| Application | Lines of Code | MPI | OpenMP/Pthreads | GPU | Fortran | Python | C | C++ | Code Description/Notes |
+|-------------|---------------|-----|-----------------|-----|---------|--------|---|-----|------------------------|
+| Alya | 600,000 | X | X | X | X | | | | The Alya System is a Computational Mechanics code capable of solving different physics, each one with its own modelization characteristics, in a coupled way. Among the problems it solves are: convection-diffusion reactions, incompressible flows, compressible flows, turbulence, bi-phasic flows and free surface, excitable media, acoustics, thermal flow, quantum mechanics (DFT) and solid mechanics (large strain). |
+| Code_Saturne | ~350,000 | X | X | X | X | X | X | X | The code solves the Navier-Stokes equations for incompressible/compressible flows using a predictor-corrector technique. The Poisson pressure equation is solved by a Conjugate Gradient preconditioned by a multi-grid algorithm, and the transport equations by Conjugate Gradient-like methods. Advanced gradient reconstruction is also available to account for distorted meshes. |
+| CP2K | ~1,150,000 | X | X | X | X | | | | CP2K is a freely available quantum chemistry and solid-state physics software package for performing atomistic simulations. It can be run with MPI, OpenMP and CUDA. All of CP2K is MPI parallelised, with some routines making use of OpenMP, which can be used to reduce the memory footprint. In addition some linear algebra operations may be offloaded to GPUs using CUDA. |
+| GADGET | | | | | | | | | |
+| GPAW | 132,000 | X | | X | | X | X | | |
+| GROMACS | 3,227,337 | X | X | X | | | X | X | |
+| NAMD | 1,992,651 | X | X | X | | | | X | |
+| NEMO | 154,240 | X | X | | X | | | | NEMO (Nucleus for European Modelling of the Ocean) is a mathematical modelling framework for research activities and prediction services in ocean and climate sciences developed by a European consortium. It is intended to be a tool for studying the ocean and its interaction with the other components of the earth climate system over a large number of space and time scales. It comprises the core engines, namely OPA (ocean dynamics and thermodynamics), SI3 (sea ice dynamics and thermodynamics), TOP (oceanic tracers) and PISCES (biogeochemical processes). |
+| PFARM | 21,434 | X | X | X | X | | | | PFARM uses an R-matrix ab-initio approach to calculate electron-atom and electron-molecule collision data for a wide range of applications including astrophysics and nuclear fusion. It is written in modern Fortran/MPI/OpenMP and exploits highly-optimised dense linear algebra numerical library routines. |
+| QCD | | | | | | | | | |
+| Quantum ESPRESSO | 92,996 | X | X | X | X | | | | |
+| SPECFEM3D | | | | | | | | | |
+| TensorFlow | | | | | | | | | |
 
 - [ALYA](#alya)
 - [Code_Saturne](#saturne)
@@ -46,11 +232,10 @@ The application codes that constitute the UEABS are:
 
 The Alya System is a Computational Mechanics code capable of solving different physics, each one with its own modelization characteristics, in a coupled way. Among the problems it solves are: convection-diffusion reactions, incompressible flows, compressible flows, turbulence, bi-phasic flows and free surface, excitable media, acoustics, thermal flow, quantum mechanics (DFT) and solid mechanics (large strain). ALYA is written in Fortran 90/95 and parallelized using MPI and OpenMP.
 
 - Web site: https://www.bsc.es/computer-applications/alya-system
-- Code download: https://repository.prace-ri.eu/ueabs/ALYA/2.1/Alya.tar.gz
-- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r2.1/alya/ALYA_Build_README.txt
-- Test Case A: https://repository.prace-ri.eu/ueabs/ALYA/2.1/TestCaseA.tar.gz
-- Test Case B: https://repository.prace-ri.eu/ueabs/ALYA/2.1/TestCaseB.tar.gz
-- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r2.1/alya/ALYA_Run_README.txt
+- Code download: https://gitlab.com/bsc-alya/open-alya
+- Build and run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/alya/README.md
+- Test Case A: https://gitlab.com/bsc-alya/benchmarks/sphere-16M
+- Test Case B: https://gitlab.com/bsc-alya/benchmarks/sphere-132M
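
For orientation, a minimal sketch of a typical Alya launch is shown below. This is illustrative only: the executable name and the case name depend on your CMake build and on the dataset archive; see the build and run instructions above.

```
# Illustrative launch of the sphere test case on 256 MPI ranks;
# "alya" is the CMake-built executable, "sphere" the dataset's case name
mpirun -np 256 ./alya sphere
```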
 
 # Code_Saturne
@@ -83,14 +268,20 @@ CP2K is written in Fortran 2008 and can be run in parallel using a combination o
 
 # GADGET
 
-GADGET is a freely available code for cosmological N-body/SPH simulations on massively parallel computers with distributed memory written by Volker Springel, Max-Plank-Institute for Astrophysics, Garching, Germany. GADGET is written in C and uses an explicit communication model that is implemented with the standardized MPI communication interface. The code can be run on essentially all supercomputer systems presently in use, including clusters of workstations or individual PCs. GADGET computes gravitational forces with a hierarchical tree algorithm (optionally in combination with a particle-mesh scheme for long-range gravitational forces) and represents fluids by means of smoothed particle hydrodynamics (SPH). The code can be used for studies of isolated systems, or for simulations that include the cosmological expansion of space, either with, or without, periodic boundary conditions. In all these types of simulations, GADGET follows the evolution of a self-gravitating collisionless N-body system, and allows gas dynamics to be optionally included. Both the force computation and the time stepping of GADGET are fully adaptive, with a dynamic range that is, in principle, unlimited. GADGET can therefore be used to address a wide array of astrophysics interesting problems, ranging from colliding and merging galaxies, to the formation of large-scale structure in the Universe. With the inclusion of additional physical processes such as radiative cooling and heating, GADGET can also be used to study the dynamics of the gaseous intergalactic medium, or to address star formation and its regulation by feedback processes.
+GADGET-4 (GAlaxies with Dark matter and Gas intEracT), an evolved and improved version of GADGET-3, is a freely available code for cosmological N-body/SPH simulations on massively parallel computers with distributed memory, written mainly by Volker Springel, Max-Planck Institute for Astrophysics, Garching, Germany, and benefiting from numerous contributions, including Ruediger Pakmor, Oliver Zier, and Martin Reinecke. GADGET-4 supports collisionless simulations and smoothed particle hydrodynamics on massively parallel computers. All communication between concurrent execution processes is done either explicitly by means of the message passing interface (MPI), or implicitly through shared-memory accesses on processes on multi-core nodes. The code is mostly written in ISO C++ (assuming the C++11 standard), and should run on all parallel platforms that support at least MPI-3. So far, the compatibility of the code with current Linux/UNIX-based platforms has been confirmed on a large number of systems.
+
+The code can be used for plain Newtonian dynamics, or for cosmological integrations in arbitrary cosmologies, both with or without periodic boundary conditions. Stretched periodic boxes, and special cases such as simulations with two periodic dimensions and one non-periodic dimension are supported as well. The modeling of hydrodynamics is optional. The code is adaptive both in space and in time, and its Lagrangian character makes it particularly suitable for simulations of cosmic structure formation. Several post-processing options such as group- and substructure finding, or power spectrum estimation are built in and can be carried out on the fly or applied to existing snapshots. Through a built-in cosmological initial conditions generator, it is also particularly easy to carry out cosmological simulations. In addition, merger trees can be determined directly by the code.
+
+- Web site: https://wwwmpa.mpa-garching.mpg.de/gadget4
+- Code download: https://gitlab.mpcdf.mpg.de/vrs/gadget4
+- Build and run instructions: https://wwwmpa.mpa-garching.mpg.de/gadget4/02_running.html
+- Benchmarks:
+  - [Case A: Colliding galaxies with star formation](./gadget/4.0/gadget4-case-A.tar.gz)
+  - [Case B: Cosmological DM-only simulation with IC creation](./gadget/4.0/gadget4-case-B.tar.gz)
+  - [Case C: Adiabatic collapse of a gas sphere](./gadget/4.0/gadget4-case-C.tar.gz)
+- [Code used in the benchmarks](./gadget/4.0/gadget4.tar.gz)
+- [Build & run instructions, details about the benchmarks](./gadget/4.0/README.md)
-- Web site: http://www.mpa-garching.mpg.de/gadget/
-- Code download: https://repository.prace-ri.eu/ueabs/GADGET/gadget3_Source.tar.gz
-- Disclaimer: please note that by downloading the code from this website, you agree to be bound by the terms of the GPL license.
-- Build instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.3/gadget/gadget3_Build_README.txt
-- Test Case A: https://repository.prace-ri.eu/ueabs/GADGET/gadget3_TestCaseA.tar.gz
-- Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r1.3/gadget/gadget3_Run_README.txt
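
As a rough guide, building and launching GADGET-4 follows the pattern below. This is a sketch based on the upstream documentation; the compile-time options in Config.sh and the rank count depend on the test case (see the benchmark README above).

```
# Select compile-time options and a target system type
cp Template-Config.sh Config.sh                # enable options such as PERIODIC or PMGRID here
cp Template-Makefile.systype Makefile.systype  # set SYSTYPE to match your machine
make -j 8                                      # builds the ./Gadget4 executable

# Launch a benchmark case on 64 MPI ranks with its parameter file
mpirun -np 64 ./Gadget4 param.txt
```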
 
 # GPAW
@@ -103,14 +294,15 @@ The equations of the (time-dependent) density functional theory within the PAW m
 The program offers several parallelization levels. The most basic parallelization strategy is domain decomposition over the real-space grid. In magnetic systems it is possible to parallelize over spin, and in systems that have k-points (surfaces or bulk systems) parallelization over k-points is also possible. Furthermore, parallelization over electronic states is possible in DFT and in real-time TD-DFT calculations. GPAW is written in Python and C and parallelized with MPI.
 
 - Web site: https://wiki.fysik.dtu.dk/gpaw/
-- Code download: https://gitlab.com/gpaw/gpaw
-- Build instructions: [gpaw/README.md#install](gpaw/README.md#install)
+- Code download: [gpaw GitLab repository](https://gitlab.com/gpaw/gpaw) or [gpaw on PyPi](https://pypi.org/project/gpaw/)
+- Build instructions: [gpaw README, section "Mechanics of building the benchmark"](gpaw/README.md#mechanics-of-building-the-benchmark)
 - Benchmarks:
-  - [Case S: Carbon nanotube](gpaw/benchmark/carbon-nanotube)
-  - [Case M: Copper filament](gpaw/benchmark/copper-filament)
-  - [Case L: Silicon cluster](gpaw/benchmark/silicon-cluster)
+  - [Case S: Carbon nanotube](gpaw/benchmark/1_S_carbon-nanotube/input.py)
+  - [Case M: Copper filament](gpaw/benchmark/2_M_copper-filament/input.py)
+  - [Case L: Silicon cluster](gpaw/benchmark/3_L_silicon-cluster/input.py)
 - Run instructions:
-  [gpaw/README.md#running-the-benchmarks](gpaw/README.md#running-the-benchmarks)
+  [gpaw README, section "Mechanics of running the benchmark"](gpaw/README.md#mechanics-of-running-the-benchmark)
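
For orientation, a minimal sketch of running one of the benchmark inputs follows. The rank count is illustrative; depending on the GPAW version, the launcher is either the `gpaw python` wrapper shown here or the older custom `gpaw-python` interpreter.

```
# Run the small carbon-nanotube case on 256 MPI ranks
mpirun -np 256 gpaw python gpaw/benchmark/1_S_carbon-nanotube/input.py
```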
 
 # GROMACS
@@ -167,21 +359,21 @@ NAMD is written in C++ and parallelised using Charm++ parallel objects, which ar
 # NEMO
 
 NEMO (Nucleus for European Modelling of the Ocean) [22] is a mathematical modelling framework for research activities and prediction services in ocean and climate sciences developed by a European consortium. It is intended to be a tool for studying the ocean and its interaction with the other components of the earth climate system over a large number of space and time scales. It comprises the core engines, namely OPA (ocean dynamics and thermodynamics), SI3 (sea ice dynamics and thermodynamics), TOP (oceanic tracers) and PISCES (biogeochemical processes).
 
 Prognostic variables in NEMO are the three-dimensional velocity field, a linear or non-linear sea surface height, the temperature and the salinity.
 
 In the horizontal direction, the model uses a curvilinear orthogonal grid and in the vertical direction, a full or partial step z-coordinate, or s-coordinate, or a mixture of the two. The distribution of variables is a three-dimensional Arakawa C-type grid for most of the cases.
 
 The model is implemented in Fortran 90, with preprocessing (C-pre-processor). It is optimized for vector computers and parallelized by domain decomposition with MPI. It supports modern C/C++ and Fortran compilers. All input and output is done with third-party software called XIOS, which depends on NetCDF (Network Common Data Format) and HDF5. It is highly scalable and a well-suited application for measuring supercomputing performance in terms of compute capacity, memory subsystem, I/O and interconnect performance.
 
 ### Test Case Description
 
 The GYRE configuration has been built to model the seasonal cycle of a double-gyre box model. It consists of an idealized domain over which seasonal forcing is applied. This allows for studying a large number of interactions and their combined contribution to the large-scale circulation.
 
 The domain geometry is rectangular, bounded by vertical walls and a flat bottom. The configuration is meant to represent an idealized North Atlantic or North Pacific basin. The circulation is forced by analytical profiles of wind and buoyancy fluxes.
 
 The wind stress is zonal and its curl changes sign at latitudes 22° and 36°. It forces a subpolar gyre in the north, a subtropical gyre in the wider part of the domain and a small recirculation gyre in the southern corner. The net heat flux takes the form of a restoring toward a zonal apparent air temperature profile. A portion of the net heat flux which comes from the solar radiation is allowed to penetrate within the water column. The fresh water flux is also prescribed and varies zonally. It is determined such that, at each time step, the basin-integrated flux is zero.
 
 The basin is initialized at rest with vertical profiles of temperature and salinity uniformly applied to the whole domain. The GYRE configuration is set through the namelist_cfg file. The horizontal resolution is determined by setting jp_cfg (see the namelist sketch after the test-case parameters below) as follows:
@@ -203,7 +395,7 @@ In this configuration, we use default value of 30 ocean levels depicted by jpk=3
 
 **Test Case B**
 * jp_cfg = 256, suitable up to 20,000 cores.
 * Number of days (real): 80
 * Number of time steps: 4320
 * Time step size (real): 20 mins
 * Number of seconds per time step: 1200
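
By way of illustration, the resolution is selected in namelist_cfg roughly as in the fragment below. This is a sketch assuming a NEMO 3.6-style &namcfg block and the GYRE convention of a (30*jp_cfg+2) x (20*jp_cfg+2) point global grid; check the names and values against the namelist shipped with the benchmark.

```
&namcfg               !  parameters of the GYRE configuration
   cp_cfg = "gyre"    !  name of the configuration
   jp_cfg = 256       !  resolution multiplier (Test Case B)
   jpidta = 7682      !  first lateral dimension,  30*jp_cfg + 2
   jpjdta = 5122      !  second lateral dimension, 20*jp_cfg + 2
   jpiglo = 7682      !  first dimension of the global domain
   jpjglo = 5122      !  second dimension of the global domain
/
```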
@@ -214,19 +406,19 @@
 # PFARM
 
 PFARM is part of a suite of programs based on the ‘R-matrix’ ab-initio approach to the variational solution of the many-electron Schrödinger equation for electron-atom and electron-ion scattering. The package has been used to calculate electron collision data for astrophysical applications (such as: the interstellar medium, planetary atmospheres) with, for example, various ions of Fe and Ni and neutral O, plus other applications such as data for plasma modelling and fusion reactor impurities. The code has recently been adapted to form a compatible interface with the UKRmol suite of codes for electron (positron) molecule collisions, thus enabling large-scale parallel ‘outer-region’ calculations for molecular systems as well as atomic systems.
 
 The PFARM outer-region application code EXDIG is dominated by the assembly of sector Hamiltonian matrices and their subsequent eigensolutions. The code is written in Fortran 2003 (or Fortran 2003-compliant Fortran 95), is parallelised using MPI and OpenMP and is designed to take advantage of highly optimised numerical library routines. Hybrid MPI / OpenMP parallelisation has also been introduced into the code via shared-memory-enabled numerical library kernels.
 
 Accelerator-based implementations have been implemented for EXDIG, using off-loading (MKL or CuBLAS/CuSolver) for the standard (dense) eigensolver calculations that dominate overall run-time.
 
 - Code download: https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/pfarm
 - Build & Run instructions: https://repository.prace-ri.eu/git/UEABS/ueabs/-/tree/r2.2-dev/pfarm/PFARM_Build_Run_README.txt
@@ -262,7 +454,7 @@ QUANTUM ESPRESSO is written mostly in Fortran90, and parallelised using MPI and
 The Scalable HeterOgeneous Computing (SHOC) benchmark suite is a collection of benchmark programs testing the performance and stability of systems using computing devices with non-traditional architectures for general purpose computing. It serves as a synthetic benchmark suite in the UEABS context. Its initial focus is on systems containing Graphics Processing Units (GPUs) and multi-core processors, featuring implementations using both CUDA and OpenCL. It can be used on clusters as well as individual hosts.
 
 Also, SHOC includes an Offload branch for the benchmarks that can be used to evaluate the Intel Xeon Phi x100 family.
 
 The SHOC benchmark suite currently contains benchmark programs, categorised based on complexity. Some measure low-level "feeds and speeds" behavior (Level 0), some measure the performance of a higher-level operation such as a Fast Fourier Transform (FFT) (Level 1), and the others measure real application kernels (Level 2).
@@ -275,16 +467,16 @@
 # SPECFEM3D
 
 | **General information** | **Scientific field** | **Language** | **MPI** | **OpenMP** | **GPU** | **LoC** | **Code description** |
 |-------------------------|----------------------|--------------|---------|------------|---------|---------|----------------------|
-| [- Website](https://geodynamics.org/cig/software/specfem3d_globe/)<br>[- Source](https://github.com/geodynamics/specfem3d_globe.git)<br>[- Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/tree/r2.1-dev/specfem3d)<br>[- Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r2.1-dev/specfem3d/PRACE_UEABS_Specfem3D_summary.pdf) | Geodynamics | Fortran | yes | yes | Yes (CUDA) | 140000 | The software package SPECFEM3D simulates three-dimensional global and regional seismic wave propagation based upon the spectral-element method (SEM). |
+| [- Website](https://geodynamics.org/cig/software/specfem3d_globe/)<br>[- Source](https://github.com/geodynamics/specfem3d_globe.git)<br>[- Bench](https://repository.prace-ri.eu/git/UEABS/ueabs/tree/r2.1-dev/specfem3d)<br>[- Summary](https://repository.prace-ri.eu/git/UEABS/ueabs/blob/r2.1-dev/specfem3d/PRACE_UEABS_Specfem3D_summary.pdf) | Geodynamics | Fortran & C | yes | yes | Yes (CUDA) | 100k Fortran & 20k C | The software package SPECFEM3D simulates three-dimensional global and regional seismic wave propagation based upon the spectral-element method (SEM). |
 
 # TensorFlow
 
 TensorFlow (https://www.tensorflow.org) is a popular open-source library for symbolic math and linear algebra, with particular optimization for neural-network-based machine learning workflows. Maintained by Google, it is widely used for research and production in both academia and industry.
 
 TensorFlow supports a wide variety of hardware platforms (CPUs, GPUs, TPUs), and can be scaled up to utilize multiple compute devices on a single or multiple compute nodes. The main objective of this benchmark is to profile the scaling behavior of TensorFlow on different hardware, and thereby provide a reference baseline of its performance for different sizes of applications.
 
 There are many open-source datasets available for benchmarking TensorFlow, such as `mnist`, `fashion_mnist`, `cifar`, `imagenet`, and so on. This benchmark suite, however, focuses on a scientific research use case. `DeepGalaxy` is a code built with TensorFlow, which uses a deep neural network to classify galaxy mergers in the Universe, observed by the Hubble Space Telescope and the Sloan Digital Sky Survey.
 
 - Website: https://github.com/maxwelltsai/DeepGalaxy
 - Code download: https://github.com/maxwelltsai/DeepGalaxy
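
As an illustration, a data-parallel training run is typically launched through MPI along the following lines. The script name `dg_train.py` and its flags are assumptions taken from the DeepGalaxy repository; check its README for the actual interface.

```
# Hypothetical distributed training run of DeepGalaxy on 4 ranks;
# the epoch count and batch size are placeholder values
mpirun -np 4 python dg_train.py --epochs 20 --batch-size 32
```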
diff --git a/RELEASES.md b/RELEASES.md
index 1e8fa1c10859320d003699e642dcd326870ff2bd..97378d714a856cb124ce340cd23350849ecf27c0 100644
--- a/RELEASES.md
+++ b/RELEASES.md
@@ -1,5 +1,15 @@
 # UEABS Releases
 
+## Version 2.2 (PRACE-6IP, December 31, 2021)
+* Changed the presentation, making it similar to the CORAL Benchmarks (cf. CORAL Benchmarks and CORAL-2 Benchmarks)
+* Removed the SHOC benchmark suite
+* Added the TensorFlow benchmark
+* Alya ...
+* ...
+* ...
+* TensorFlow ...
+* Updated the benchmark suite to the status as used for the PRACE-6IP benchmarking deliverable D7.5 "Evaluation of Benchmark Performance" (November 30, 2021)
+
 ## Version 2.1 (PRACE-5IP, April 30, 2019)
 * Updated the benchmark suite to the status as used for the PRACE-5IP benchmarking deliverable D7.5 "Evaluation of Accelerated and Non-accelerated Benchmarks" (April 18, 2019)
diff --git a/alya/README.md b/alya/README.md
index b32a1260edbe45a48a11c416d9991650e2b5b89d..17edf4e3e48ef9b034f93729b48b545e8ec4ace6 100644
--- a/alya/README.md
+++ b/alya/README.md
@@ -11,45 +11,43 @@ The Alya System is a Computational Mechanics code capable of solving different p
 
 * Web site: https://www.bsc.es/computer-applications/alya-system
 
-* Code download: https://repository.prace-ri.eu/ueabs/ALYA/2.1/Alya.tar.gz
+* Code download: https://gitlab.com/bsc-alya/open-alya
 
-* Test Case A: https://repository.prace-ri.eu/ueabs/ALYA/2.1/TestCaseA.tar.gz
+* Test Case A: https://gitlab.com/bsc-alya/benchmarks/sphere-16M
 
-* Test Case B: https://repository.prace-ri.eu/ueabs/ALYA/2.1/TestCaseB.tar.gz
+* Test Case B: https://gitlab.com/bsc-alya/benchmarks/sphere-132M
 
 ## Mechanics of Building Benchmark
 
-Alya builds the makefile from the compilation options defined in config.in. In order to build ALYA (Alya.x), please follow these steps after unpack the tar.gz:
+You can compile Alya using CMake. It follows the classic CMake configuration, except for the compiler management, which has been customized by the developers.
+
+### Creation of the build directory
+
+In your alya directory, create a new build directory:
 
-Go to to directory: Executables/unix
 ```
-cd Executables/unix
+mkdir build
+cd build
 ```
 
-Edit config.in (some default config.in files can be found in directory configure.in):
+### Configuration
 
-* Select your own MPI wrappers and paths
-* Select size of integers. Default is 4 bytes, For 8 bytes, select -DI8
-* Choose your metis version, metis-4.0 or metis-5.1.0_i8 for 8-bytes integers
+To configure CMake from the command line, type the following:
 
-Configure Alya:
+    cmake ..
 
- ./configure -x nastin parall
+If you want to customize the build options, use -DOPTION=value. For example, to enable GPU support:
 
-Compile metis:
+    cmake .. -DWITH_GPU=ON
 
- make metis4
+### Compilation
 
-or
-
- make metis5
-
-Finally, compile Alya:
- make -j 8
+    make -j 8
 
+For more information: https://gitlab.com/bsc-alya/alya/-/wikis/Documentation/Installation
 
 ## Mechanics of Running Benchmark
 
@@ -59,8 +57,8 @@ The parameters used in the datasets try to represent at best typical industrial
 The different datasets are:
 
-    SPHERE_16.7M ... 16.7M sphere mesh
-    SPHERE_132M .... 132M sphere mesh
+    Test Case A: SPHERE_16.7M ... 16.7M sphere mesh
+    Test Case B: SPHERE_132M .... 132M sphere mesh
 
 ### How to execute Alya with a given dataset
diff --git a/alya/configure.in/config.in.amd b/alya/configure.in/config.in.amd
deleted file mode 100644
index 6e4076164bd0bc69962fdbc9935d35795464c905..0000000000000000000000000000000000000000
--- a/alya/configure.in/config.in.amd
+++ /dev/null
@@ -1,416 +0,0 @@
-###################################################################
-#                        IFORT CONFIGURE                          #
-#MN4 RECOMENDED MODULE:                                           #
-#module load intel/2017.4 impi/2017.4                             #
-###################################################################
-
-#@Compiler: Using Intel Fortran Compiler ifort.
-F77 = mpif90 -F90 = mpif90 -FCOCC = mpicc -c -FCFLAGS = -module $O -c -FPPFLAGS = -fpp -EXTRALIB = -EXTRAINC = -fa2p = mpif90 -module ../../Utils/user/alya2pos -c -fpp -fa2plk = mpif90 - -################################################################### -# PERFORMANCE FLAGS # -################################################################### - -#@Optimization: O3 -FOPT = -O3 - -#Compilation flags applied only to the Source/modules folder -#@Special flags only for Souce/modules folder activaded. -#MODULEFLAGS = - -#Uncomment the following line to enable NDIME as a parameter (OPTIMIZATION FOR 3D PROBLEMS) -#@Optimization for 3D PROBLEMS activated. -CSALYA := $(CSALYA) -DNDIMEPAR - -#Uncomment the following line for DEBUG AND CHECKING FLAGS -#@Debug flags activated. -#CSALYA := $(CSALYA) -ftrapuv -check all,noarg_temp_created -traceback -debug full -warn all,nodec,nointerfaces -fp-stack-check -ansi-alias - -#Vectorization: put vector size (in principle=4 for MN) -#@Vectorization activated. -CSALYA := $(CSALYA) -DVECTOR_SIZE=16 - -#@Marenostrum IV Optimizations -CSALYA := $(CSALYA) -mavx2 -axCORE-AVX2,CORE-AVX512,MIC-AVX512 - -################################################################### -# PROFILING FLAGS # -################################################################### - -# Uncomment the following line to generate profiling info files -#@Profiling info files will be generated. -#CSALYA := $(CSALYA) -profile-loops=all -profile-loops-report=2 - -################################################################### -# EXTRAE FLAGS # -################################################################### - -# Uncomment the following line to compile Alya using extrae -# 1. Define EXTRAE_HOME -# 2. Add xml file. See e.g. in /apps/BSCTOOLS/extrae/latest/impi_2017_4/share/example/*/extrae.xml -# 3. In batch file, use: srun ./trace.sh Alya.x xxx - -#@Linking with Extrae -#EXTRAE_HOME =/apps/BSCTOOLS/extrae/latest/impi_2017_4 -#EXTRALIB := $(EXTRALIB) -L${EXTRAE_HOME}/lib/ -lmpitracef -#@Enabling Extrae user calls (normal) -#CSALYA := $(CSALYA) -DEXTRAE - -################################################################### -# METIS LIBRARY # -################################################################### - -#@Using Metis 4.0 -EXTRALIB := $(EXTRALIB) -L../../Thirdparties/metis-4.0 -lmetis -CSALYA := $(CSALYA) -DMETIS - -#@Using Metis 5.0.2 or metis 5.1.0 -#CSALYA := $(CSALYA) -DV5METIS -#CSALYA := $(CSALYA) -DV51METIS -#@Using Metis 5.0.2 or metis 5.1.0 for MAC -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/metis-5.0.2_i8/build/Darwin-x86_64/libmetis -lmetis -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/metis-5.1.2_i8/build/Darwin-x86_64/libmetis -lmetis -#@Using Metis 5.0.2 or metis 5.1.0 for Linux -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/metis-5.0.2_i8/build/Linux-x86_64/libmetis -lmetis -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/metis-5.1.0_i8/build/Linux-x86_64/libmetis -lmetis - -#@Using parametis. 
-#CSALYA := $(CSALYA) -DPARMETIS -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/ - -################################################################### -# ZOLTAN LIBRARY # -################################################################### - -#EXTRALIB := $(EXTRALIB) -L/gpfs/projects/bsc21/bsc21499/Zoltan_v3.83/Build/lib -lzoltan -#CSALYA := $(CSALYA) -I/gpfs/projects/bsc21/bsc21499/Zoltan_v3.83/Build/include -#CSALYA := $(CSALYA) -DZOLTAN - -################################################################### -# BLAS LIBRARY # -################################################################### - -# Uncomment the following lines for using blas -#@BLAS Activated -#EXTRALIB := $(EXTRALIB) -lblas -#CSALYA := $(CSALYA) -DBLAS - -################################################################### -# MPI/THREADS/TASKS # -################################################################### - -# Uncomment the following lines for sequential (NOMPI) version -#@Sequential ALYA. -#VERSIONMPI = nompi -#CSALYA := $(CSALYA) -DMPI_OFF - -# Uncomment the following lines for OPENMP version -#@OpenMP Activated. heap-arrays option may be used to stack overflow and have Alya crash -#CSALYA := $(CSALYA) -qopenmp -#CSALYA := $(CSALYA) -heap-arrays -#EXTRALIB := $(EXTRALIB) -qopenmp - -# Uncomment the following line to disable OPENMP coloring -#@OpenMP Coloring Activated. -#CSALYA := $(CSALYA) -DNO_COLORING - -# Uncomment the following line to enable OMPSS in order to activate -# loops with multidependencies whenever we have a race condition -#@OMPPS Activated. -#CSALYA := $(CSALYA) -DALYA_OMPSS - -# To activate MPI3 (non-blocking MPI_Allreduce) -#@MPI3 activated. -#CSALYA := $(CSALYA) -DMPI3 - -################################################################### -# DLB # -# To use DLB with OMPSS: Define -DALYA_DLB -DALYA_OMPSS # -# To use DLB barrier: Define -DALYA_DLB_BARRIER # -# If we compile with mercurium, there is no need to link # -# with DLB # -# # -################################################################### - -#@DLB Activated. -#CSALYA := $(CSALYA) -DALYA_DLB -#FCFLAGS := $(FCFLAGS) --dlb -#FPPFLAGS := $(FPPFLAGS) --dlb - -################################################################### -# TALP # -# In the batcj script, run the following way: # -# export DLB_ARGS=--talp # -# module load dlb/git # -# srun env LD_PRELOAD=$DLB_HOME/lib/libdlb_mpi.so ./Alya.x xxx # -# # -################################################################### - -#@DLB_TALP Activated. 
-#CSALYA := $(CSALYA) -DALYA_TALP -#CSALYA := $(CSALYA) -I$(DLB_HOME)/include -#EXTRALIB := $(EXTRALIB) -L$(DLB_HOME)/lib -ldlb -Wl,-rpath,$(DLB_HOME)/lib - -################################################################### -# INTEGER TYPE # -################################################################### -# Uncomment the following lines for 8 byte integers -#@Integers size: 8 -#CSALYA := $(CSALYA) -DI8 -m64 - - -################################################################### -# CANTERA # -################################################################### -# Uncomment the following lines for Cantera -#@Using Cantera -#CANTERA=1 -#@ Marenostrum 4 -#ROOT_BOOST=/gpfs/projects/bsc21/Cantera-Alya/boost_1_68_0/ -#ROOT_CANTERA=/gpfs/projects/bsc21/Cantera-Alya/cantera/INTEL/ -#@ Nord3 -#ROOT_BOOST=/apps/BOOST/1_53_0/IMPI/include/ -#ROOT_CANTERA=/gpfs/projects/bsc21/Cantera-Alya/cantera/INTEL_NORD3/ -################################################################### -# HDF5 # -# # -#MANDATORY MODULES TO LOAD ON MN3 FOR 4 BYTE INTEGERS VERSION # -#module load HDF5/1.8.14 SZIP # -# # -#MANDATORY MODULES TO LOAD ON MN3 FOR 8 BYTE INTEGERS VERSION # -#module load SZIP/2.1 HDF5/1.8.15p1_static # -# # -#MANDATORY SERVICES TO COMPILE # -#hdfpos # -#MORE INFO:bsccase02.bsc.es/alya/tutorial/hdf5_output.html # -################################################################### - -# Uncomment the following lines for using HDF5 IN 4 BYTE INTEGER VERSION -#@HDF5 4 Bytes integer Activated please load the module HDF5/1.8.14 SZIP -#CSALYA := $(CSALYA) -I/apps/HDF5/1.8.14/INTEL/IMPI/include -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.14/INTEL/IMPI/lib/libhdf5_fortran.a /apps/HDF5/1.8.14/INTEL/IMPI/lib/libhdf5.a -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.14/INTEL/IMPI/lib/libhdf5hl_fortran.a /apps/HDF5/1.8.14/INTEL/IMPI/lib/libhdf5_hl.a -#EXTRALIB := $(EXTRALIB) -L/apps/SZIP/2.1/lib -lsz -L/usr/lib64 -lz -lgpfs - -# Uncomment the following lines for using HDF5 IN 8 BYTE INTEGER VERSION -#@HDF5 8 Bytes integer Activated please load the module SZIP/2.1 HDF5/1.8.15p1_static -#CSALYA := $(CSALYA) -I/apps/HDF5/1.8.15p1/static/include -#EXTRALIB := $(EXTRALIB) /gpfs/apps/MN3/HDF5/1.8.15p1/static/lib/libhdf5_fortran.a /gpfs/apps/MN3/HDF5/1.8.15p1/static/lib/libhdf5_f90cstub.a -#EXTRALIB := $(EXTRALIB) /gpfs/apps/MN3/HDF5/1.8.15p1/static/lib/libhdf5.a /gpfs/apps/MN3/HDF5/1.8.15p1/static/lib/libhdf5_tools.a -#EXTRALIB := $(EXTRALIB) -L/apps/SZIP/2.1/lib -lsz -L/usr/lib64 -lz -lgpfs - -# latest Hdf5 (1.8.16) to be used with latest impi (5.1.2.150) and intel(16.0.1) -# there is module load HDF5/1.8.16-mpi for integers 4 but I have seen it is not necesary to call it -# Toni should clean up this - my reconmendation is to only leave this last version -# moreover there seems to be no need for module load hdf5 (in the i8 case there exists no module load) - -# Uncomment the following lines for using HDF5 IN 4 BYTE INTEGER VERSION -#@HDF5 1.8.16 - 4 Bytes integer Activated -#CSALYA := $(CSALYA) -I/apps/HDF5/1.8.16/INTEL/IMPI/include -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.16/INTEL/IMPI/lib/libhdf5_fortran.a /apps/HDF5/1.8.16/INTEL/IMPI/lib/libhdf5hl_fortran.a -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.16/INTEL/IMPI/lib/libhdf5.a /apps/HDF5/1.8.16/INTEL/IMPI/lib/libhdf5_hl.a -#EXTRALIB := $(EXTRALIB) -L/apps/SZIP/2.1/lib -lsz -L/usr/lib64 -lz -lgpfs - -# Uncomment the following lines for using HDF5 IN 8 BYTE INTEGER VERSION -#@HDF5 1.8.16 - 8 Bytes integer Activated -#CSALYA := $(CSALYA) 
-I/apps/HDF5/1.8.16/INTEL/IMPI/int8/include -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.16/INTEL/IMPI/int8/lib/libhdf5_fortran.a /apps/HDF5/1.8.16/INTEL/IMPI/int8/lib/libhdf5hl_fortran.a -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.16/INTEL/IMPI/int8/lib/libhdf5.a /apps/HDF5/1.8.16/INTEL/IMPI/int8/lib/libhdf5_hl.a -#EXTRALIB := $(EXTRALIB) -L/apps/SZIP/2.1/lib -lsz -L/usr/lib64 -lz -lgpfs - -################################################################### -# VTK # -# # -#MANDATORY THIRDPARTY COMPILATION # -#Go to Alya/Thirdparties/VTK # -# # -#MORE INFO:bsccase02.bsc.es/alya/tutorial/vtk_output.html # -################################################################### - -# Uncomment the following lines for using VTK as output -#@VTK Output Activated -#CSALYA := $(CSALYA) -DVTK -#EXTRALIB := $(EXTRALIB) ../../Thirdparties/VTK/vtkXMLWriterF.o -L/apps/VTK/6.1.0_patched/lib -lvtkIOXML-6.1 -lvtkIOGeometry-6.1 -#EXTRALIB := $(EXTRALIB) -lvtkIOXMLParser-6.1 -lvtksys-6.1 -lvtkIOCore-6.1 -lvtkCommonExecutionModel-6.1 -lvtkCommonDataModel-6.1 -lvtkCommonMisc-6.1 -#EXTRALIB := $(EXTRALIB) -lvtkCommonSystem-6.1 -lvtkCommonTransforms-6.1 -lvtkCommonMath-6.1 -lvtkCommonCore-6.1 -lvtkzlib-6.1 -#EXTRALIB := $(EXTRALIB) -lvtkjsoncpp-6.1 -lvtkexpat-6.1 -L/gpfs/apps/MN3/INTEL/tbb/lib/intel64/ -ltbb - -################################################################### -# MUMPS # -# in MN4 use module load mkl intel impi mumps/5.1.2_metis4 -# if you just put mumps it loads mumps/5.1.2 that is with metis 5 -################################################################### - -#to automatically switch between metis 4 or 5 -choose one option -- for the moment I have only used metis4 -# because metis 5 was having problems with alya but guillaume seems to have solved that -# metis 5 just leave empty -#MUM_METV := -# metis 4 -#MUM_METV := _metis4 - -#@MUMPS Activated -#CSALYA := $(CSALYA) -DMUMPS -I/apps/MUMPS/5.1.2$(MUM_METV)/INTEL/IMPI/include/ -I${MKLROOT}/include - -#EXTRALIB := $(EXTRALIB) -L/apps/PARMETIS/4.0.3/INTEL/IMPI/lib/ -lparmetis -# you also need to do export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apps/PARMETIS/4.0.3/INTEL/IMPI/lib/ or put it in your bashrc -# since there is no module load for parmetis - -#EXTRALIB := $(EXTRALIB) -L/apps/MUMPS/5.1.2$(MUM_METV)/INTEL/IMPI/lib -ldmumps -lmumps_common -lpord - -#EXTRALIB := $(EXTRALIB) -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl - -# en la parte de openmp descomento EXTRALIB := $(EXTRALIB) -qopenmp que MUMPS necesita (ojo agrego la q creo me lo dijo ifort) - -#EXTRALIB := $(EXTRALIB) -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lmkl_blacs_intelmpi_ilp64 -lmkl_blacs_intelmpi_lp64 -lmkl_scalapack_ilp64 -lmkl_scalapack_lp64 - - -################################################################### -# NINJA # -#GPU based solvers : GMRES,DEFLATED_CG,CG # -#Specify solver in configuration as: # -#GMRES -------------> GGMR # -#CG ----------------> GCG # -#DEFLATED_CG -------> GDECG # -#GPU Multi Grid-----> GAMGX(Requires CONFIGURATION_FILE in solver)# -#export CUDA_HOME to CUDA version to be used # -################################################################### -# Uncomment the following lines to enable NINJA -#@NINJA Activated (Alya With GPU) -#GPU_HOME := ${CUDA_HOME}/ -#CSALYA := $(CSALYA) -DNINJA -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/ninja -L${GPU_HOME}lib64/ -lninja -lcublas_static -lcusparse_static -lculibos -lcudart 
-lpthread -lstdc++ -ldl -lcusolver - -# NINJA also support AMGX. Uncomment the following lines and set AMGX_HOME -#AMGX_HOME := -#CSALYA := $(CSALYA) -DAMGX -#EXTRALIB := $(EXTRALIB) -L${AMGX_HOME}/lib -lamgxsh - -################################################################### -# INVIZ # -#IN situ Visulization, using NVIDIA INDEX (GPUs only) # -#export CUDA_HOME to CUDA version to be used # -#export MPI_HOME to MPI version under use # -#export NVINDEX_ROOT to NVIDIA INDEX suite # -################################################################### -#Uncomment the following lines -#@ INVIZ Activated. -#CSALYA := $(CSALYA) -DINVIZ -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/inviz -linviz -lviewer -lz -lm -lHalf -llog4cplus -lrt -pthread -lstdc++ -ldl -lmpi_cxx -lcudart - -################################################################### -# CATALYST # -# # -#MANDATORY THIRDPARTY COMPILATION # -#Go to Alya/Thirdparties/Catalyst # -# # -#MORE INFO:hadrien.calmet at bsc.es # -################################################################### -# Uncomment the following lines for using CATALYST as output -#@CATALYST Activated. -#CSALYA := $(CSALYA) -DVTK -DCATA -mt_mpi -I../../Thirdparties/Catalyst -#EXTRALIB := $(EXTRALIB) ../../Thirdparties/VTK/vtkXMLWriterF.o ../../Thirdparties/Catalyst/FEFortranAdaptor.o ../../Thirdparties/Catalyst/FECxxAdaptor.o -#EXTRALIB := $(EXTRALIB) -L/apps/PARAVIEW/4.2.0/lib/paraview-4.2/ -lvtkIOXML-pv4.2 -lvtkIOGeometry-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkIOXMLParser-pv4.2 -lvtksys-pv4.2 -lvtkIOCore-pv4.2 -lvtkCommonExecutionModel-pv4.2 -lvtkCommonDataModel-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkCommonMisc-pv4.2 -lvtkCommonSystem-pv4.2 -lvtkCommonTransforms-pv4.2 -lvtkCommonMath-pv4.2 -lvtkCommonCore-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkzlib-pv4.2 -lvtkjsoncpp-pv4.2 -lvtkexpat-pv4.2 -lvtkPVPythonCatalyst-pv4.2 -lvtkPVCatalyst-pv4.2 -lvtkPythonInterpreter-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkUtilitiesPythonInitializer-pv4.2 -lvtkPVServerManagerCore-pv4.2 -lvtkPVServerImplementationCore-pv4.2 -lvtkPVClientServerCoreCore-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkFiltersParallel-pv4.2 -lvtkFiltersModeling-pv4.2 -lvtkRenderingCore-pv4.2 -lvtkFiltersExtraction-pv4.2 -lvtkFiltersStatistics-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkImagingFourier-pv4.2 -lvtkImagingCore-pv4.2 -lvtkalglib-pv4.2 -lvtkFiltersGeometry-pv4.2 -lvtkFiltersSources-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkFiltersGeneral-pv4.2 -lvtkCommonComputationalGeometry-pv4.2 -lvtkFiltersProgrammable-pv4.2 -lvtkPVVTKExtensionsCore-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkPVCommon-pv4.2 -lvtkClientServer-pv4.2 -lvtkFiltersCore-pv4.2 -lvtkParallelMPI-pv4.2 -lvtkParallelCore-pv4.2 -lvtkIOLegacy-pv4.2 -#EXTRALIB := $(EXTRALIB) -lprotobuf -lvtkWrappingPython27Core-pv4.2 -lvtkpugixml-pv4.2 -lvtkPVServerManagerApplication-pv4.2 -L/gpfs/apps/MN3/INTEL/tbb/lib/intel64/ -ltbb - - -################################################################### -# PSBLAS & MLD2P4 # -################################################################### -# goes first else problem at link time -# for use with ./configure --with-psblas=/gpfs/projects/bsc21/WORK-HERBERT/svn/solvers-herbert/Thirdparties/psblas3-development -#CSALYA := $(CSALYA) -I../../Thirdparties/mld2p4-2-development/modules -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/mld2p4-2-development/lib -lmld_prec - -# for use with ./configure --with-blas=" -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl" --with-include-path =" 
-I${MKLROOT}/include" in psblas -#CSALYA := $(CSALYA) -DINC_PSBLAS -I${MKLROOT}/include -I../../Thirdparties/psblas3-development/modules -#EXTRALIB := $(EXTRALIB) -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl -L../../Thirdparties/psblas3-development/lib -lpsb_krylov -lpsb_prec -lpsb_util -lpsb_base - - -# solo cambio para que use la psblas ya compilado por salvatore -#CSALYA := $(CSALYA) -DINC_PSBLAS -I${MKLROOT}/include -I/gpfs/scratch/bsc21/bsc21495/NUMERICAL/PSBLAS/Intel/modules -#EXTRALIB := $(EXTRALIB) -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl -L/gpfs/scratch/bsc21/bsc21495/NUMERICAL/PSBLAS/Intel/lib -lmld_prec -lpsb_krylov -lpsb_prec -lpsb_util -lpsb_base - - -################################################################### -# AGMG Flags # -################################################################### -# the library yvan had sent me expired I created a new one with his object files using 'ar cr libdagmg_par.a dagmg_mpi.o' - -# parallel version prof (copy dagmg_par.o from Thirdparties/agmg) -#CSALYA := $(CSALYA) -DPARAL_AGMG -Duse_dagmg_mumps -I${MKLROOT}/include -#EXTRALIB := $(EXTRALIB) -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lmkl_blacs_intelmpi_lp64 -lmkl_scalapack_lp64 -L../../Thirdparties/agmg/ -ldagmg_par - -################################################################### -# SUNDIALS CVODE # -################################################################### - -# Uncomment the following lines for using Sundials CVode -#@Sundials CVode Activated -#EXTRALIB := $(EXTRALIB) ../../Thirdparties/sundials/sundials-install/lib/*.a ../../Thirdparties/sundials/sundials-install/lib/libsundials_cvode.a - -################################################################### -# DBParicles LIBRARY # -################################################################### - -# Uncomment the following lines for using DBParticles library -#@DBParicles Activated -#CSALYA := $(CSALYA) -DDBPARTICLES -#--FOR THRIFT connector-- -#@DBParicles THRIFT connector ON -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/dbparticles -ldbparticles -I/usr/local/include/thrift/ -I/usr/local/include -L/usr/local/lib -lm -lstdc++ -#EXTRALIB := $(EXTRALIB) -lthrift -lthriftnb -lthriftqt -lthriftz -#--FOR CASSANDRA connector-- -#@DBParicles CASSANDRA ON -#EXTRALIB := $(EXTRALIB) ../../Thirdparties/dbparticles/MurmurHash3.o -L../../Thirdparties/dbparticles -ldbparticles -I/usr/lib/x86_64-linux-gnu/include -L/usr/lib/x86_64-linux-gnu -luv -lcassandra -I/usr/local/include -L/usr/local/lib -lm -lstdc++ - -################################################################### -# WSMP SOLVER ILLINOIS # -################################################################### - -# Uncomment the following lines to use WSMP library -#@WSMP SOLVER ILLINOIS Activated -#CSALYA := $(CSALYA) -DWSMP -#EXTRALIB := $(EXTRALIB) /gpfs/projects/bsc21/bsc21103/wsmpint/wsmp-Linux64-Intel/lib/ -lwsmp64 -lpthread /apps/LAPACK/3.5.0/lib -mkl - -################################################################### -# MPI-IO # -# # -#MORE INFO:damien.dosimont at bsc.es # -################################################################### -# Uncomment the following lines to generate I/O log files -#@I/O log files will be generated. 
-#CSALYA := $(CSALYA) -DMPIOLOG - -################################################################### -# BOAST # -# # -#MORE INFO:damien.dosimont at bsc.es # -################################################################### - -# This section will be filled automatically by BOAST. -# Don't add or touch anything - -#START_SECTION_BOAST -#END_SECTION_BOAST - -################################################################### -# DO NOT TOUCH # -################################################################### -FCFLAGS := $(FCFLAGS) $(CSALYA) $(EXTRAINC) - diff --git a/alya/configure.in/config.in.gpu b/alya/configure.in/config.in.gpu deleted file mode 100644 index 8ea0a93525831210d5334bcb9426eefbfd08c130..0000000000000000000000000000000000000000 --- a/alya/configure.in/config.in.gpu +++ /dev/null @@ -1,292 +0,0 @@ -################################################################### -# PGI CONFIGURE # -#POWER9 RECOMENDED MODULE: # -#module load ompi/3.0.0 pgi/18.4 # -################################################################### - -F77 = OMPI_FC=pgfortran mpif90 -F90 = OMPI_FC=pgfortran mpif90 -FCOCC = cc -c -FCFLAGS = -c -fast -Minfo=all -acc -ta=tesla:cuda10.1 -Mpreprocess -I./Objects_x/ -Mbackslash -Mextend -Mnoopenmp -Munroll -Mnoidiom -module $O -FPPFLAGS = -EXTRALIB = -lc -EXTRAINC = -fa2p = pgfortran -c -x f95-cpp-input -DMPI_OFF -J../../Utils/user/alya2pos -I../../Utils/user/alya2pos -fa2plk = pgfortran - -################################################################### -# PERFORMANCE FLAGS # -################################################################### -#MINUM -#FOPT = -O1 -#MAXIMUM (I have elimated -xHost due to observations by Yacine) -FOPT = -O3 - -#Compilation flags applied only to the Source/modules folder -#MODULEFLAGS = -ipo - -# Uncomment the following line to enable NDIME as a parameter (OPTIMIZATION FOR 3D PROBLEMS) -CSALYA := $(CSALYA) -DNDIMEPAR -DOPENACCHHH -DSUPER_FAST -DDETAILED_TIMES - -# Uncomment the following line for DEBUG AND CHECKING FLAGS -#CSALYA := $(CSALYA) -C -Ktrap=fp -Minform=inform - -# Vectorization: put vector size (in principle=4 for MN) -CSALYA := $(CSALYA) -DVECTOR_SIZE=32768 - -################################################################### -# USER SPECIFIC FLAGS # -################################################################### -# HERBERT -#CSALYA := $(CSALYA) -DDETAILS_ORTHOMIN - -################################################################### -# PROFILING FLAGS # -################################################################### -# Uncomment the following line to generate profiling info files -#CSALYA := $(CSALYA) -profile-loops=all -profile-loops-report=2 - -################################################################### -# EXTRAE FLAGS # -################################################################### -# Uncomment the following line to compile Alya using extrae -# Compiler used to compile extrae module (make extrae) -EXTRAE_FC=pgfortran -# Extrae installation directory (for linking) (not necessary if loading extrae using module load extrae) -#EXTRAE_HOME= -#@Linking with Extrae -#EXTRALIB := $(EXTRALIB) -L${EXTRAE_HOME}/lib/ -lmpitracef extrae_module.o -#@Enabling Extrae user calls (normal) -#CSALYA := $(CSALYA) -DEXTRAE - -################################################################### -# METIS LIBRARY # -################################################################### - -# Uncomment the following lines for using metis 4.0 (default) -EXTRALIB := $(EXTRALIB) -L../../Thirdparties/metis-4.0 -lmetis 
-acc -ta=tesla:cuda10.1 -CSALYA := $(CSALYA) -DMETIS - -# Uncomment the following lines for using metis 5.0 -#CSALYA := $(CSALYA) -DV5METIS -# uncoment FOR MAC -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/metis-5.0.2_i8/build/Darwin-i386/libmetis -lmetis -# uncoment FOR LINUX -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/metis-5.0.2_i8/build/Linux-x86_64/libmetis -lmetis - -# Uncomment the following lines for using parmetis -#CSALYA := $(CSALYA) -DPARMETIS -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/ - -################################################################### -# BLAS LIBRARY # -################################################################### - -# Uncomment the following lines for using blas -#EXTRALIB := $(EXTRALIB) -lblas -#CSALYA := $(CSALYA) -DBLAS - -################################################################### -# MPI/THREADS/TASKS # -################################################################### - -# Uncomment the following lines for sequential (NOMPI) version -#VERSIONMPI = nompi -#CSALYA := $(CSALYA) -DMPI_OFF - -# Uncomment the following lines for OPENMP version -#CSALYA := $(CSALYA) -openmp -#EXTRALIB := $(EXTRALIB) -openmp - -# Uncomment the following line to disable OPENMP coloring -#CSALYA := $(CSALYA) -DNO_COLORING - -# Uncomment the following line to enable OMPSS in order to activate -# loops with multidependencies whenever we have a race condition -#CSALYA := $(CSALYA) -DALYA_OMPSS - -################################################################### -# DLB # -# To use DLB with OMPSS: Define -DALYA_DLB -DALYA_OMPSS # -# To use DLB barrier: Define -DALYA_DLB_BARRIER # -# If we compile with mercurium, there is no need to link # -# with DLB # -# # -################################################################### - -#CSALYA := $(CSALYA) -DALYA_DLB -#FCFLAGS := $(FCFLAGS) --dlb -#FPPFLAGS := $(FPPFLAGS) --dlb - -################################################################### -# INTEGER TYPE # -################################################################### -# Uncomment the following lines for 8 byte integers -#CSALYA := $(CSALYA) -DI8 -m64 -# Uncomment the following line for 8 byte integers ONLY in Metis -# In this mode Alya can work in I4 and Metis in I8 (mixed mode) -# For this option the use of Metis v5 library is mandatory -#CSALYA := $(CSALYA) -DMETISI8 - - -################################################################### -# HDF5 # -# # -#MANDATORY MODULES TO LOAD ON MN3 FOR 4 BYTE INTEGERS VERSION # -#module load HDF5/1.8.14 SZIP # -# # -#MANDATORY MODULES TO LOAD ON MN3 FOR 8 BYTE INTEGERS VERSION # -#module load SZIP/2.1 HDF5/1.8.15p1_static # -# # -#MANDATORY SERVICES TO COMPILE # -#hdfpos # -#MORE INFO:bsccase02.bsc.es/alya/tutorial/hdf5_output.html # -################################################################### - -# Uncomment the following lines for using HDF5 IN 4 BYTE INTEGER VERSION -#CSALYA := $(CSALYA) -I/apps/HDF5/1.8.14/INTEL/IMPI/include -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.14/INTEL/IMPI/lib/libhdf5_fortran.a /apps/HDF5/1.8.14/INTEL/IMPI/lib/libhdf5.a -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.14/INTEL/IMPI/lib/libhdf5hl_fortran.a /apps/HDF5/1.8.14/INTEL/IMPI/lib/libhdf5_hl.a -#EXTRALIB := $(EXTRALIB) -L/apps/SZIP/2.1/lib -lsz -L/usr/lib64 -lz -lgpfs - -# Uncomment the following lines for using HDF5 IN 8 BYTE INTEGER VERSION -#CSALYA := $(CSALYA) -I/apps/HDF5/1.8.15p1/static/include -#EXTRALIB := $(EXTRALIB) /gpfs/apps/MN3/HDF5/1.8.15p1/static/lib/libhdf5_fortran.a 
/gpfs/apps/MN3/HDF5/1.8.15p1/static/lib/libhdf5_f90cstub.a -#EXTRALIB := $(EXTRALIB) /gpfs/apps/MN3/HDF5/1.8.15p1/static/lib/libhdf5.a /gpfs/apps/MN3/HDF5/1.8.15p1/static/lib/libhdf5_tools.a -#EXTRALIB := $(EXTRALIB) -L/apps/SZIP/2.1/lib -lsz -L/usr/lib64 -lz -lgpfs - -# latest Hdf5 (1.8.16) to be used with latest impi (5.1.2.150) and intel(16.0.1) -# there is module load HDF5/1.8.16-mpi for integers 4 but I have seen it is not necesary to call it -# Toni should clean up this - my reconmendation is to only leave this last version -# moreover there seems to be no need for module load hdf5 (in the i8 case there exists no module load) - -# Uncomment the following lines for using HDF5 IN 4 BYTE INTEGER VERSION -#CSALYA := $(CSALYA) -I/apps/HDF5/1.8.16/INTEL/IMPI/include -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.16/INTEL/IMPI/lib/libhdf5_fortran.a /apps/HDF5/1.8.16/INTEL/IMPI/lib/libhdf5hl_fortran.a -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.16/INTEL/IMPI/lib/libhdf5.a /apps/HDF5/1.8.16/INTEL/IMPI/lib/libhdf5_hl.a -#EXTRALIB := $(EXTRALIB) -L/apps/SZIP/2.1/lib -lsz -L/usr/lib64 -lz -lgpfs - -# Uncomment the following lines for using HDF5 IN 8 BYTE INTEGER VERSION -#CSALYA := $(CSALYA) -I/apps/HDF5/1.8.16/INTEL/IMPI/int8/include -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.16/INTEL/IMPI/int8/lib/libhdf5_fortran.a /apps/HDF5/1.8.16/INTEL/IMPI/int8/lib/libhdf5hl_fortran.a -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.16/INTEL/IMPI/int8/lib/libhdf5.a /apps/HDF5/1.8.16/INTEL/IMPI/int8/lib/libhdf5_hl.a -#EXTRALIB := $(EXTRALIB) -L/apps/SZIP/2.1/lib -lsz -L/usr/lib64 -lz -lgpfs - -################################################################### -# VTK # -# # -#MANDATORY THIRDPARTY COMPILATION # -#Go to Alya/Thirdparties/VTK # -# # -#MORE INFO:bsccase02.bsc.es/alya/tutorial/vtk_output.html # -################################################################### - -# Uncomment the following lines for using VTK as output -#CSALYA := $(CSALYA) -DVTK -#EXTRALIB := $(EXTRALIB) ../../Thirdparties/VTK/vtkXMLWriterF.o -L/apps/VTK/6.1.0_patched/lib -lvtkIOXML-6.1 -lvtkIOGeometry-6.1 -#EXTRALIB := $(EXTRALIB) -lvtkIOXMLParser-6.1 -lvtksys-6.1 -lvtkIOCore-6.1 -lvtkCommonExecutionModel-6.1 -lvtkCommonDataModel-6.1 -lvtkCommonMisc-6.1 -#EXTRALIB := $(EXTRALIB) -lvtkCommonSystem-6.1 -lvtkCommonTransforms-6.1 -lvtkCommonMath-6.1 -lvtkCommonCore-6.1 -lvtkzlib-6.1 -#EXTRALIB := $(EXTRALIB) -lvtkjsoncpp-6.1 -lvtkexpat-6.1 -L/gpfs/apps/MN3/INTEL/tbb/lib/intel64/ -ltbb - -################################################################### -# NINJA # -#GPU based solvers : GMRES,DEFLATED_CG,CG # -#Specify solver in configuration as: # -#GMRES -------------> GGMR # -#CG ----------------> GCG # -#DEFLATED_CG -------> GDECG # -#GPU Multi Grid-----> GAMGX(Requires CONFIGURATION_FILE in solver)# -#export CUDA_HOME to CUDA version to be used # -################################################################### -# Uncomment the following lines to enable NINJA -#GPU_HOME := ${CUDA_HOME}/ -#CSALYA := $(CSALYA) -DNINJA -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/ninja -L${GPU_HOME}lib64/ -L/lib64 -lninja -lcublas_static -lcublasLt -lcusparse_static -lculibos -lcudart -lpthread -lstdc++ -ldl -lcusolver - -# NINJA also support AMGX. 
Uncomment the following lines and set AMGX_HOME -#AMGX_HOME := -#CSALYA := $(CSALYA) -DAMGX -#EXTRALIB := $(EXTRALIB) -L${AMGX_HOME}/lib -lamgxsh - -################################################################### -# CATALYST # -# # -#MANDATORY THIRDPARTY COMPILATION # -#Go to Alya/Thirdparties/Catalyst # -# # -#MORE INFO:hadrien.calmet at bsc.es # -################################################################### -# Uncomment the following lines for using CATALYST as output -#CSALYA := $(CSALYA) -DVTK -DCATA -mt_mpi -I../../Thirdparties/Catalyst -#EXTRALIB := $(EXTRALIB) ../../Thirdparties/VTK/vtkXMLWriterF.o ../../Thirdparties/Catalyst/FEFortranAdaptor.o ../../Thirdparties/Catalyst/FECxxAdaptor.o -#EXTRALIB := $(EXTRALIB) -L/apps/PARAVIEW/4.2.0/lib/paraview-4.2/ -lvtkIOXML-pv4.2 -lvtkIOGeometry-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkIOXMLParser-pv4.2 -lvtksys-pv4.2 -lvtkIOCore-pv4.2 -lvtkCommonExecutionModel-pv4.2 -lvtkCommonDataModel-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkCommonMisc-pv4.2 -lvtkCommonSystem-pv4.2 -lvtkCommonTransforms-pv4.2 -lvtkCommonMath-pv4.2 -lvtkCommonCore-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkzlib-pv4.2 -lvtkjsoncpp-pv4.2 -lvtkexpat-pv4.2 -lvtkPVPythonCatalyst-pv4.2 -lvtkPVCatalyst-pv4.2 -lvtkPythonInterpreter-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkUtilitiesPythonInitializer-pv4.2 -lvtkPVServerManagerCore-pv4.2 -lvtkPVServerImplementationCore-pv4.2 -lvtkPVClientServerCoreCore-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkFiltersParallel-pv4.2 -lvtkFiltersModeling-pv4.2 -lvtkRenderingCore-pv4.2 -lvtkFiltersExtraction-pv4.2 -lvtkFiltersStatistics-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkImagingFourier-pv4.2 -lvtkImagingCore-pv4.2 -lvtkalglib-pv4.2 -lvtkFiltersGeometry-pv4.2 -lvtkFiltersSources-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkFiltersGeneral-pv4.2 -lvtkCommonComputationalGeometry-pv4.2 -lvtkFiltersProgrammable-pv4.2 -lvtkPVVTKExtensionsCore-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkPVCommon-pv4.2 -lvtkClientServer-pv4.2 -lvtkFiltersCore-pv4.2 -lvtkParallelMPI-pv4.2 -lvtkParallelCore-pv4.2 -lvtkIOLegacy-pv4.2 -#EXTRALIB := $(EXTRALIB) -lprotobuf -lvtkWrappingPython27Core-pv4.2 -lvtkpugixml-pv4.2 -lvtkPVServerManagerApplication-pv4.2 -L/gpfs/apps/MN3/INTEL/tbb/lib/intel64/ -ltbb - -################################################################### -# EoCoE Flags # -################################################################### - -# Uncomment the following lines to output matrices # -#CSALYA := $(CSALYA) -Doutmateocoe -# Uncomment the following lines to solve with AGMG # -#MANDATORY MODULES TO LOAD ON MN3 -#module load MKL/11.3 -# serial version -- this is obsolete actually for it to work you need to touch the ifdef in dagmg.f90 -#CSALYA := $(CSALYA) -Dsolve_w_agmg -I${MKLROOT}/include -#EXTRALIB := $(EXTRALIB) -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -# -# -# parallel version academic -#CSALYA := $(CSALYA) -DPARAL_AGMG -DPARAL_AGMG_ACAD -I${MKLROOT}/include -I/gpfs/projects/bsc21/WORK-HERBERT/svnmn3/MUMPS_5.0.1/include -#For Mumps -#EXTRALIB := $(EXTRALIB) -L/gpfs/projects/bsc21/WORK-HERBERT/svnmn3/MUMPS_5.0.1/lib -ldmumps -lmumps_common -lpord -#For agmg -#EXTRALIB := $(EXTRALIB) -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lmkl_blacs_intelmpi_ilp64 -lmkl_blacs_intelmpi_lp64 -lmkl_scalapack_ilp64 -lmkl_scalapack_lp64 -# -# parallel version prof (copy dagmg_par.o from Thirdparties/agmg) -#CSALYA := $(CSALYA) -DPARAL_AGMG -Duse_dagmg_mumps -I${MKLROOT}/include -#For agmg -#EXTRALIB := $(EXTRALIB) 
-L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lmkl_blacs_intelmpi_ilp64 -lmkl_blacs_intelmpi_lp64 -lmkl_scalapack_ilp64 -lmkl_scalapack_lp64 -L../../Thirdparties/agmg/ -ldagmg_par - -################################################################### -# SUNDIALS CVODE # -################################################################### - -# Uncomment the following lines for using Sundials CVode -#EXTRALIB := $(EXTRALIB) ../../Thirdparties/sundials/sundials-install/lib/*.a ../../Thirdparties/sundials/sundials-install/lib/libsundials_cvode.a - - - -################################################################### -# DBParicles LIBRARY # -################################################################### - -# Uncomment the following lines for using DBParticles library -#CSALYA := $(CSALYA) -DDBPARTICLES -#--FOR THRIFT connector-- -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/dbparticles -ldbparticles -I/usr/local/include/thrift/ -I/usr/local/include -L/usr/local/lib -lm -lstdc++ -#EXTRALIB := $(EXTRALIB) -lthrift -lthriftnb -lthriftqt -lthriftz -#--FOR CASSANDRA connector-- -#EXTRALIB := $(EXTRALIB) ../../Thirdparties/dbparticles/MurmurHash3.o -L../../Thirdparties/dbparticles -ldbparticles -I/usr/lib/x86_64-linux-gnu/include -L/usr/lib/x86_64-linux-gnu -luv -lcassandra -I/usr/local/include -L/usr/local/lib -lm -lstdc++ - -################################################################### -# WSMP SOLVER ILLINOIS # -################################################################### - -# Uncomment the following lines to use WSMP library -#CSALYA := $(CSALYA) -DWSMP -#EXTRALIB := $(EXTRALIB) /gpfs/projects/bsc21/bsc21103/wsmpint/wsmp-Linux64-Intel/lib/ -lwsmp64 -lpthread /apps/LAPACK/3.5.0/lib -mkl - -################################################################### -# DO NOT TOUCH # -################################################################### -FCFLAGS := $(FCFLAGS) $(CSALYA) $(EXTRAINC) - diff --git a/alya/configure.in/config.in.skylake b/alya/configure.in/config.in.skylake deleted file mode 100644 index 920a83d18e4edbac8584fc09020c4ac415528500..0000000000000000000000000000000000000000 --- a/alya/configure.in/config.in.skylake +++ /dev/null @@ -1,416 +0,0 @@ -################################################################### -# IFORT CONFIGURE # -#MN4 RECOMENDED MODULE: # -#module load intel impi # -################################################################### - -#@Compiler: Using Intel Fortran Compiler ifort. -F77 = mpif90 -F90 = mpif90 -FCOCC = mpicc -c -FCFLAGS = -module $O -c -FPPFLAGS = -fpp -EXTRALIB = -EXTRAINC = -fa2p = mpif90 -module ../../Utils/user/alya2pos -c -fpp -fa2plk = mpif90 - -################################################################### -# PERFORMANCE FLAGS # -################################################################### - -#@Optimization: O3 -FOPT = -O3 - -#Compilation flags applied only to the Source/modules folder -#@Special flags only for Souce/modules folder activaded. -#MODULEFLAGS = - -#Uncomment the following line to enable NDIME as a parameter (OPTIMIZATION FOR 3D PROBLEMS) -#@Optimization for 3D PROBLEMS activated. -CSALYA := $(CSALYA) -DNDIMEPAR - -#Uncomment the following line for DEBUG AND CHECKING FLAGS -#@Debug flags activated. -#CSALYA := $(CSALYA) -ftrapuv -check all,noarg_temp_created -traceback -debug full -warn all,nodec,nointerfaces -fp-stack-check -ansi-alias - -#Vectorization: put vector size (in principle=4 for MN) -#@Vectorization activated. 
-CSALYA := $(CSALYA) -DVECTOR_SIZE=16 - -#@Marenostrum IV Optimizations -CSALYA := $(CSALYA) -xCORE-AVX512 -mtune=skylake -ipo - -################################################################### -# PROFILING FLAGS # -################################################################### - -# Uncomment the following line to generate profiling info files -#@Profiling info files will be generated. -#CSALYA := $(CSALYA) -profile-loops=all -profile-loops-report=2 - -################################################################### -# EXTRAE FLAGS # -################################################################### - -# Uncomment the following line to compile Alya using extrae -# 1. Define EXTRAE_HOME -# 2. Add xml file. See e.g. in /apps/BSCTOOLS/extrae/latest/impi_2017_4/share/example/*/extrae.xml -# 3. In batch file, use: srun ./trace.sh Alya.x xxx - -#@Linking with Extrae -#EXTRAE_HOME =/apps/BSCTOOLS/extrae/latest/impi_2017_4 -#EXTRALIB := $(EXTRALIB) -L${EXTRAE_HOME}/lib/ -lmpitracef -#@Enabling Extrae user calls (normal) -#CSALYA := $(CSALYA) -DEXTRAE - -################################################################### -# METIS LIBRARY # -################################################################### - -#@Using Metis 4.0 -EXTRALIB := $(EXTRALIB) -L../../Thirdparties/metis-4.0 -lmetis -CSALYA := $(CSALYA) -DMETIS - -#@Using Metis 5.0.2 or metis 5.1.0 -#CSALYA := $(CSALYA) -DV5METIS -#CSALYA := $(CSALYA) -DV51METIS -#@Using Metis 5.0.2 or metis 5.1.0 for MAC -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/metis-5.0.2_i8/build/Darwin-x86_64/libmetis -lmetis -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/metis-5.1.2_i8/build/Darwin-x86_64/libmetis -lmetis -#@Using Metis 5.0.2 or metis 5.1.0 for Linux -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/metis-5.0.2_i8/build/Linux-x86_64/libmetis -lmetis -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/metis-5.1.0_i8/build/Linux-x86_64/libmetis -lmetis - -#@Using parametis. -#CSALYA := $(CSALYA) -DPARMETIS -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/ - -################################################################### -# ZOLTAN LIBRARY # -################################################################### - -#EXTRALIB := $(EXTRALIB) -L/gpfs/projects/bsc21/bsc21499/Zoltan_v3.83/Build/lib -lzoltan -#CSALYA := $(CSALYA) -I/gpfs/projects/bsc21/bsc21499/Zoltan_v3.83/Build/include -#CSALYA := $(CSALYA) -DZOLTAN - -################################################################### -# BLAS LIBRARY # -################################################################### - -# Uncomment the following lines for using blas -#@BLAS Activated -#EXTRALIB := $(EXTRALIB) -lblas -#CSALYA := $(CSALYA) -DBLAS - -################################################################### -# MPI/THREADS/TASKS # -################################################################### - -# Uncomment the following lines for sequential (NOMPI) version -#@Sequential ALYA. -#VERSIONMPI = nompi -#CSALYA := $(CSALYA) -DMPI_OFF - -# Uncomment the following lines for OPENMP version -#@OpenMP Activated. heap-arrays option may be used to stack overflow and have Alya crash -#CSALYA := $(CSALYA) -qopenmp -#CSALYA := $(CSALYA) -heap-arrays -#EXTRALIB := $(EXTRALIB) -qopenmp - -# Uncomment the following line to disable OPENMP coloring -#@OpenMP Coloring Activated. -#CSALYA := $(CSALYA) -DNO_COLORING - -# Uncomment the following line to enable OMPSS in order to activate -# loops with multidependencies whenever we have a race condition -#@OMPPS Activated. 
-#CSALYA := $(CSALYA) -DALYA_OMPSS - -# To activate MPI3 (non-blocking MPI_Allreduce) -#@MPI3 activated. -#CSALYA := $(CSALYA) -DMPI3 - -################################################################### -# DLB # -# To use DLB with OMPSS: Define -DALYA_DLB -DALYA_OMPSS # -# To use DLB barrier: Define -DALYA_DLB_BARRIER # -# If we compile with mercurium, there is no need to link # -# with DLB # -# # -################################################################### - -#@DLB Activated. -#CSALYA := $(CSALYA) -DALYA_DLB -#FCFLAGS := $(FCFLAGS) --dlb -#FPPFLAGS := $(FPPFLAGS) --dlb - -################################################################### -# TALP # -# In the batcj script, run the following way: # -# export DLB_ARGS=--talp # -# module load dlb/git # -# srun env LD_PRELOAD=$DLB_HOME/lib/libdlb_mpi.so ./Alya.x xxx # -# # -################################################################### - -#@DLB_TALP Activated. -#CSALYA := $(CSALYA) -DALYA_TALP -#CSALYA := $(CSALYA) -I$(DLB_HOME)/include -#EXTRALIB := $(EXTRALIB) -L$(DLB_HOME)/lib -ldlb -Wl,-rpath,$(DLB_HOME)/lib - -################################################################### -# INTEGER TYPE # -################################################################### -# Uncomment the following lines for 8 byte integers -#@Integers size: 8 -#CSALYA := $(CSALYA) -DI8 -m64 - - -################################################################### -# CANTERA # -################################################################### -# Uncomment the following lines for Cantera -#@Using Cantera -#CANTERA=1 -#@ Marenostrum 4 -#ROOT_BOOST=/gpfs/projects/bsc21/Cantera-Alya/boost_1_68_0/ -#ROOT_CANTERA=/gpfs/projects/bsc21/Cantera-Alya/cantera/INTEL/ -#@ Nord3 -#ROOT_BOOST=/apps/BOOST/1_53_0/IMPI/include/ -#ROOT_CANTERA=/gpfs/projects/bsc21/Cantera-Alya/cantera/INTEL_NORD3/ -################################################################### -# HDF5 # -# # -#MANDATORY MODULES TO LOAD ON MN3 FOR 4 BYTE INTEGERS VERSION # -#module load HDF5/1.8.14 SZIP # -# # -#MANDATORY MODULES TO LOAD ON MN3 FOR 8 BYTE INTEGERS VERSION # -#module load SZIP/2.1 HDF5/1.8.15p1_static # -# # -#MANDATORY SERVICES TO COMPILE # -#hdfpos # -#MORE INFO:bsccase02.bsc.es/alya/tutorial/hdf5_output.html # -################################################################### - -# Uncomment the following lines for using HDF5 IN 4 BYTE INTEGER VERSION -#@HDF5 4 Bytes integer Activated please load the module HDF5/1.8.14 SZIP -#CSALYA := $(CSALYA) -I/apps/HDF5/1.8.14/INTEL/IMPI/include -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.14/INTEL/IMPI/lib/libhdf5_fortran.a /apps/HDF5/1.8.14/INTEL/IMPI/lib/libhdf5.a -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.14/INTEL/IMPI/lib/libhdf5hl_fortran.a /apps/HDF5/1.8.14/INTEL/IMPI/lib/libhdf5_hl.a -#EXTRALIB := $(EXTRALIB) -L/apps/SZIP/2.1/lib -lsz -L/usr/lib64 -lz -lgpfs - -# Uncomment the following lines for using HDF5 IN 8 BYTE INTEGER VERSION -#@HDF5 8 Bytes integer Activated please load the module SZIP/2.1 HDF5/1.8.15p1_static -#CSALYA := $(CSALYA) -I/apps/HDF5/1.8.15p1/static/include -#EXTRALIB := $(EXTRALIB) /gpfs/apps/MN3/HDF5/1.8.15p1/static/lib/libhdf5_fortran.a /gpfs/apps/MN3/HDF5/1.8.15p1/static/lib/libhdf5_f90cstub.a -#EXTRALIB := $(EXTRALIB) /gpfs/apps/MN3/HDF5/1.8.15p1/static/lib/libhdf5.a /gpfs/apps/MN3/HDF5/1.8.15p1/static/lib/libhdf5_tools.a -#EXTRALIB := $(EXTRALIB) -L/apps/SZIP/2.1/lib -lsz -L/usr/lib64 -lz -lgpfs - -# latest Hdf5 (1.8.16) to be used with latest impi (5.1.2.150) and intel(16.0.1) -# there is module load 
HDF5/1.8.16-mpi for integers 4 but I have seen it is not necesary to call it -# Toni should clean up this - my reconmendation is to only leave this last version -# moreover there seems to be no need for module load hdf5 (in the i8 case there exists no module load) - -# Uncomment the following lines for using HDF5 IN 4 BYTE INTEGER VERSION -#@HDF5 1.8.16 - 4 Bytes integer Activated -#CSALYA := $(CSALYA) -I/apps/HDF5/1.8.16/INTEL/IMPI/include -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.16/INTEL/IMPI/lib/libhdf5_fortran.a /apps/HDF5/1.8.16/INTEL/IMPI/lib/libhdf5hl_fortran.a -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.16/INTEL/IMPI/lib/libhdf5.a /apps/HDF5/1.8.16/INTEL/IMPI/lib/libhdf5_hl.a -#EXTRALIB := $(EXTRALIB) -L/apps/SZIP/2.1/lib -lsz -L/usr/lib64 -lz -lgpfs - -# Uncomment the following lines for using HDF5 IN 8 BYTE INTEGER VERSION -#@HDF5 1.8.16 - 8 Bytes integer Activated -#CSALYA := $(CSALYA) -I/apps/HDF5/1.8.16/INTEL/IMPI/int8/include -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.16/INTEL/IMPI/int8/lib/libhdf5_fortran.a /apps/HDF5/1.8.16/INTEL/IMPI/int8/lib/libhdf5hl_fortran.a -#EXTRALIB := $(EXTRALIB) /apps/HDF5/1.8.16/INTEL/IMPI/int8/lib/libhdf5.a /apps/HDF5/1.8.16/INTEL/IMPI/int8/lib/libhdf5_hl.a -#EXTRALIB := $(EXTRALIB) -L/apps/SZIP/2.1/lib -lsz -L/usr/lib64 -lz -lgpfs - -################################################################### -# VTK # -# # -#MANDATORY THIRDPARTY COMPILATION # -#Go to Alya/Thirdparties/VTK # -# # -#MORE INFO:bsccase02.bsc.es/alya/tutorial/vtk_output.html # -################################################################### - -# Uncomment the following lines for using VTK as output -#@VTK Output Activated -#CSALYA := $(CSALYA) -DVTK -#EXTRALIB := $(EXTRALIB) ../../Thirdparties/VTK/vtkXMLWriterF.o -L/apps/VTK/6.1.0_patched/lib -lvtkIOXML-6.1 -lvtkIOGeometry-6.1 -#EXTRALIB := $(EXTRALIB) -lvtkIOXMLParser-6.1 -lvtksys-6.1 -lvtkIOCore-6.1 -lvtkCommonExecutionModel-6.1 -lvtkCommonDataModel-6.1 -lvtkCommonMisc-6.1 -#EXTRALIB := $(EXTRALIB) -lvtkCommonSystem-6.1 -lvtkCommonTransforms-6.1 -lvtkCommonMath-6.1 -lvtkCommonCore-6.1 -lvtkzlib-6.1 -#EXTRALIB := $(EXTRALIB) -lvtkjsoncpp-6.1 -lvtkexpat-6.1 -L/gpfs/apps/MN3/INTEL/tbb/lib/intel64/ -ltbb - -################################################################### -# MUMPS # -# in MN4 use module load mkl intel impi mumps/5.1.2_metis4 -# if you just put mumps it loads mumps/5.1.2 that is with metis 5 -################################################################### - -#to automatically switch between metis 4 or 5 -choose one option -- for the moment I have only used metis4 -# because metis 5 was having problems with alya but guillaume seems to have solved that -# metis 5 just leave empty -#MUM_METV := -# metis 4 -#MUM_METV := _metis4 - -#@MUMPS Activated -#CSALYA := $(CSALYA) -DMUMPS -I/apps/MUMPS/5.1.2$(MUM_METV)/INTEL/IMPI/include/ -I${MKLROOT}/include - -#EXTRALIB := $(EXTRALIB) -L/apps/PARMETIS/4.0.3/INTEL/IMPI/lib/ -lparmetis -# you also need to do export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apps/PARMETIS/4.0.3/INTEL/IMPI/lib/ or put it in your bashrc -# since there is no module load for parmetis - -#EXTRALIB := $(EXTRALIB) -L/apps/MUMPS/5.1.2$(MUM_METV)/INTEL/IMPI/lib -ldmumps -lmumps_common -lpord - -#EXTRALIB := $(EXTRALIB) -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl - -# en la parte de openmp descomento EXTRALIB := $(EXTRALIB) -qopenmp que MUMPS necesita (ojo agrego la q creo me lo dijo ifort) - -#EXTRALIB := 
$(EXTRALIB) -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lmkl_blacs_intelmpi_ilp64 -lmkl_blacs_intelmpi_lp64 -lmkl_scalapack_ilp64 -lmkl_scalapack_lp64 - - -################################################################### -# NINJA # -#GPU based solvers : GMRES,DEFLATED_CG,CG # -#Specify solver in configuration as: # -#GMRES -------------> GGMR # -#CG ----------------> GCG # -#DEFLATED_CG -------> GDECG # -#GPU Multi Grid-----> GAMGX(Requires CONFIGURATION_FILE in solver)# -#export CUDA_HOME to CUDA version to be used # -################################################################### -# Uncomment the following lines to enable NINJA -#@NINJA Activated (Alya With GPU) -#GPU_HOME := ${CUDA_HOME}/ -#CSALYA := $(CSALYA) -DNINJA -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/ninja -L${GPU_HOME}lib64/ -lninja -lcublas_static -lcusparse_static -lculibos -lcudart -lpthread -lstdc++ -ldl -lcusolver - -# NINJA also support AMGX. Uncomment the following lines and set AMGX_HOME -#AMGX_HOME := -#CSALYA := $(CSALYA) -DAMGX -#EXTRALIB := $(EXTRALIB) -L${AMGX_HOME}/lib -lamgxsh - -################################################################### -# INVIZ # -#IN situ Visulization, using NVIDIA INDEX (GPUs only) # -#export CUDA_HOME to CUDA version to be used # -#export MPI_HOME to MPI version under use # -#export NVINDEX_ROOT to NVIDIA INDEX suite # -################################################################### -#Uncomment the following lines -#@ INVIZ Activated. -#CSALYA := $(CSALYA) -DINVIZ -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/inviz -linviz -lviewer -lz -lm -lHalf -llog4cplus -lrt -pthread -lstdc++ -ldl -lmpi_cxx -lcudart - -################################################################### -# CATALYST # -# # -#MANDATORY THIRDPARTY COMPILATION # -#Go to Alya/Thirdparties/Catalyst # -# # -#MORE INFO:hadrien.calmet at bsc.es # -################################################################### -# Uncomment the following lines for using CATALYST as output -#@CATALYST Activated. 
-#CSALYA := $(CSALYA) -DVTK -DCATA -mt_mpi -I../../Thirdparties/Catalyst -#EXTRALIB := $(EXTRALIB) ../../Thirdparties/VTK/vtkXMLWriterF.o ../../Thirdparties/Catalyst/FEFortranAdaptor.o ../../Thirdparties/Catalyst/FECxxAdaptor.o -#EXTRALIB := $(EXTRALIB) -L/apps/PARAVIEW/4.2.0/lib/paraview-4.2/ -lvtkIOXML-pv4.2 -lvtkIOGeometry-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkIOXMLParser-pv4.2 -lvtksys-pv4.2 -lvtkIOCore-pv4.2 -lvtkCommonExecutionModel-pv4.2 -lvtkCommonDataModel-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkCommonMisc-pv4.2 -lvtkCommonSystem-pv4.2 -lvtkCommonTransforms-pv4.2 -lvtkCommonMath-pv4.2 -lvtkCommonCore-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkzlib-pv4.2 -lvtkjsoncpp-pv4.2 -lvtkexpat-pv4.2 -lvtkPVPythonCatalyst-pv4.2 -lvtkPVCatalyst-pv4.2 -lvtkPythonInterpreter-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkUtilitiesPythonInitializer-pv4.2 -lvtkPVServerManagerCore-pv4.2 -lvtkPVServerImplementationCore-pv4.2 -lvtkPVClientServerCoreCore-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkFiltersParallel-pv4.2 -lvtkFiltersModeling-pv4.2 -lvtkRenderingCore-pv4.2 -lvtkFiltersExtraction-pv4.2 -lvtkFiltersStatistics-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkImagingFourier-pv4.2 -lvtkImagingCore-pv4.2 -lvtkalglib-pv4.2 -lvtkFiltersGeometry-pv4.2 -lvtkFiltersSources-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkFiltersGeneral-pv4.2 -lvtkCommonComputationalGeometry-pv4.2 -lvtkFiltersProgrammable-pv4.2 -lvtkPVVTKExtensionsCore-pv4.2 -#EXTRALIB := $(EXTRALIB) -lvtkPVCommon-pv4.2 -lvtkClientServer-pv4.2 -lvtkFiltersCore-pv4.2 -lvtkParallelMPI-pv4.2 -lvtkParallelCore-pv4.2 -lvtkIOLegacy-pv4.2 -#EXTRALIB := $(EXTRALIB) -lprotobuf -lvtkWrappingPython27Core-pv4.2 -lvtkpugixml-pv4.2 -lvtkPVServerManagerApplication-pv4.2 -L/gpfs/apps/MN3/INTEL/tbb/lib/intel64/ -ltbb - - -################################################################### -# PSBLAS & MLD2P4 # -################################################################### -# goes first else problem at link time -# for use with ./configure --with-psblas=/gpfs/projects/bsc21/WORK-HERBERT/svn/solvers-herbert/Thirdparties/psblas3-development -#CSALYA := $(CSALYA) -I../../Thirdparties/mld2p4-2-development/modules -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/mld2p4-2-development/lib -lmld_prec - -# for use with ./configure --with-blas=" -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl" --with-include-path =" -I${MKLROOT}/include" in psblas -#CSALYA := $(CSALYA) -DINC_PSBLAS -I${MKLROOT}/include -I../../Thirdparties/psblas3-development/modules -#EXTRALIB := $(EXTRALIB) -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl -L../../Thirdparties/psblas3-development/lib -lpsb_krylov -lpsb_prec -lpsb_util -lpsb_base - - -# solo cambio para que use la psblas ya compilado por salvatore -#CSALYA := $(CSALYA) -DINC_PSBLAS -I${MKLROOT}/include -I/gpfs/scratch/bsc21/bsc21495/NUMERICAL/PSBLAS/Intel/modules -#EXTRALIB := $(EXTRALIB) -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl -L/gpfs/scratch/bsc21/bsc21495/NUMERICAL/PSBLAS/Intel/lib -lmld_prec -lpsb_krylov -lpsb_prec -lpsb_util -lpsb_base - - -################################################################### -# AGMG Flags # -################################################################### -# the library yvan had sent me expired I created a new one with his object files using 'ar cr libdagmg_par.a dagmg_mpi.o' - -# parallel version prof (copy dagmg_par.o from 
Thirdparties/agmg) -#CSALYA := $(CSALYA) -DPARAL_AGMG -Duse_dagmg_mumps -I${MKLROOT}/include -#EXTRALIB := $(EXTRALIB) -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lmkl_blacs_intelmpi_lp64 -lmkl_scalapack_lp64 -L../../Thirdparties/agmg/ -ldagmg_par - -################################################################### -# SUNDIALS CVODE # -################################################################### - -# Uncomment the following lines for using Sundials CVode -#@Sundials CVode Activated -#EXTRALIB := $(EXTRALIB) ../../Thirdparties/sundials/sundials-install/lib/*.a ../../Thirdparties/sundials/sundials-install/lib/libsundials_cvode.a - -################################################################### -# DBParicles LIBRARY # -################################################################### - -# Uncomment the following lines for using DBParticles library -#@DBParicles Activated -#CSALYA := $(CSALYA) -DDBPARTICLES -#--FOR THRIFT connector-- -#@DBParicles THRIFT connector ON -#EXTRALIB := $(EXTRALIB) -L../../Thirdparties/dbparticles -ldbparticles -I/usr/local/include/thrift/ -I/usr/local/include -L/usr/local/lib -lm -lstdc++ -#EXTRALIB := $(EXTRALIB) -lthrift -lthriftnb -lthriftqt -lthriftz -#--FOR CASSANDRA connector-- -#@DBParicles CASSANDRA ON -#EXTRALIB := $(EXTRALIB) ../../Thirdparties/dbparticles/MurmurHash3.o -L../../Thirdparties/dbparticles -ldbparticles -I/usr/lib/x86_64-linux-gnu/include -L/usr/lib/x86_64-linux-gnu -luv -lcassandra -I/usr/local/include -L/usr/local/lib -lm -lstdc++ - -################################################################### -# WSMP SOLVER ILLINOIS # -################################################################### - -# Uncomment the following lines to use WSMP library -#@WSMP SOLVER ILLINOIS Activated -#CSALYA := $(CSALYA) -DWSMP -#EXTRALIB := $(EXTRALIB) /gpfs/projects/bsc21/bsc21103/wsmpint/wsmp-Linux64-Intel/lib/ -lwsmp64 -lpthread /apps/LAPACK/3.5.0/lib -mkl - -################################################################### -# MPI-IO # -# # -#MORE INFO:damien.dosimont at bsc.es # -################################################################### -# Uncomment the following lines to generate I/O log files -#@I/O log files will be generated. -#CSALYA := $(CSALYA) -DMPIOLOG - -################################################################### -# BOAST # -# # -#MORE INFO:damien.dosimont at bsc.es # -################################################################### - -# This section will be filled automatically by BOAST. 
-# Don't add or touch anything
-
-#START_SECTION_BOAST
-#END_SECTION_BOAST
-
-###################################################################
-#                          DO NOT TOUCH                           #
-###################################################################
-FCFLAGS := $(FCFLAGS) $(CSALYA) $(EXTRAINC)
-
diff --git a/code_saturne/CS_collect_timing.sh b/code_saturne/CS_collect_timing.sh
new file mode 100644
index 0000000000000000000000000000000000000000..3b2549b236601581003d0d5567efb228262b5c0e
--- /dev/null
+++ b/code_saturne/CS_collect_timing.sh
@@ -0,0 +1,40 @@
+#!/bin/bash
+#
+# Read file timer_stats.csv
+#
+#
+export FILE_LENGTH=`wc -l < timer_stats.csv`
+#
+## echo "Number of lines $FILE_LENGTH"
+#
+export TAIL_LINE_NUMBER="$(($FILE_LENGTH-4))"
+#
+## echo $TAIL_LINE_NUMBER
+#
+tail -$TAIL_LINE_NUMBER timer_stats.csv > timer_1st.tmp
+#
+##more timer_1st.tmp
+#
+awk '{print $2}' timer_1st.tmp > timer_2nd.tmp
+#
+sed 's/,//g' timer_2nd.tmp > timer_1st.tmp
+#
+export FILE_LENGTH=`wc -l < timer_1st.tmp`
+#
+## echo "Number of lines $FILE_LENGTH"
+#
+export FILE_LENGTH=$(($FILE_LENGTH-1))
+#
+export HEAD_LINE_NUMBER="-$FILE_LENGTH"
+#
+head $HEAD_LINE_NUMBER timer_1st.tmp > timer_2nd.tmp
+#
+export sum_of_lines=`awk '{s+=$1}END{print s}' timer_2nd.tmp`
+## echo "Sum of the lines of the file: $sum_of_lines"
+#
+##more timer_2nd.tmp
+#
+export average_timing=`echo "$sum_of_lines / $FILE_LENGTH" | bc -l`
+echo "Averaged timing for the $FILE_LENGTH entries: $average_timing"
+#
+rm -rf *.tmp
diff --git a/code_saturne/Code_Saturne_Build_Run_5.3_UEABS.pdf b/code_saturne/Code_Saturne_Build_Run_5.3_UEABS.pdf
deleted file mode 100644
index 3bc660dbe2357818a822bd42561e7459a8421537..0000000000000000000000000000000000000000
Binary files a/code_saturne/Code_Saturne_Build_Run_5.3_UEABS.pdf and /dev/null differ
diff --git a/code_saturne/InstallHPC.sh b/code_saturne/InstallHPC.sh
new file mode 100644
index 0000000000000000000000000000000000000000..d151e2890106615c0b806c1e043ba42b6c2710e1
--- /dev/null
+++ b/code_saturne/InstallHPC.sh
@@ -0,0 +1,64 @@
+#!/bin/sh
+
+#################################
+## Which version of the code ? ##
+#################################
+
+CODE_VERSION=7.0.0
+KER_VERSION=${CODE_VERSION}
+KERNAME=code_saturne-${KER_VERSION}
+
+################################################
+## Installation PATH in the current directory ##
+################################################
+
+INSTALLPATH=`pwd`
+
+echo $INSTALLPATH
+
+#####################################
+## Environment variables and PATHS ##
+#####################################
+
+NOM_ARCH=`uname -s`
+
+CS_HOME=${INSTALLPATH}/${KERNAME}
+
+export PATH=$CS_HOME/bin:$PATH
+
+##############
+## Cleaning ##
+##############
+
+rm -rf $CS_HOME/arch/*
+rm -rf $INSTALLPATH/$KERNAME.build
+
+#########################
+## Kernel Installation ##
+#########################
+
+KERSRC=$INSTALLPATH/$KERNAME
+KERBUILD=$INSTALLPATH/$KERNAME.build/arch/$NOM_ARCH
+KEROPT=$INSTALLPATH/$KERNAME/arch/$NOM_ARCH
+
+export KEROPT
+
+mkdir -p $KERBUILD
+cd $KERBUILD
+
+
+$KERSRC/configure \
+--disable-shared \
+--disable-nls \
+--without-modules \
+--disable-gui \
+--enable-long-gnum \
+--disable-mei \
+--enable-debug \
+--prefix=$KEROPT \
+CC="cc" CFLAGS="-O3" FC="ftn" FCFLAGS="-O3" CXX="CC" CXXFLAGS="-O3"
+
+make -j 8
+make install
+
+cd $INSTALLPATH
diff --git a/code_saturne/README.md b/code_saturne/README.md
index 3f1437b5728b39d0052db6b289db40d45ffb74d7..86550eefb115f58d07d87a743288dca6c2d2100d 100644
--- a/code_saturne/README.md
+++ b/code_saturne/README.md
@@ -1,17 +1,134 @@
 # Code_Saturne
 
-Code_Saturne is open-source multi-purpose CFD software, primarily developed by EDF R&D and maintained by them. It relies on the Finite Volume method and a collocated arrangement of unknowns to solve the Navier-Stokes equations, for incompressible or compressible flows, laminar or turbulent flows and non-Newtonian and Newtonian fluids. A highly parallel coupling library (Parallel Locator Exchange - PLE) is also available in the distribution to account for other physics, such as conjugate heat transfer and structure mechanics. For the incompressible solver, the pressure is solved using an integrated Algebraic Multi-Grid algorithm and the scalars are computed by conjugate gradient methods or Gauss-Seidel/Jacobi.
+[Code_Saturne](https://www.code-saturne.org/cms/) is an open-source multi-purpose CFD software, primarily developed by EDF R&D and maintained by them. It relies on the Finite Volume method and a collocated arrangement of unknowns to solve the Navier-Stokes equations, for incompressible or compressible flows, laminar or turbulent flows and non-Newtonian and Newtonian fluids. A new discretisation based on the Compatible Discrete Operator (CDO) approach can be used for some physics. A highly parallel coupling library (Parallel Locator Exchange - PLE) is also available in the distribution to couple other software with different physics, such as for conjugate heat transfer and structural mechanics. For the incompressible solver, the pressure is solved using an integrated Algebraic Multi-Grid algorithm and the velocity components/scalars are computed by conjugate gradient methods or Gauss-Seidel/Jacobi.
 
-The original version of the code is written in C for pre-postprocessing, IO handling, parallelisation handling, linear solvers and gradient computation, and Fortran 95 for most of the physics implementation. MPI is used on distributed memory machines and OpenMP pragmas have been added to the most costly parts of the code to handle potential shared memory. The version used in this work (also freely available) relies on CUDA to take advantage of potential GPU acceleration.
+The original version of the code is written in C for pre-/post-processing, IO handling, parallelisation handling, linear solvers and gradient computation, and Fortran 95 for some of the physics-related implementation. Python is used to manage the simulations. MPI is used on distributed memory machines and OpenMP pragmas have been added to the most costly parts of the code to be used on shared memory architectures. The version used in this work relies on external libraries (AMGX, PETSc) to take advantage of potential GPU acceleration.
 
-The equations are solved iteratively using time-marching algorithms, and most of the time spent during a time step is usually due to the computation of the velocity-pressure coupling, for simple physics. For this reason, the two test cases ([https://repository.prace-ri.eu/ueabs/Code_Saturne/2.1/Code_Saturne_Build_Run_5.3_UEABS.pdf](CS_5.3_PRACE_UEABS_CAVITY_13M.tar.gz) and [https://repository.prace-ri.eu/ueabs/Code_Saturne/2.1/Code_Saturne_Build_Run_5.3_UEABS.pdf](CS_5.3_PRACE_UEABS_CAVITY_111M.tar.gz)) chosen for the benchmark suite have been designed to assess the velocity-pressure coupling computation, and rely on the same configuration, with a mesh 8 times larger for CAVITY_111M than for CAVITY_13M, the time step being halved to ensure a correct Courant number.
+The equations are solved iteratively using time-marching algorithms, and most of the time spent during a time step is due to the computation of the velocity-pressure coupling, for simple physics. For this reason, the two test cases chosen for the benchmark suite have been designed to assess the velocity-pressure coupling computation, and rely on the same configuration, the 3-D lid-driven cavity, using tetrahedral cell meshes. The first case mesh contains over 13 million cells. The second test case is modular in the sense that mesh multiplication can be used to increase the mesh size on the fly.
 
-## Building and running the code is described in the file
-[Code_Saturne_Build_Run_5.3_UEABS.pdf](Code_Saturne_Build_Run_5.3_UEABS.pdf)
+## Building Code_Saturne v7.0.0
+Version 7.0.0 of Code_Saturne can be downloaded [here](https://www.code-saturne.org/cms/sites/default/files/releases/code_saturne-7.0.0.tar.gz).
 
-## The test cases are to be found under:
-https://repository.prace-ri.eu/ueabs/Code_Saturne/2.1/CS_5.3_PRACE_UEABS_CAVITY_111M.tar.gz
-https://repository.prace-ri.eu/ueabs/Code_Saturne/2.1/CS_5.3_PRACE_UEABS_CAVITY_13M.tar.gz
+A simple installer [_InstallHPC.sh_](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/code_saturne/InstallHPC.sh) is made available for this version.
 
-## The distribution is to be found under:
-https://repository.prace-ri.eu/ueabs/Code_Saturne/2.1/CS_5.3_PRACE_UEABS.tar.gz
+An example of the last lines of the installer (meant for the GNU compiler & MPI-OpenMP in this example) reads:\
+
+$KERSRC/configure \\ \
+--disable-shared \\ \
+--disable-nls \\ \
+--without-modules \\ \
+--disable-gui \\ \
+--enable-long-gnum \\ \
+--disable-mei \\ \
+--enable-debug \\ \
+--prefix=$KEROPT \\ \
+CC="mpicc" CFLAGS="-O3" FC="mpif90" FCFLAGS="-O3" CXX="mpicxx" CXXFLAGS="-O3" \
+\# \
+make -j 8 \
+make install
+
+CC, FC, CFLAGS, FCFLAGS, LDFLAGS and LIBS might have to be tailored for your machine, compilers, MPI installation, etc.
+More information concerning the options can be found by typing: ./configure --help
+
+Assuming that CS_7.0.0_PRACE_UEABS is the current directory, the tarball is untarred in there as: \
+tar zxvf code_saturne-7.0.0.tar.gz
+
+and the code is then installed as:
+
+cd CS_7.0.0_PRACE_UEABS \
+./InstallHPC.sh
+
+If the installation is successful, the **code_saturne** command should print its usage message when typing:\
+YOUR_PATH/CS_7.0.0_PRACE_UEABS/code_saturne-7.0.0/arch/Linux/bin/code_saturne
+
+Usage: ./code_saturne
+
+Topics: \
+  help \
+  studymanager \
+  smgr \
+  bdiff \
+  bdump \
+  compile \
+  config \
+  cplgui \
+  create \
+  gui \
+  parametric \
+  studymanagergui \
+  smgrgui \
+  trackcvg \
+  update \
+  up \
+  info \
+  run \
+  submit \
+  symbol2line
+
+Options: \
+  -h, --help  show this help message and exit
+
+## Preparing a simulation
+Two archives are used, namely [**CS_7.0.0_PRACE_UEABS_CAVITY_13M.tar.gz**](https://repository.prace-ri.eu/ueabs/Code_Saturne/2.2/CS_7.0.0_PRACE_UEABS_CAVITY_13M.tar.gz) and [**CS_7.0.0_PRACE_UEABS_CAVITY_XXXM.tar.gz**](https://repository.prace-ri.eu/ueabs/Code_Saturne/2.2/CS_7.0.0_PRACE_UEABS_CAVITY_XXXM.tar.gz), that contain the information required to run both test cases, with the mesh_input.csm file (for the mesh) and the user subroutines in _src_.
+
+Taking the example of CAVITY_13M, from the working directory WORKDIR (different from CS_7.0.0_PRACE_UEABS), a ‘study’ has to be created (CAVITY_13M, for instance) as well as a ‘case’ (MACHINE, for instance) as:
+
+YOUR_PATH/CS_7.0.0_PRACE_UEABS/code_saturne-7.0.0/arch/Linux/bin/code_saturne create --study CAVITY_13M --case MACHINE --copy-ref
+
+The directory **CAVITY_13M** contains 3 directories, MACHINE, MESH and POST.
+
+The directory **MACHINE** contains 3 directories, DATA, RESU and SRC.
+
+The file mesh_input.csm should be copied into the MESH directory.
+
+The user subroutines (cs_user* files) contained in _src_ should be copied into SRC.
+
+The file _cs_user_scripts.py_ is used to manage the simulation. It has to be copied to DATA as: \
+cd DATA \
+cp REFERENCE/cs_user_scripts.py . \
+At Line 89 of this file, you need to change the mesh path from None to the local path of the mesh, i.e. "../MESH/mesh_input.csm"
+
+To finalise the preparation go to the folder MACHINE and type: \
+YOUR_PATH/CS_7.0.0_PRACE_UEABS/code_saturne-7.0.0/arch/Linux/bin/code_saturne run --initialize
+
+This should create a folder RESU/YYYYMMDD-HHMM, which should contain the following files:
+- compile.log
+- cs_solver
+- cs_user_scripts.py
+- listing
+- mesh_input.csm
+- run.cfg
+- run_solver
+- setup.xml
+- src
+- summary
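+
+As a convenience, the whole preparation sequence above can be scripted. The sketch below assumes the untarred CAVITY_13M archive sits next to WORKDIR and that its layout matches the description above; all paths are placeholders to adapt to your system:
+
+```
+#!/bin/bash
+# Sketch of the CAVITY_13M preparation steps (paths are assumptions)
+CS=YOUR_PATH/CS_7.0.0_PRACE_UEABS/code_saturne-7.0.0/arch/Linux/bin/code_saturne
+ARCHIVE=YOUR_PATH/CS_7.0.0_PRACE_UEABS_CAVITY_13M   # untarred test case archive
+
+cd WORKDIR
+$CS create --study CAVITY_13M --case MACHINE --copy-ref
+cp $ARCHIVE/mesh_input.csm CAVITY_13M/MESH/
+cp $ARCHIVE/src/cs_user*   CAVITY_13M/MACHINE/SRC/
+cd CAVITY_13M/MACHINE/DATA
+cp REFERENCE/cs_user_scripts.py .
+# Edit cs_user_scripts.py (Line 89): replace None by "../MESH/mesh_input.csm"
+cd ..
+$CS run --initialize
+```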
+
+## Running Code_Saturne v7.0.0
+The name of the executable is ./cs_solver, and the code should be run as mpirun/mpiexec/poe/aprun ./cs_solver
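+
+For example, with a generic MPI launcher (the launcher name and task count are illustrative and should match your system and job allocation):
+
+```
+cd RESU/YYYYMMDD-HHMM
+mpirun -n 128 ./cs_solver
+```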
+
+## Example of timing
+A script is used to compute the average time per time step, e.g. [_CS_collect_timing.sh_](https://repository.prace-ri.eu/git/UEABS/ueabs/-/blob/r2.2-dev/code_saturne/CS_collect_timing.sh), which returns:
+
+Averaged timing for the 97 entries: 2.82014432989690721649
+
+for the CAVITY_13M case, run on 2 nodes of a Cray - AMD (Rome).
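+
+A possible invocation (assuming the run produced a timer_stats.csv file in the RESU/YYYYMMDD-HHMM directory and that the script is copied there; the repository path is a placeholder):
+
+```
+cd RESU/YYYYMMDD-HHMM
+cp YOUR_PATH/ueabs/code_saturne/CS_collect_timing.sh .
+bash CS_collect_timing.sh
+```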
+
+## Larger cases
+The same steps are carried out for the larger cases using the CS_7.0.0_PRACE_UEABS_CAVITY_XXXM.tar.gz file.
+These cases are built by mesh multiplication (also called global refinement) of the mesh used for CAVITY_13M.
+If 1 (resp. 2 or 3) level(s) of refinement is/are used, the mesh is over 111M (resp. 889M or 7112M) cells large.
+The third mesh (level 3) is definitely suitable to run using over 100,000 MPI tasks.\
+
+To make sure that the simulations are stable, the time step is adjusted depending on the refinement level used.
+
+The number of levels of refinement is set at Line 152 of the _cs_user_mesh.c_ file, by choosing tot_nb_mm as
+1, 2 or 3.\
+The time step is set at Line 248 of the _cs_user_parameter.f90_ file, by choosing 0.01d0 / 3.d0 (level 1), 0.01d0 / 9.d0
+(level 2) or 0.01d0 / 27.d0 (level 3). \
+The table below recalls the correct settings.
+
+| | At Line 152 of _cs_user_mesh.c_ | At Line 248 of _cs_user_parameter.f90_ |
+| ------ | ------ | ------ |
+| Level 1 | tot_nb_mm = 1 | dtref = 0.01d0 / 3.d0 |
+| Level 2 | tot_nb_mm = 2 | dtref = 0.01d0 / 9.d0 |
+| Level 3 | tot_nb_mm = 3 | dtref = 0.01d0 / 27.d0 |
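+
+These edits can also be applied from the command line; a sketch with sed, under the assumptions that the quoted line numbers still hold and that the files currently contain the level-1 values (here switching to level 2):
+
+```
+sed -i '152s/tot_nb_mm = 1/tot_nb_mm = 2/' SRC/cs_user_mesh.c
+sed -i '248s|0.01d0 / 3.d0|0.01d0 / 9.d0|' SRC/cs_user_parameter.f90
+```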
diff --git a/tensorflow/testcase_medium/.gitkeep b/gadget/3.0/.gitkeep
similarity index 100%
rename from tensorflow/testcase_medium/.gitkeep
rename to gadget/3.0/.gitkeep
diff --git a/gadget/gadget3_Build_README.txt b/gadget/3.0/gadget3_Build_README.txt
similarity index 100%
rename from gadget/gadget3_Build_README.txt
rename to gadget/3.0/gadget3_Build_README.txt
diff --git a/gadget/gadget3_Run_README.txt b/gadget/3.0/gadget3_Run_README.txt
similarity index 100%
rename from gadget/gadget3_Run_README.txt
rename to gadget/3.0/gadget3_Run_README.txt
diff --git a/gpaw/.gitignore b/gadget/4.0/.gitkeep
similarity index 100%
rename from gpaw/.gitignore
rename to gadget/4.0/.gitkeep
diff --git a/gadget/4.0/README.md b/gadget/4.0/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..96ebcaf35b4db5356891f411922b9328be73c476
--- /dev/null
+++ b/gadget/4.0/README.md
@@ -0,0 +1,225 @@
+# GADGET
+
+
+## Summary Version
+4.0 (2021)
+
+## Purpose of Benchmark
+Provide the Astrophysical community with information on the performance and scalability (weak and strong scaling) of the Gadget-4 code, associated with three test cases, on PRACE Tier-0 supercomputers (JUWELS, MareNostrum4, IRENE-SKL, and IRENE-KNL).
+
+## Characteristics of Benchmark
+
+GADGET-4 was compiled with C++ with the optimisation level O3, MPI (e.g., OpenMPI, Intel MPI) and the libraries HDF5, GSL, and FFTW3. The tests were carried out using two modes: Intel- and GCC-compiled MPI API and libraries. In order to have a proper scalability analysis, the tests were carried out with one MPI task per core and 16 tasks per CPU. Hence, a total of 32 tasks (cores) dedicated to the calculations were used per compute node. An extra core to handle the MPI communications was used per compute node.
+
+## Mechanics of Building Benchmark
+
+Building the GADGET code requires a compiler with full C++11 support, MPI (e.g., MPICH, OpenMPI, IntelMPI), HDF5, GSL, and FFTW3. Hence, the corresponding environment modules must be loaded, e.g.,
+
+```
+module load OpenMPI/4.0.3 HDF5/1.10.6 FFTW/3.3.8 GSL/2.6
+```
+
+### Source Code and Initial Conditions
+
+##### Source Code Release
+
+The latest release of the code can be downloaded from [https://gitlab.mpcdf.mpg.de/vrs/gadget4](https://gitlab.mpcdf.mpg.de/vrs/gadget4)
+
+or clone the repository by
+
+```
+git clone http://gitlab.mpcdf.mpg.de/vrs/gadget4
+```
+
+##### In this UEABS repository you can find:
+
+- A cloned version of the code (version of June 28, 2021): [gadget4.tar.gz](./gadget/4.0/gadget4.tar.gz)
+
+This tarball includes the `src` code, `examples`, `buildsystem`, and `documentation` folders. It also includes **Makefile** and **Makefile.systype** (or a template) files.
+
+- The code used in the benchmarks (version of June 22, 2021): [gadget4-benchmarks.tar.gz](./gadget/4.0/gadget4-benchmarks.tar.gz)
+
+- Example initial conditions from [example_ics.tar.gz](./gadget/4.0/example_ics.tar.gz)
+
+It includes initial conditions for each of the examples. When untarred it generates a folder named `ExampleICs`.
+
+### Build the Executable
+
+#### General Building of the Executable
+
+1. Two files are needed from the repository: [gadget4.tar.gz](./gadget/4.0/gadget4.tar.gz) and [example_ics.tar.gz](./gadget/4.0/example_ics.tar.gz)
+
+2. After decompressing gadget4.tar.gz go to the master folder named `gadget4`. There are two files that need modification: **Makefile.systype** and **Makefile**.
+
+a) In the **Makefile.systype** select one of the system types by uncommenting the corresponding line, or add a line with your system, e.g.,
+```
+#SYSTYPE="XXX-BBB"
+```
+where XXX = system name and BBB = whatever you may want to include here, e.g., impi, openmpi, etc.
+
+b) In case you uncommented a line corresponding to your system in the **Makefile.systype**, then there is nothing to do in the **Makefile**.
+
+c) In case you added a line, say #SYSTYPE="XXX-BBB", into the **Makefile.systype**, then you must modify the **Makefile** by adding the following lines in 'define available Systems'
+
+```
+ifeq ($(SYSTYPE),"XXX-BBB")
+include buildsystem/Makefile.comp.XXX-BBB
+include buildsystem/Makefile.path.XXX-BBB
+endif
+```
+
+3. In the folder `buildsystem` make sure you have the **Makefile.comp.XXX** and **Makefile.path.XXX** (XXX = cluster name) set with the proper paths and compilation options, respectively. Either choose the existing files or create new ones that reflect your system paths and compiler.
+
+4. The folder `examples` has several subfolders of test cases. From one of these subfolders, e.g., `CollidingGalaxiesSFR`, copy **Config.sh** to the master folder.
+
+5. In the master folder compile the code
+```
+make CONFIG=Config.sh EXEC=gadget4-exe
+```
+where EXEC is the name of the executable.
+
+6. Create a folder named `Run_CollidingGalaxies`. Copy **gadget4-exe**, and the files **param.txt** and **TREECOOL** existing in the subfolder `CollidingGalaxiesSFR`, to `Run_CollidingGalaxies`.
+
+7. In the folder `Run_CollidingGalaxies` modify **param.txt** to include the correct path to the initial conditions file **ics_collision_g4.dat** located in the folder `ExampleICs`, and modify the memory per core to that of the system you are using.
+
+8. Run the code using mpirun or submit a SLURM script. A consolidated sketch of steps 4 to 7 is given below.
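+
+A possible consolidated form of steps 4 to 7 (a sketch; the relative paths and the location of the run folder are assumptions to adapt):
+
+```
+cd gadget4
+cp examples/CollidingGalaxiesSFR/Config.sh .
+make CONFIG=Config.sh EXEC=gadget4-exe
+mkdir -p ../Run_CollidingGalaxies
+cp gadget4-exe examples/CollidingGalaxiesSFR/param.txt \
+   examples/CollidingGalaxiesSFR/TREECOOL ../Run_CollidingGalaxies/
+# then edit ../Run_CollidingGalaxies/param.txt: IC path and memory per core
+```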
+
+
+#### Building a Test Case Executable | Case A
+
+1. Download and untar a test case tarball, e.g., [gadget4-case-A.tar.gz](./gadget/4.0/gadget4-case-A.tar.gz) (see below), and the source code used in the benchmarks, named [gadget4-benchmarks.tar.gz](./gadget/4.0/gadget4-benchmarks.tar.gz). The folder `gadget4-case-A` has the **Config.sh**, **ics_collision_g4.dat**, **param.txt**, **TREECOOL**, **slurm_script.sh**, and **README** files.
+
+The **param.txt** file has the path for the initial conditions and was adapted for a system with 2.0 GB RAM per core, in effect 1.8 GB.
+
+The **README** file describes the setup and run of the code on a supercomputer. Use this as an example of what to expect on other machines.
+
+2. Change to the folder named `gadget4-benchmarks` and adapt the files **Makefile.systype** and **Makefile** to your needs. Follow instructions 2a), 2b) or 2c) in Section "General Building of the Executable".
+
+3. Compile the code using the **Config.sh** file in `gadget4-case-A`
+
+```
+make CONFIG=../gadget4-case-A/Config.sh EXEC=../gadget4-case-A/gadget4-exe
+```
+
+4. Change to folder `gadget4-case-A` and make sure that the file **param.txt** has the correct memory size per core for the system you are using.
+
+5. Run the code directly with mpirun or submit a SLURM script.
+
+
+### Mechanics of Running Benchmark
+
+The general way to run the benchmarks, assuming the SLURM resource/batch manager, is:
+
+1. Set the environment modules (see the Build the Executable section)
+
+2. In the folder of the test case, e.g., `gadget4-case-A`, adapt the SLURM script and submit it
+
+```
+sbatch slurm_script.sh
+```
+where slurm_script.sh has the form (for a run with 1024 cores):
+
+```
+#!/bin/bash -l
+#SBATCH --time=01:00:00
+#SBATCH --account=ACCOUNT
+#SBATCH --job-name=collgal-01024
+#SBATCH --output=g_collgal_%j.out
+#SBATCH --error=g_collgal_%j.error
+#SBATCH --nodes=32
+#SBATCH --cpus-per-task=1
+#SBATCH --ntasks-per-socket=17
+#SBATCH --ntasks-per-node=33
+#SBATCH --exclusive
+#SBATCH --partition=batch
+
+echo
+echo "Running on hosts: $SLURM_NODELIST"
+echo "Running on $SLURM_NNODES nodes."
+echo "Running on $SLURM_NPROCS processors."
+echo "Current working directory is `pwd`"
+echo
+
+srun ./gadget4-exe param.txt
+```
+Where:
+* gadget4-exe is the executable.
+* param.txt is the input parameter file.
+
+##### NOTE
+
+Gadget-4 uses one core per compute node to handle communications. Hence, when allocating compute
+nodes we must take into account an extra core. So, if we want to run the code with 16 MPI tasks/socket we
+must allocate 33 cores per compute node. For a run with 1024 cores on 32 nodes we allocate 1056 cores.
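+
+The allocation arithmetic can be checked quickly in the shell (assuming two sockets per node and 16 compute tasks per socket, as above):
+
+```
+nodes=32
+tasks_per_node=33                        # 2 x 16 compute tasks + 1 communication task
+echo $(( nodes * tasks_per_node ))       # cores to allocate: 1056
+echo $(( nodes * (tasks_per_node - 1) )) # MPI tasks doing the calculation: 1024
+```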
+
+##### OUTPUT of a run with 1024 cores
+
+```
+Running on hosts: jwc03n[082-097,169-184]
+Running on 32 nodes.
+Running on 1056 processors.
+Current working directory is Test-Case-A
+
+Shared memory islands host a minimum of 33 and a maximum of 33 MPI ranks.
+We shall use 32 MPI ranks in total for assisting one-sided communication (1 per shared memory node).
+
+   ___    __    ____  ___  ____  ____    __
+  / __)  /__\  ( _ \ / __)( ___)(_ _)___ /. |
+ ( (_-. /(__)\  )(_) )( (_-. )__)   )( (___)(_ _)
+  \___/(__)(__)(____/ \___/(____) (__)      (_)
+
+This is Gadget, version 4.0.
+Git commit 8ee7f358cf43a37955018f64404db191798a32a3, Tue Jun 15 15:10:36 2021 +0200
+
+Code was compiled with the following compiler and flags:
+...
+
+Code was compiled with the following settings:
+   COOLING
+   DOUBLEPRECISION=1
+   GADGET2_HEADER
+   MULTIPOLE_ORDER=3
+   NSOFTCLASSES=2
+   NTYPES=6
+   POSITIONS_IN_64BIT
+   SELFGRAVITY
+   STARFORMATION
+   TREE_NUM_BEFORE_NODESPLIT=4
+
+
+Running on 1024 MPI tasks.
+```
+
+### UEABS Benchmarks
+
+**A) `Colliding galaxies with star formation`**
+
+This simulation with setup in the folder CollidingGalaxiesSFR considers the collision of two compound galaxies made up of a dark matter halo, a stellar disk and bulge, and cold gas in the disk that undergoes star formation. Radiative cooling due to helium and hydrogen is included. Star formation and feedback are modelled with a simple subgrid treatment.
+
+[Download test Case A](./gadget/4.0/gadget4-case-A.tar.gz)
+
+
+**B) `Cosmological DM-only simulation with IC creation`**
+
+The setup in DM-L50-N128 simulates a small box of comoving side-length 50 Mpc/h using 128^3 dark matter particles. The initial conditions are created on the fly upon start-up of the code, using second order Lagrangian perturbation theory with a starting redshift of z=63. The LEAN option and 32-bit arithmetic are enabled to minimize memory consumption of the code.
+
+Gravity is computed with the TreePM algorithm at expansion order p=3. Three output times are defined, for which FOF group finding is enabled; power spectra are also computed for each output and for the snapshots that are produced.
+
+[Download test Case B](./gadget/4.0/gadget4-caseB.tar.gz)
+
+
+**C) `Adiabatic collapse of a gas sphere`**
+
+This simulation in G2-gassphere considers the gravitational collapse of a self-gravitating sphere of gas which initially has a 1/r density profile and a very low temperature. The gas falls under its own weight to the centre, where it bounces back and a strong outward-moving shock wave develops. The simulation uses Newtonian physics in a natural system of units (G=1).
+
+[Download test Case C](./gadget/4.0/gadget4-caseC.tar.gz)
+
+
+## Performance
+GADGET reports both time and performance in the log file.
+
+**`Performance` in `ns/day` units: `grep Performance logfile | awk -F ' ' '{print $2}'`**
+
+**`Execution Time` in `seconds`: `grep Time: logfile | awk -F ' ' '{print $3}'`**
+
+
diff --git a/gadget/4.0/example_ics.tar.gz b/gadget/4.0/example_ics.tar.gz
new file mode 100644
index 0000000000000000000000000000000000000000..6431cb4c59972ff773cccbaefc6d5d2677893fef
Binary files /dev/null and b/gadget/4.0/example_ics.tar.gz differ
diff --git a/gadget/4.0/gadget4-benchmarks.tar.gz b/gadget/4.0/gadget4-benchmarks.tar.gz
new file mode 100644
index 0000000000000000000000000000000000000000..ea67887b05b52e6bd5cda6396ed82068384f62d3
Binary files /dev/null and b/gadget/4.0/gadget4-benchmarks.tar.gz differ
diff --git a/gadget/4.0/gadget4-case-A.tar.gz b/gadget/4.0/gadget4-case-A.tar.gz
new file mode 100644
index 0000000000000000000000000000000000000000..15a3041b8790b552ec46a141aa82aa3c6428a93b
Binary files /dev/null and b/gadget/4.0/gadget4-case-A.tar.gz differ
diff --git a/gadget/4.0/gadget4.tar.gz b/gadget/4.0/gadget4.tar.gz
new file mode 100644
index 0000000000000000000000000000000000000000..8b137891791fe96927ad78e64b0aad7bded08bdc
--- /dev/null
+++ b/gadget/4.0/gadget4.tar.gz
@@ -0,0 +1 @@
+
diff --git a/gpaw/README.md b/gpaw/README.md
index 464e12b7158da655856183eaac6f8529f3941736..f39b861188e2b2f20f8fc8be5dced89534c8de75 100644
--- a/gpaw/README.md
+++ b/gpaw/README.md
@@ -2,7 +2,7 @@
 
 ## Summary version
 
-0.9
+1.0
 
 ## Purpose of the benchmark
 
@@ -28,10 +28,6 @@ There have been various developments for GPGPUs and MICs in the past
 using either CUDA or pyMIC/libxstream. Many of those branches see no
 development anymore. The relevant CUDA version for this benchmark is available
 in a [separate GitLab for CUDA development, cuda branch](https://gitlab.com/mlouhivu/gpaw/tree/cuda).
-This version corresponds to the Aalto version mentioned on the
-[GPU page of the GPAW Wiki](https://wiki.fysik.dtu.dk/gpaw/devel/projects/gpu.html).
-As of early 2020, that version seems to be derived from the 1.5.2 CPU version
-(at least, I could find a commit that claims to merge the 1.5.2 code).
 
 There is currently no active support for non-CUDA accelerator platforms.
 
@@ -40,7 +36,7 @@ For the UEABS benchmark version 2.2, the following versions of GPAW were tested:
 * Version 20.1.0, as this one is the version on which the most recent GPU
   commits are based.
 * Version 20.10.0, as it was the most recent version during the development of
-  the IEABS 2.2. benchmark suite.
+  the UEABS 2.2 benchmark suite.
 * GPU-based: As there is no official release of the GPU version and as it is
   at the moment of the release of the UEABS version 2.2 under heavy development
   to also support AMD GPUs, there is no official support for the GPU version
@@ -104,7 +100,7 @@ these versions of GPAW.
 Installing and running GPAW has changed a lot in the since the previous
 versions of the UEABS. GPAW version numbering changed in 2019. Version 1.5.3
 is the last version with the old numbering. In 2019 the development team switched
-to a version numbering scheme based on year, month and patchlevel, e.g.,
+to a version numbering scheme based on year, month and patch level, e.g.,
 19.8.1 for the second version released in August 2019.
 
 Another change is in the Python packages used to install GPAW. Versions up to
@@ -120,7 +116,7 @@ interpreter is used and the MPI functionality is included in the `_gpaw.so` shared
 
 ### Available instructions
 
 The [GPAW wiki](https://wiki.fysik.dtu.dk/gpaw/) only contains the
-[installation instructions](https://wiki.fysik.dtu.dk/gpaw/index.html) for the current version.
+[installation instructions](https://wiki.fysik.dtu.dk/gpaw/install.html) for the current version.
 For the installation instructions with a list of dependencies for older versions,
 download the code (see below) and look for the file `doc/install.rst` or go to
 the [GPAW GitLab](https://gitlab.com/gpaw), select the tag for the desired version and
@@ -173,9 +169,9 @@ In addition, the GPU version needs:
 Installing GPAW also requires a number of standard build tools on the system, including
 * [GNU autoconf](https://www.gnu.org/software/autoconf/) is needed to generate the
-  configure script for libxc
+  configure script for LibXC
 * [GNU Libtool](https://www.gnu.org/software/libtool/) is needed. If not found,
-  the configure process of libxc produces very misleading
+  the configure process of LibXC produces very misleading
   error messages that do not immediately point to libtool missing.
 * [GNU make](https://www.gnu.org/software/make/)
@@ -209,7 +205,7 @@ may not offer optimal performance and the automatic detection of the libraries also
 fails on some systems. The UEABS repository contains additional instructions:
 
- * [general instructions](build/build-cpu.md)
+ * [general instructions](build/build-CPU.md)
 
 Example [build scripts](build/examples/) are also available.
 
@@ -301,11 +297,17 @@ on one hand and version 21.1.0 on the other hand. The expected values are:
 
  * Number of iterations: Between 30 and 35
- * Dipole (3rd component):
-   * 20.1.0 and 20.10.0: between -0.493 and -0.491
-   * 21.1.0: between -0.462 and -0.461
- * Fermi level:
-   * 20.1.0 and 20.10.0: between -2.67 and -2.66
-   * 21.1.0: between -2.59 and -2.58
+ * Dipole (3rd component): Between -0.493 and -0.491
+ * Fermi level: Between -2.67 and -2.66
+ * Extrapolated energy: Between -3784 and -3783
+
+Note: Though not used for the benchmarking in the final report, some testing was done
+with version 21.1.0 also. In this version, some external library routines were replaced
+by new internal implementations that cause changes in some results. For 21.1.0, the
+expected values are:
+
+ * Number of iterations: Between 30 and 35
+ * Dipole (3rd component): Between -0.462 and -0.461
+ * Fermi level: Between -2.59 and -2.58
  * Extrapolated energy: Between -3784 and -3783
 
diff --git a/gpaw/build/build-CPU.md b/gpaw/build/build-CPU.md
index 3ae6e7a94c528a637ebb9488f41f8961d7eab3b9..a71399ebad6028c77fd3b27475d31e74595cbd62 100644
--- a/gpaw/build/build-CPU.md
+++ b/gpaw/build/build-CPU.md
@@ -1,4 +1,4 @@
-# Detailed GPAW installation instructions on non-acclerated systems
+# Detailed GPAW installation instructions on non-accelerated systems
 
 These instructions are in addition to the brief instructions in
 [README.md](../README.md).
@@ -13,7 +13,7 @@ GPAW needs (for the UEABS benchmarks)
 * [Python](https://www.python.org/): GPAW 20.1.0 requires Python 3.5-3.8,
   and GPAW 20.10.0 and 21.1.0 require Python 3.6-3.9.
 * [MPI library](https://www.mpi-forum.org/)
- * [LibXC](https://www.tddft.org/programs/libxc/). GPAW 20.1.0,
+ * [LibXC](https://www.tddft.org/programs/LibXC/). GPAW 20.1.0,
   20.10.0 and 21.1.0 all need LibXC 3.x or 4.x.
 * (Optimized) [BLAS](http://www.netlib.org/blas/) and [LAPACK](http://www.netlib.org/lapack/) libraries.
@@ -22,7 +22,7 @@ GPAW needs (for the UEABS benchmarks)
   will give very poor performance. Most optimized LAPACK libraries actually only
   optimize a few critical routines while the remaining routines are compiled
   from the reference version. Most processor vendors for HPC machines and system vendors
-  offer optmized versions of these libraries.
+  offer optimized versions of these libraries.
 * [ScaLAPACK](http://www.netlib.org/scalapack/) and the underlying communication
   layer [BLACS](http://www.netlib.org/blacs/).
 * [FFTW](http://www.fftw.org/) or compatible FFT library.
@@ -60,7 +60,7 @@ GPAW needs
 * [ASE, Atomic Simulation Environment](https://wiki.fysik.dtu.dk/ase/),
   a Python package from the same group that develops GPAW.
   The required versions is 3.18.0 or later for GPAW 20.1.0, 20.10.0 and 21.1.0.
-  ASE has a couple of dependendencies
+  ASE has a couple of dependencies
   that are not needed for running the UEABS benchmarks. However, several
   Python package install methods will trigger the installation of those packages,
   and with them may require a chain of system libraries.
@@ -69,7 +69,7 @@ GPAW needs
     This package is optional and not really needed to run the benchmarks.
     Matplotlib pulls in a lot of other dependencies. When installing ASE with pip,
     it will try to pull in matplotlib and its dependencies
-  * [pillow](https://pypi.org/project/Pillow/) needs several exgternal
+  * [pillow](https://pypi.org/project/Pillow/) needs several external
     libraries. During the development of the benchmarks, we needed at least
     zlib, libjpeg-turbo (or compatible libjpeg library) and freetype. Even though
     the pillow documentation claimed that libjpeg was optional,
@@ -90,7 +90,7 @@ GPAW needs
     code
   * [itsdangerous](https://pypi.org/project/itsdangerous/)
   * [Werkzeug](https://pypi.org/project/Werkzeug/)
-  * [click]()
+  * [click](https://pypi.org/project/click/)
 
 ## Tested configurations
 
@@ -126,6 +126,14 @@ GPAW that were tested:
 |:--------|:--------|:-------|:-------|:------|
 | 20.1.0  | 3.19.3  | 3.8.7  | 1.18.5 | 1.5.4 |
 | 20.10.0 | 3.20.1  | 3.9.4  | 1.19.5 | 1.5.4 |
+
+Note: On some systems compiling SciPy 1.5.4 with NumPy 1.19.5 produced errors. On those
+systems NumPy 1.18.5 was used.
+
+Other configurations that were only tested on a limited number of clusters:
+
+| GPAW    | ASE     | Python | NumPy  | SciPy |
+|:--------|:--------|:-------|:-------|:------|
 | 21.1.0  | 3.21.1  | 3.9.4  | 1.19.5 | 1.5.4 |
 
@@ -137,33 +145,31 @@
 Also, the instructions below will need to be adapted to the specific libraries
 that are being used.
 
 Other prerequisites:
- * libxc
+ * LibXC
 * Python interpreter
 * Python package NumPy
 * Python package SciPy
 * Python package ase
 
-### Installing libxc
+### Installing LibXC
 
- * Installing libxc requires GNU automake and GNU buildtool besides GNU make and a
+ * Installing LibXC requires GNU automake and GNU Libtool besides GNU make and a
   C compiler. The build process is the usual GNU configure - make - make install
   cycle, but the `configure` script still needs to be generated with autoreconf.
- * Download libxc:
-   * The latest version of libxc can be downloaded from
-     [the libxc download page](https://www.tddft.org/programs/libxc/download/).
+ * Download LibXC:
+   * The latest version of LibXC can be downloaded from
+     [the LibXC download page](https://www.tddft.org/programs/libxc/download/).
     However, that version may not be officially supported by GPAW.
- * It is also possible to download all recent versions of libxc from
-   [the libxc GitLab](https://gitlab.com/libxc/libxc)
+ * It is also possible to download all recent versions of LibXC from
+   [the LibXC GitLab](https://gitlab.com/libxc/libxc)
   * Select the tag corresponding to the version you want to download in the
    branch/tag selection box.
   * Then use the download button and select the desired file type.
-    * Dowload URLs look like `https://gitlab.com/libxc/libxc/-/archive/4.3.4/libxc-4.3.4.tar.bz2`.
+   * Download URLs look like `https://gitlab.com/libxc/libxc/-/archive/4.3.4/libxc-4.3.4.tar.bz2`.
  * Untar the file in the build directory.
-
-
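+
+A complete cycle might then look like the sketch below (the LibXC version, the
+install prefix and the choice of `gcc` are placeholders; use the same C compiler
+as for the rest of the stack):
+
+```
+tar -xjf libxc-4.3.4.tar.bz2
+cd libxc-4.3.4
+autoreconf -i
+./configure --prefix=$HOME/apps/libxc/4.3.4 CC=gcc
+make -j 8
+make install
+```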
For 21.1.0, the +expected values are: + + * Number of iterations: Between 30 and 35 + * Dipole (3rd component): Between -0.462 and -0.461 + * Fermi level: Between -2.59 and -2.58 * Extrapolated energy: Between -3784 and -3783 diff --git a/gpaw/build/build-CPU.md b/gpaw/build/build-CPU.md index 3ae6e7a94c528a637ebb9488f41f8961d7eab3b9..a71399ebad6028c77fd3b27475d31e74595cbd62 100644 --- a/gpaw/build/build-CPU.md +++ b/gpaw/build/build-CPU.md @@ -1,4 +1,4 @@ -# Detailed GPAW installation instructions on non-acclerated systems +# Detailed GPAW installation instructions on non-accelerated systems These instructions are in addition to the brief instructions in [README.md](../README.md). @@ -13,7 +13,7 @@ GPAW needs (for the UEABS benchmarks) * [Python](https://www.python.org/): GPAW 20.1.0 requires Python 3.5-3.8, and GPAW 20.10.0 and 21.1.0 require Python 3.6-3.9. * [MPI library](https://www.mpi-forum.org/) - * [LibXC](https://www.tddft.org/programs/libxc/). GPAW 20.1.0, + * [LibXC](https://www.tddft.org/programs/LibXC/). GPAW 20.1.0, 20.10.0 and 21.1.0 all need LibXC 3.x or 4.x. * (Optimized) [BLAS](http://www.netlib.org/blas/) and [LAPACK](http://www.netlib.org/lapack/) libraries. @@ -22,7 +22,7 @@ GPAW needs (for the UEABS benchmarks) will give very poor performance. Most optimized LAPACK libraries actually only optimize a few critical routines while the remaining routines are compiled from the reference version. Most processor vendors for HPC machines and system vendors - offer optmized versions of these libraries. + offer optimized versions of these libraries. * [ScaLAPACK](http://www.netlib.org/scalapack/) and the underlying communication layer [BLACS](http://www.netlib.org/blacs/). * [FFTW](http://www.fftw.org/) or compatible FFT library. @@ -60,7 +60,7 @@ GPAW needs * [ASE, Atomic Simulation Environment](https://wiki.fysik.dtu.dk/ase/), a Python package from the same group that develops GPAW. The required versions is 3.18.0 or later for GPAW 20.1.0, 20.10.0 and 21.1.0. - ASE has a couple of dependendencies + ASE has a couple of dependencies that are not needed for running the UEABS benchmarks. However, several Python package install methods will trigger the installation of those packages, and with them may require a chain of system libraries. @@ -69,7 +69,7 @@ GPAW needs This package is optional and not really needed to run the benchmarks. Matplotlib pulls in a lot of other dependencies. When installing ASE with pip, it will try to pull in matplotlib and its dependencies - * [pillow](https://pypi.org/project/Pillow/) needs several exgternal + * [pillow](https://pypi.org/project/Pillow/) needs several external libraries. During the development of the benchmarks, we needed at least zlib, libjpeg-turbo (or compatible libjpeg library) and freetype. Even though the pillow documentation claimed that libjpeg was optional, @@ -90,7 +90,7 @@ GPAW needs code * [itsdangerous](https://pypi.org/project/itsdangerous/) * [Werkzeug](https://pypi.org/project/Werkzeug/) - * [click]() + * [click](https://pypi.org/project/click/) ## Tested configurations @@ -126,6 +126,14 @@ GPAW that were tested: |:--------|:--------|:-------|:-------|:------| | 20.1.0 | 3.19.3 | 3.8.7 | 1.18.5 | 1.5.4 | | 20.10.0 | 3.20.1 | 3.9.4 | 1.19.5 | 1.5.4 | + +Note: On some systems compiling SciPy 1.5.4 with NumPy 1.19.5 produced errors. On those +systems NumPy 1.18.5 was used. 
+
+Other configurations that were only tested on a limited number of clusters:
+
+| GPAW    | ASE     | Python | NumPy  | SciPy |
+|:--------|:--------|:-------|:-------|:------|
 | 21.1.0  | 3.21.1  | 3.9.4  | 1.19.5 | 1.5.4 |
 
@@ -137,33 +145,31 @@ Also, the instructions below will need to be adapted to the specific
 libraries that are being used.
 
 Other prerequisites:
- * libxc
+ * LibXC
  * Python interpreter
  * Python package NumPy
  * Python package SciPy
  * Python package ase
 
-### Installing libxc
+### Installing LibXC
 
- * Installing libxc requires GNU automake and GNU buildtool besides GNU make and a
+ * Installing LibXC requires GNU automake and GNU Libtool besides GNU make and a
   C compiler. The build process is the usual GNU configure - make - make install
   cycle, but the `configure` script still needs to be generated with autoreconf.
- * Download libxc:
-   * The latest version of libxc can be downloaded from
-     [the libxc download page](https://www.tddft.org/programs/libxc/download/).
+ * Download LibXC:
+   * The latest version of LibXC can be downloaded from
+     [the LibXC download page](https://www.tddft.org/programs/libxc/download/).
     However, that version may not be officially supported by GPAW.
-   * It is also possible to download all recent versions of libxc from
-     [the libxc GitLab](https://gitlab.com/libxc/libxc)
+   * It is also possible to download all recent versions of LibXC from
+     [the LibXC GitLab](https://gitlab.com/libxc/libxc)
     * Select the tag corresponding to the version you want to download
       in the branch/tag selection box.
     * Then use the download button and select the desired file type.
-   * Dowload URLs look like `https://gitlab.com/libxc/libxc/-/archive/4.3.4/libxc-4.3.4.tar.bz2`.
+   * Download URLs look like `https://gitlab.com/libxc/libxc/-/archive/4.3.4/libxc-4.3.4.tar.bz2`.
  * Untar the file in the build directory.
-
-
 ### Installing Python from scratch
 
 The easiest way to get Python on your system is to download an existing distribution
@@ -222,8 +228,8 @@ of NumPy, SciPy and GPAW itself proves much more important.
   use the NumPy FFT routines.
 * GPAW also needs a number of so-called "Atomic PAW Setup" files. The latest files
   can be found on the [GPAW website, Atomic PAW Setups page](https://wiki.fysik.dtu.dk/gpaw/setups/setups.html).
-  For the testing we used []`gpaw-setups-0.9.20000.tar.gz`](https://wiki.fysik.dtu.dk/gpaw-files/gpaw-setups-0.9.20000.tar.gz)
-  for all versions of GPAW. The easiest way to install these files is to simpy untar
+  For the testing we used [`gpaw-setups-0.9.20000.tar.gz`](https://wiki.fysik.dtu.dk/gpaw-files/gpaw-setups-0.9.20000.tar.gz)
+  for all versions of GPAW. The easiest way to install these files is to simply untar
   the file and set the environment variable GPAW_SETUP_PATH to point to that
   directory. In the examples provided we use the `share/gpaw-setups` subdirectory
   of the install directory for this purpose.
diff --git a/gpaw/build/build-cuda.md b/gpaw/build/build-cuda.md
index b12f1e185874c039563edf42e46d4ac7974dbec3..2bfc06b4ac11d3d0a39b28a59c01390a4cbf3c37 100644
--- a/gpaw/build/build-cuda.md
+++ b/gpaw/build/build-cuda.md
@@ -1,6 +1,11 @@
 GPGPUs
 ======
 
+**These instructions were not updated for UEABS release 2.2 as the GPU version
+was in full redevelopment at that time to also work for the AMD GPUs used in the
+LUMI pre-exascale system. They may or may not be useful anymore for the new
+versions once they are finished.**
+
 GPAW has a separate CUDA version available for NVIDIA GPGPUs.
Source code is available in GPAW's repository as a separate branch called 'cuda'. An up-to-date development version is currently available at diff --git a/gpaw/build/build-xeon-phi.md b/gpaw/build/build-xeon-phi.md deleted file mode 100644 index 2c6a2cee09b9a24a0ab04628bfbf4e0b636e7398..0000000000000000000000000000000000000000 --- a/gpaw/build/build-xeon-phi.md +++ /dev/null @@ -1,109 +0,0 @@ -Xeon Phi MICs -============= - -Intel's MIC architecture includes two distinct generations of processors: -1st generation Knights Corner (KNC) and 2nd generation Knights Landing (KNL). -KNCs require a specific offload version of GPAW, whereas KNLs use standard -GPAW. - - -KNC (Knights Corner) --------------------- - -For KNCs, GPAW has adopted an offload-to-the-MIC-coprocessor approach similar -to GPGPUs. The offload version of GPAW uses the stream-based offload module -pyMIC (https://github.com/01org/pyMIC) to offload computationally intensive -matrix calculations to the MIC co-processors. - -Source code is available in GPAW's repository as a separate branch called -'mic'. To obtain the code, use e.g. the following commands: -``` -git clone https://gitlab.com/gpaw/gpaw.git -cd gpaw -git checkout mic -``` -or download it from: https://gitlab.com/gpaw/gpaw/tree/mic - -A ready-to-use install package with examples and instructions is also -available at: - https://github.com/mlouhivu/gpaw-mic-install-pack.git - - -### Software requirements - -The offload version of GPAW is roughly equivalent to the 0.11.0 version of -GPAW and thus has similar requirements (for software and versions). - -For example, the following versions are known to work: -* Python (2.7.x) -* ASE (3.9.1) -* NumPy (1.9.2) -* Libxc (2.1.x) - -In addition, pyMIC requires: -* Intel compile environment with Intel MKL and Intel MPI -* Intel MPSS (Manycore Platform Software Stack) - -### Install instructions - -In addition to using Intel compilers, there are three additional steps apart -from standard installation: - -1. Compile and install Numpy with a suitable `site.cfg` to use MKL, e.g. - - ``` - [mkl] - library_dirs = /path/to/mkl/lib/intel64 - include_dirs = /path/to/mkl/include - lapack_libs = - mkl_libs = mkl_rt - ``` - -2. Compile and install [pyMIC](https://github.com/01org/pyMIC) before GPAW. - -3. Edit your GPAW setup script (`customize.py`) to add correct link and - compile options for offloading. The relevant lines are e.g.: - - ```python - # offload to KNC - extra_compile_args += ['-qoffload-option,mic,compiler,"-qopenmp"'] - extra_compile_args += ['-qopt-report-phase=offload'] - - # linker settings for MKL on KNC - mic_mkl_lib = '/path/to/mkl/lib/mic/' - extra_link_args += ['-offload-option,mic,link,"-L' + mic_mkl_lib \ - + ' -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread"'] - ``` - - -KNL (Knights Landing) ---------------------- - -For KNLs, one can use the standard version of GPAW, instead of the offload -version used for KNCs. Please refer to the generic installation instructions -for GPAW. - -### Software requirements - - https://wiki.fysik.dtu.dk/gpaw/install.html - -### Install instructions - - https://wiki.fysik.dtu.dk/gpaw/install.html - - https://wiki.fysik.dtu.dk/gpaw/platforms/platforms.html - -It is advisable to use Intel compile environment with Intel MKL and Intel MPI -to take advantage of their KNL optimisations. To enable the AVX512 vector -sets supported by KNLs, one needs to use the compiler option `-xMIC-AVX512` -when installing GPAW. 
- -To improve performance, one may also link to Intel TBB to benefit from an -optimised memory allocator (tbbmalloc). This can be done during installation -or at run-time by setting environment variable LD_PRELOAD to point to the -correct libraries, i.e. for example: -```bash -export LD_PRELOAD=$TBBROOT/lib/intel64/gcc4.7/libtbbmalloc_proxy.so.2 -export LD_PRELOAD=$LD_PRELOAD:$TBBROOT/lib/intel64/gcc4.7/libtbbmalloc.so.2 -``` - -It may also be beneficial to use hugepages together with tbbmalloc -(`export TBB_MALLOC_USE_HUGE_PAGES=1`). - diff --git a/gpaw/build/examples/BSC-MareNostrum4-skylake/build_20.1.0_Python38_FFTW_icc.BSC.sh b/gpaw/build/examples/BSC-MareNostrum4-skylake/build_20.1.0_Python38_FFTW_icc.BSC.sh new file mode 100755 index 0000000000000000000000000000000000000000..4d42553486fad8860d6b34c077092af0ce522616 --- /dev/null +++ b/gpaw/build/examples/BSC-MareNostrum4-skylake/build_20.1.0_Python38_FFTW_icc.BSC.sh @@ -0,0 +1,1072 @@ +#!/bin/bash +# +# Installation script for GPAW 20.1.0: +# * We compile our own Python as this is the best guarantee to not have to +# struggle with compatibility problems between various compilers used for +# various components +# * Using the matching version of ase, 3.19.3 +# * Compiling with the Intel compilers +# +# The FFT library is discovered at runtime. With the settings used in this script +# this should be MKL FFT, but it is possible to change this at runtime to either +# MKL, FFTW or the built-in NumPy FFT routines, see the installation instructions +# (link below). +# +# The original installation instructions for GPAW can be found at +# https://gitlab.com/gpaw/gpaw/-/blob/20.1.0/doc/install.rst +# + +packageID='20.1.0-Python38-FFTW-icc' +packageName='GPAW-UEABS' + +echo -e "\n###### Building $packageName/$packageID from $0\n\n" + +# The next three variables are only used to load the right UEABS module +# and to give example values for variable settings in comments. +install_root=$PROJECT/UEABS +systemID=BSC-MareNostrum4-skylake + +module purge +MODULEPATH=$install_root/$systemID/Modules:$MODULEPATH + +# +# The following UEABS_ variables are needed: +# We set them manually as we have no UEABS module for this system at the moment. +# +# Directory to put the downloaded sources of the packages. +UEABS_DOWNLOADS=$install_root/SOURCES +# Directory where packages should be installed. 
+UEABS_PACKAGES=$install_root/$systemID/Packages +# Directory where modules are installed +UEABS_MODULES=$install_root/$systemID/Modules + +install_dir=$UEABS_PACKAGES/$packageName/$packageID +modules_dir=$UEABS_MODULES/$packageName +#build_dir="/dev/shm/$USER/$packageName/$packageID" +build_dir="$SCRATCH/UEABS-tmp/$packageName/$packageID" + +# Software versions +python_version='3.8.7' +zlib_version='1.2.11' +ncurses_version='6.2' +readline_version='8.0' +sqlite_version='3.33.0' +sqlite_download='3330000' +libffi_version='3.3' +fftw_version='3.3.8' +libxc_version='4.3.4' + +setuptools_version='56.0.0' +setuptoolsscm_version='6.0.1' +wheel_version='0.35.1' + +attrs_version='20.3.0' +pybind11_version='2.6.2' +cython_version='0.29.21' + +py_version='1.10.0' +pyparsing_version='2.4.7' +toml_version='0.10.2' +iniconfig_version='1.1.1' +packaging_version='20.9' +pytest_version='6.2.3' + +numpy_version='1.18.5' +scipy_version='1.5.4' +ase_version='3.19.3' +GPAW_version='20.1.0' + +GPAWsetups_version='0.9.20000' # Check version on https://wiki.fysik.dtu.dk/gpaw/setups/setups.html + +# Compiler settings +#compiler_module='intel/2020.1' +#mpi_module='impi/2018.4' +#math_module='mkl/2020.1' +compiler_module='intel/2018.4' +mpi_module='impi/2018.4' +math_module='mkl/2018.4' +opt_level='-O2' +proc_opt_flags='-xHost' +fp_opt_flags='-ftz -fp-speculation=safe -fp-model source' +parallel=16 + +py_maj_min='3.8' + +################################################################################ +# +# Prepare the system +# + +# +# Load modules +# +mkdir -p $modules_dir +module load $compiler_module +module load $mpi_module +module load $math_module + +# +# Create the directories and make sure they are clean if that matters +# +/usr/bin/mkdir -p $UEABS_DOWNLOADS + +/usr/bin/mkdir -p $install_dir +/usr/bin/rm -rf $install_dir +/usr/bin/mkdir -p $install_dir + +/usr/bin/mkdir -p $modules_dir + +/usr/bin/mkdir -p $build_dir +/usr/bin/rm -rf $build_dir +/usr/bin/mkdir -p $build_dir + + +################################################################################ +# +# Download components +# + +echo -e "\n### Downloading files...\n" + +function wget() { + + echo "Please download $1 to $UEABS_DOWNLOADS" + +} + +cd $UEABS_DOWNLOADS + +downloads_OK=1 + +# zlib: https://www.zlib.net/zlib-1.2.11.tar.gz +zlib_file="zlib-$zlib_version.tar.gz" +zlib_url="https://www.zlib.net" +[[ -f $zlib_file ]] || wget "$zlib_url/$zlib_file" +[[ -f $zlib_file ]] || downloads_OK=0 + +# ncurses: https://ftp.gnu.org/pub/gnu/ncurses/ncurses-6.2.tar.gz +ncurses_file="ncurses-$ncurses_version.tar.gz" +ncurses_url="https://ftp.gnu.org/pub/gnu/ncurses" +[[ -f $ncurses_file ]] || wget "$ncurses_url/$ncurses_file" +[[ -f $ncurses_file ]] || downloads_OK=0 + +# readline: https://ftp.gnu.org/pub/gnu/readline/readline-8.0.tar.gz +readline_file="readline-$readline_version.tar.gz" +readline_url="https://ftp.gnu.org/pub/gnu/readline" +[[ -f $readline_file ]] || wget "$readline_url/$readline_file" +[[ -f $readline_file ]] || downloads_OK=0 + +# sqlite: https://www.sqlite.org/2020/sqlite-autoconf-3330000.tar.gz +sqlite_file="sqlite-autoconf-$sqlite_download.tar.gz" +sqlite_url="https://www.sqlite.org/2020" +[[ -f $sqlite_file ]] || wget "$sqlite_url/$sqlite_file" +[[ -f $sqlite_file ]] || downloads_OK=0 + +# libffi: https://github.com/libffi/libffi/releases/download/v3.3/libffi-3.3.tar.gz +libffi_file="libffi-$libffi_version.tar.gz" +libffi_url="https://github.com/libffi/libffi/releases/download/v$libffi_version" +[[ -f $libffi_file ]] || wget 
"$libffi_url/$libffi_file" +[[ -f $libffi_file ]] || downloads_OK=0 + +# FFTW: http://www.fftw.org/fftw-3.3.8.tar.gz +fftw_file="fftw-$fftw_version.tar.gz" +fftw_url="http://www.fftw.org" +[[ -f $fftw_file ]] || wget "$fftw_url/$fftw_file" +[[ -f $fftw_file ]] || downloads_OK=0 + +# https://gitlab.com/libxc/libxc/-/archive/4.3.4/libxc-4.3.4.tar.bz2 +libxc_file="libxc-$libxc_version.tar.bz2" +libxc_url="https://gitlab.com/libxc/libxc/-/archive/$libxc_version" +[[ -f $libxc_file ]] || wget "$libxc_url/$libxc_file" +[[ -f $libxc_file ]] || downloads_OK=0 + +# Python: https://www.python.org/ftp/python/3.7.9/Python-3.7.9.tar.xz +python_file="Python-$python_version.tar.xz" +python_url="https://www.python.org/ftp/python/$python_version" +[[ -f $python_file ]] || wget "$python_url/$python_file" +[[ -f $python_file ]] || downloads_OK=0 + +# Downloading setuptools. +setuptools_file="setuptools-$setuptools_version.tar.gz" +setuptools_url="https://pypi.python.org/packages/source/s/setuptools" +[[ -f $setuptools_file ]] || wget $setuptools_url/$setuptools_file +[[ -f $setuptools_file ]] || downloads_OK=0 + +# Downloading setuptoolssscm so that we can gather all sources for reproducibility. +setuptoolsscm_file="setuptools_scm-$setuptoolsscm_version.tar.gz" +setuptoolsscm_url="https://pypi.python.org/packages/source/s/setuptools_scm" +[[ -f $setuptoolsscm_file ]] || wget $setuptoolsscm_url/$setuptoolsscm_file +[[ -f $setuptoolsscm_file ]] || downloads_OK=0 + +# Downloading wheel so that we can gather all sources for reproducibility. +wheel_file="wheel-$wheel_version.tar.gz" +wheel_url="https://pypi.python.org/packages/source/w/wheel" +[[ -f $wheel_file ]] || wget "$wheel_url/$wheel_file" +[[ -f $wheel_file ]] || downloads_OK=0 + +# Downloading attrs so that we can gather all sources for reproducibility. +attrs_file="attrs-$attrs_version.tar.gz" +attrs_url="https://pypi.python.org/packages/source/a/attrs" +[[ -f $attrs_file ]] || wget $attrs_url/$attrs_file +[[ -f $attrs_file ]] || downloads_OK=0 + +# Downloading pybind11 so that we can gather all sources for reproducibility. +pybind11_file="pybind11-$pybind11_version.tar.gz" +pybind11_url="https://pypi.python.org/packages/source/p/pybind11" +[[ -f $pybind11_file ]] || wget $pybind11_url/$pybind11_file +[[ -f $pybind11_file ]] || downloads_OK=0 + +# Downloading Cython so that we can gather all sources for reproducibility. +cython_file="Cython-$cython_version.tar.gz" +cython_url="https://pypi.python.org/packages/source/c/cython" +[[ -f $cython_file ]] || wget "$cython_url/$cython_file" +[[ -f $cython_file ]] || downloads_OK=0 + +# Downloading py so that we can gather all sources for reproducibility. +py_file="py-$py_version.tar.gz" +py_url="https://pypi.python.org/packages/source/p/py" +[[ -f $py_file ]] || wget $py_url/$py_file +[[ -f $py_file ]] || downloads_OK=0 + +# Downloading pyparsing so that we can gather all sources for reproducibility. +pyparsing_file="pyparsing-$pyparsing_version.tar.gz" +pyparsing_url="https://pypi.python.org/packages/source/p/pyparsing" +[[ -f $pyparsing_file ]] || wget $pyparsing_url/$pyparsing_file +[[ -f $pyparsing_file ]] || downloads_OK=0 + +# Downloading toml so that we can gather all sources for reproducibility. +toml_file="toml-$toml_version.tar.gz" +toml_url="https://pypi.python.org/packages/source/t/toml" +[[ -f $toml_file ]] || wget $toml_url/$toml_file +[[ -f $toml_file ]] || downloads_OK=0 + +# Downloading iniconfig so that we can gather all sources for reproducibility. 
+iniconfig_file="iniconfig-$iniconfig_version.tar.gz"
+iniconfig_url="https://pypi.python.org/packages/source/i/iniconfig"
+[[ -f $iniconfig_file ]] || wget $iniconfig_url/$iniconfig_file
+[[ -f $iniconfig_file ]] || downloads_OK=0
+
+# Downloading packaging so that we can gather all sources for reproducibility.
+packaging_file="packaging-$packaging_version.tar.gz"
+packaging_url="https://pypi.python.org/packages/source/p/packaging"
+[[ -f $packaging_file ]] || wget $packaging_url/$packaging_file
+[[ -f $packaging_file ]] || downloads_OK=0
+
+# Downloading pytest so that we can gather all sources for reproducibility.
+pytest_file="pytest-$pytest_version.tar.gz"
+pytest_url="https://pypi.python.org/packages/source/p/pytest"
+[[ -f $pytest_file ]] || wget $pytest_url/$pytest_file
+[[ -f $pytest_file ]] || downloads_OK=0
+
+# NumPy needs customizations, so we need to download and unpack the sources
+numpy_file="numpy-$numpy_version.zip"
+numpy_url="https://pypi.python.org/packages/source/n/numpy"
+[[ -f $numpy_file ]] || wget "$numpy_url/$numpy_file"
+[[ -f $numpy_file ]] || downloads_OK=0
+
+# SciPy
+scipy_file="scipy-$scipy_version.tar.gz"
+scipy_url="https://pypi.python.org/packages/source/s/scipy"
+[[ -f $scipy_file ]] || wget "$scipy_url/$scipy_file"
+[[ -f $scipy_file ]] || downloads_OK=0
+
+# Downloading ase so that we can gather all sources for reproducibility
+ase_file="ase-$ase_version.tar.gz"
+ase_url="https://pypi.python.org/packages/source/a/ase"
+[[ -f $ase_file ]] || wget "$ase_url/$ase_file"
+[[ -f $ase_file ]] || downloads_OK=0
+
+# GPAW needs customization, so we need to download and unpack the sources.
+GPAW_file="gpaw-$GPAW_version.tar.gz"
+GPAW_url="https://pypi.python.org/packages/source/g/gpaw"
+[[ -f $GPAW_file ]] || wget "$GPAW_url/$GPAW_file"
+[[ -f $GPAW_file ]] || downloads_OK=0
+
+# Download GPAW-setup, a number of setup files for GPAW.
+# https://wiki.fysik.dtu.dk/gpaw-files/gpaw-setups-0.9.20000.tar.gz
+GPAWsetups_file="gpaw-setups-$GPAWsetups_version.tar.gz"
+GPAWsetups_url="https://wiki.fysik.dtu.dk/gpaw-files"
+[[ -f $GPAWsetups_file ]] || wget "$GPAWsetups_url/$GPAWsetups_file"
+[[ -f $GPAWsetups_file ]] || downloads_OK=0
+
+# Abort if any of the downloads is still missing; test the value explicitly,
+# since a bare [[ $downloads_OK ]] is true even when the flag is 0.
+[[ $downloads_OK == 1 ]] || exit
+
+
+################################################################################
+#
+# Set PATH-style variables
+#
+/usr/bin/mkdir -p $install_dir/bin
+PATH="$install_dir/bin:$PATH"
+/usr/bin/mkdir -p $install_dir/lib
+LD_LIBRARY_PATH="$install_dir/lib:$LD_LIBRARY_PATH"
+export LIBRARY_PATH="$LD_LIBRARY_PATH"
+
+
+################################################################################
+#
+# Install ncurses
+#
+# We mirror the two-step EasyBuild install process. This may be overkill, but
+# we know it works.
+# + +echo -e "\n### Installing ncurses...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$ncurses_file + +cd ncurses-$ncurses_version + +# Configure step 1 +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +export CXX=icpc +export CXXFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --with-shared --enable-overwrite --without-ada --enable-symlinks + +# Build step 1 +make -j $parallel + +# Install step 1 +make install + +# Add bin, lib and include to the PATH variables +PATH=$install_dir/bin:$PATH +LIBRARY_PATH=$install_dir/lib:$LIBRARY_PATH +LD_LIBRARY_PATH=$install_dir/lib:$LD_LIBRARY_PATH +CPATH=$install_dir/include:$CPATH + +# Configure step 2 +make distclean +./configure --prefix="$install_dir" \ + --with-shared --enable-overwrite --without-ada --enable-symlinks \ + --enable-ext-colors --enable-widec \ + --includedir=$install_dir/include/ncursesw/ + +# Build step 2 +make -j $parallel + +# Install step 2 +make install + +# Clean-up +unset CC +unset CFLAGS +unset CXX +unset CXXFLAGS +echo -e "\n### Finishing ncurses installation...\n" + + +################################################################################ +# +# Install readline +# + +echo -e "\n### Installing readline...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$readline_file + +cd readline-$readline_version + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +export CPPFLAGS="-I$install_dir/include" +export LDFLAGS="-L$install_dir/lib -lncurses" +./configure --prefix="$install_dir" + +# Build +make -j $parallel + +# Install +make install + +# Clean-up +unset CC +unset CFLAGS +unset CPPFLAGS +unset LDFLAGS +echo -e "\n### Finishing readline installation...\n" + + +################################################################################ +# +# Install zlib +# + +echo -e "\n### Installing zlib...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$zlib_file + +cd zlib-$zlib_version + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" + +# Build +make -j $parallel + +# Install +make install + +# Clean-up +unset CC +unset CFLAGS +echo -e "\n### Finishing zlib installation...\n" + + +################################################################################ +# +# Install libffi +# + +echo -e "\n### Installing libffi...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$libffi_file + +cd libffi-$libffi_version + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-multi-os-directory + +# Build +make -j $parallel + +# Install +make install + +# Clean-up +unset CC +unset CFLAGS +echo -e "\n### Finishing libffi installation...\n" + + +################################################################################ +# +# Install SQLite +# + +echo -e "\n### Installing SQLite...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$sqlite_file + +cd sqlite-autoconf-$sqlite_download + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC -DSQLITE_DISABLE_INTRINSIC" +export CPPFLAGS="-I$install_dir/include" +./configure --prefix="$install_dir" + +# Build +make -j $parallel + +# Install +make install + +# Clean-up +unset CC +unset CFLAGS +unset CPPFLAGS +echo -e "\n### Finishing SQLite installation...\n" + + 
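+
+################################################################################
+#
+# Optional sanity check (an editorial sketch, not part of the original script):
+# verify that the support libraries built above actually landed in
+# $install_dir/lib before starting the much longer FFTW, libxc and Python builds.
+#
+
+for lib in libncursesw libreadline libz libffi libsqlite3 ; do
+  ls "$install_dir"/lib/$lib* >/dev/null 2>&1 || \
+    echo "WARNING: $lib not found in $install_dir/lib"
+done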
+################################################################################
+#
+# Install FFTW
+#
+
+echo -e "\n### Installing FFTW...\n"
+
+cd $build_dir
+
+# Uncompress
+tar -xf $UEABS_DOWNLOADS/$fftw_file
+
+# Patch the sources to compile with icc
+cat >fftw.patch <&5
+ $as_echo_n "checking whether C compiler accepts -no-gcc... " >&6; }
+
+EOF
+
+cd fftw-$fftw_version
+patch -p1 <../fftw.patch
+
+# Configure
+export CC=icc
+export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC"
+export MPICC=mpiicc
+export F77=ifort
+export F77FLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC"
+./configure --prefix="$install_dir" \
+  --disable-sse --disable-sse2 --disable-avx --enable-avx2 --disable-avx512 --disable-avx-128-fma --disable-kcvi \
+  --enable-fma \
+  --disable-openmp --disable-threads \
+  --enable-mpi \
+  --disable-static --enable-shared --disable-fortran
+
+# Build
+make -j $parallel
+
+# Install
+make install
+
+# Clean-up
+unset CC
+unset CFLAGS
+unset MPICC
+unset F77
+unset F77FLAGS
+echo -e "\n### Finishing FFTW installation...\n"
+
+
+################################################################################
+#
+# Install libxc
+#
+
+echo -e "\n### Installing libxc...\n"
+
+cd $build_dir
+
+# Uncompress
+tar -xf $UEABS_DOWNLOADS/$libxc_file
+
+cd libxc-$libxc_version
+
+# Configure
+autoreconf -i
+
+export CC=icc
+export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC"
+./configure --prefix="$install_dir" \
+  --disable-static --enable-shared --disable-fortran
+
+# Build
+make -j $parallel
+
+# Install
+make install
+
+# Clean-up
+unset CC
+unset CFLAGS
+echo -e "\n### Finishing libxc installation...\n"
+
+
+################################################################################
+#
+# Install Python
+#
+
+echo -e "\n### Installing Python...\n"
+
+cd $build_dir
+
+# Uncompress
+tar -xf $UEABS_DOWNLOADS/$python_file
+
+cd Python-$python_version
+
+# Configure
+export CC=icc
+export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC -diag-disable=1678,10148,111,169,188,3438,2650,3175,1890"
+export CXX=icpc
+export FC=ifort
+export F90=$FC
+export FFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC"
+export LD=icc
+export LDFLAGS="-L$install_dir/lib"
+export CPPFLAGS="-I$install_dir/include"
+./configure --prefix="$install_dir" \
+  --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \
+  --enable-shared --disable-ipv6 \
+  --enable-optimizations \
+  --with-ensurepip=upgrade
+# --with-icc \
+
+# Build
+make -j $parallel
+
+# Install
+make install
+cd $install_dir/bin
+ln -s python$py_maj_min python
+
+# Clean-up
+unset CC
+unset CFLAGS
+unset CXX
+unset FC
+unset F90
+unset FFLAGS
+unset LD
+unset LDFLAGS
+unset CPPFLAGS
+echo -e "\n### Finishing Python installation...\n"
+
+
+################################################################################
+#
+# Initialising for installing Python packages
+#
+
+echo -e "\n### Initialising for installing Python packages...\n"
+
+/usr/bin/mkdir -p "$install_dir/lib/python$py_maj_min/site-packages"
+cd $install_dir
+/usr/bin/ln -s lib lib64
+export PYTHONPATH="$install_dir/lib/python$py_maj_min/site-packages"
+
+
+################################################################################
+#
+# Install wheel
+#
+# The BSC system does seem to need setuptools_scm to install pytest.
+# + +echo -e "\n### Installing wheel...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptools_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$wheel_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptoolsscm_file +echo -e "\n### Finishing wheel installation...\n" + + +################################################################################ +# +# Some other Python packages that are needed +# + +echo -e "\n### Installing additional Python packages...\n" + +cd $build_dir + +# attrs is needed for Cython and the optional pytest. +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$attrs_file +# pybind11 is needed for scipy +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pybind11_file + +echo -e "\n### Finishing additional Python packages installation...\n" + + +################################################################################ +# +# Optional: Install pytest and its dependencies to test NumPy and SciPy with +# import numpy +# numpy.test() +# import scipy +# scipy.text() +# We don't care about version numbers here as it is not important for the +# reproducibility of the benchmarks. +# + +echo -e "\n### Installing pytest...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$toml_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pyparsing_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$py_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$iniconfig_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$packaging_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pytest_file + +echo -e "\n### Finishing pytest installation...\n" + + +################################################################################ +# +# Install Cython +# + +echo -e "\n### Installing Cython...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$cython_file + +echo -e "\n### Finishing Cython installation...\n" + + +################################################################################ +# +# Install NumPy +# + +echo -e "\n### Installing NumPy...\n" + +cd $build_dir + +# Uncompress +unzip $UEABS_DOWNLOADS/$numpy_file + +cd numpy-$numpy_version + +cat >site.cfg <siteconfig.py <$packageID.lua <fftw.patch <&5 + $as_echo_n "checking whether C compiler accepts -no-gcc... 
" >&6; } + +EOF + +cd fftw-$fftw_version +patch -p1 <../fftw.patch + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +export MPICC=mpiicc +export F77=ifort +export F77FLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-sse --disable-sse2 --disable-avx --enable-avx2 --disable-avx512 --disable-avx-128-fma --disable-kcvi \ + --enable-fma \ + --disable-openmp --disable-threads \ + --enable-mpi \ + -disable-static --enable-shared --disable-fortran + +# Build +make -j $parallel + +# Install +make install + +# Clean-up +unset CC +unset CFLAGS +unset MPICC +unset F77 +unset F77FLAGS +echo -e "\n### Finishing FFTW installation...\n" + + +################################################################################ +# +# Install libxc +# + +echo -e "\n### Installing libxc...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$libxc_file + +cd libxc-$libxc_version + +# Configure +autoreconf -i + +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-static --enable-shared --disable-fortran +# Build +make -j $parallel + +# Install +make -j install + +# Clean-up +unset CC +unset CFLAGS +echo -e "\n### Finishing libxc installation...\n" + + +################################################################################ +# +# Install Python +# + +echo -e "\n### Installing Python...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$python_file + +cd Python-$python_version + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC -diag-disable=1678,10148,111,169,188,3438,2650,3175,1890" +export CXX=icpc +export FC=ifort +export F90=$FC +export FFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC" +export LD=icc +export LDFLAGS="-L$install_dir/lib" +export CPPFLAGS="-I$install_dir/include" +./configure --prefix="$install_dir" \ + --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \ + --enable-shared --disable-ipv6 \ + --enable-optimizations \ + --with-ensurepip=upgrade +# --with-icc \ + +# Build +make -j $parallel + +# Install +make install +cd $install_dir/bin +ln -s python$py_maj_min python + +# Clean-up +unset CC +unset CFLAGS +unset CXX +unset FC +unset F90 +unset FFLAGS +unset LD +unset LDFLAGS +unset CPPFLAGS +echo -e "\n### Finishing Python installation...\n" + + +################################################################################ +# +# Initialising for installing Python packages +# + +echo -e "\n### Initialising for installing Python packages...\n" + +/usr/bin/mkdir -p "$install_dir/lib/python$py_maj_min/site-packages" +cd $install_dir +/usr/bin/ln -s lib lib64 +export PYTHONPATH="$install_dir/lib/python$py_maj_min/site-packages" + + +################################################################################ +# +# Install wheel +# +# The BSC system does seem to need setuptools_sscm to install pytest. 
+# + +echo -e "\n### Installing wheel...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptools_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$wheel_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptoolsscm_file +echo -e "\n### Finishing wheel installation...\n" + + +################################################################################ +# +# Some other Python packages that are needed +# + +echo -e "\n### Installing additional Python packages...\n" + +cd $build_dir + +# attrs is needed for Cython and the optional pytest. +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$attrs_file +# pybind11 is needed for scipy +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pybind11_file + +echo -e "\n### Finishing additional Python packages installation...\n" + + +################################################################################ +# +# Optional: Install pytest and its dependencies to test NumPy and SciPy with +# import numpy +# numpy.test() +# import scipy +# scipy.text() +# We don't care about version numbers here as it is not important for the +# reproducibility of the benchmarks. +# + +echo -e "\n### Installing pytest...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$toml_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pyparsing_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$py_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$iniconfig_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$packaging_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pytest_file + +echo -e "\n### Finishing pytest installation...\n" + + +################################################################################ +# +# Install Cython +# + +echo -e "\n### Installing Cython...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$cython_file + +echo -e "\n### Finishing Cython installation...\n" + + +################################################################################ +# +# Install NumPy +# + +echo -e "\n### Installing NumPy...\n" + +cd $build_dir + +# Uncompress +unzip $UEABS_DOWNLOADS/$numpy_file + +cd numpy-$numpy_version + +cat >site.cfg <siteconfig.py <$packageID.lua <fftw.patch <&5 + $as_echo_n "checking whether C compiler accepts -no-gcc... 
" >&6; } + +EOF + +cd fftw-$fftw_version +patch -p1 <../fftw.patch + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +export MPICC=mpicc +export F77=ifort +export F77FLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-sse --disable-sse2 --disable-avx --enable-avx2 --disable-avx512 --disable-avx-128-fma --disable-kcvi \ + --enable-fma \ + --disable-openmp --disable-threads \ + --enable-mpi \ + -disable-static --enable-shared --disable-fortran + +# Build +make -j $parallel + +# Install +make install + +# Clean-up +unset CC +unset CFLAGS +unset MPICC +unset F77 +unset F77FLAGS + +echo -e "\n### Finishing FFTW installation...\n" + + +################################################################################ +# +# Install libxc +# + +echo -e "\n### Installing libxc...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$libxc_file + +cd libxc-$libxc_version + +# Configure +autoreconf -i + +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-static --enable-shared --disable-fortran +# Build +make -j $parallel + +# Install +make -j install + +# Clean-up +unset CC +unset CFLAGS + +echo -e "\n### Finishing libxc installation...\n" + + +################################################################################ +# +# Install Python +# + +echo -e "\n### Installing Python...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$python_file + +cd Python-$python_version + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC -diag-disable=1678,10148,111,169,188,3438,2650,3175,1890" +export CXX=icpc +export FC=ifort +export F90=$FC +export FFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC" +export LD=icc +export LDFLAGS="-L$install_dir/lib" +export CPPFLAGS="-I$install_dir/include" +./configure --prefix="$install_dir" \ + --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \ + --enable-shared --disable-ipv6 \ + --enable-optimizations \ + --with-ensurepip=upgrade +# --with-icc \ + +# Build +make -j $parallel + +# Install +make install +cd $install_dir/bin +ln -s python$py_maj_min python + +# Clean-up +unset CC +unset CFLAGS +unset CXX +unset FC +unset F90 +unset FFLAGS +unset LD +unset LDFLAGS +unset CPPFLAGS + +echo -e "\n### Finishing Python installation...\n" + + +################################################################################ +# +# Initialising for installing Python packages +# + +echo -e "\n### Initialising for installing Python packages...\n" + +/usr/bin/mkdir -p "$install_dir/lib/python$py_maj_min/site-packages" +cd $install_dir +/usr/bin/ln -s lib lib64 +PYTHONPATH="$install_dir/lib/python$py_maj_min/site-packages" + + +################################################################################ +# +# Install wheel +# + +echo -e "\n### Installing wheel...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptools_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$wheel_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptoolsscm_file + +echo -e "\n### Finishing wheel installation...\n" + + 
+################################################################################ +# +# Some other Python packages that are needed +# + +echo -e "\n### Installing additional Python packages...\n" + +cd $build_dir + +# attrs is needed for Cython and the optional pytest. +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$attrs_file +# pybind11 is needed for scipy +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pybind11_file + +echo -e "\n### Finishing additional Python packages installation...\n" + + +################################################################################ +# +# Optional: Install pytest and its dependencies to test NumPy and SciPy with +# import numpy +# numpy.test() +# import scipy +# scipy.text() +# We don't care about version numbers here as it is not important for the +# reproducibility of the benchmarks. +# + +echo -e "\n### Installing pytest...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$toml_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pyparsing_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$py_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$iniconfig_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$packaging_file +#pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pluggy_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pytest_file + +echo -e "\n### Finishing pytest installation...\n" + + +################################################################################ +# +# Install Cython +# + +echo -e "\n### Installing Cython...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$cython_file + +echo -e "\n### Finishing Cython installation...\n" + + +################################################################################ +# +# Install NumPy +# + +echo -e "\n### Installing NumPy...\n" + +cd $build_dir + +# Uncompress +unzip $UEABS_DOWNLOADS/$numpy_file + +cd numpy-$numpy_version + +cat >site.cfg <siteconfig.py <$packageID.lua <fftw.patch <&5 + $as_echo_n "checking whether C compiler accepts -no-gcc... 
" >&6; } + +EOF + +cd fftw-$fftw_version +patch -p1 <../fftw.patch + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +export MPICC=mpicc +export F77=ifort +export F77FLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-sse --disable-sse2 --disable-avx --enable-avx2 --disable-avx512 --disable-avx-128-fma --disable-kcvi \ + --enable-fma \ + --disable-openmp --disable-threads \ + --enable-mpi \ + -disable-static --enable-shared --disable-fortran + +# Build +make -j $parallel + +# Install +make install + +# Clean-up +unset CC +unset CFLAGS +unset MPICC +unset F77 +unset F77FLAGS + +echo -e "\n### Finishing FFTW installation...\n" + + +################################################################################ +# +# Install libxc +# + +echo -e "\n### Installing libxc...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$libxc_file + +cd libxc-$libxc_version + +# Configure +autoreconf -i + +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-static --enable-shared --disable-fortran +# Build +make -j $parallel + +# Install +make -j install + +# Clean-up +unset CC +unset CFLAGS + +echo -e "\n### Finishing libxc installation...\n" + + +################################################################################ +# +# Install Python +# + +echo -e "\n### Installing Python...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$python_file + +cd Python-$python_version + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC -diag-disable=1678,10148,111,169,188,3438,2650,3175,1890" +export CXX=icpc +export FC=ifort +export F90=$FC +export FFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC" +export LD=icc +export LDFLAGS="-L$install_dir/lib" +export CPPFLAGS="-I$install_dir/include" +./configure --prefix="$install_dir" \ + --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \ + --enable-shared --disable-ipv6 \ + --enable-optimizations \ + --with-ensurepip=upgrade +# --with-icc \ + +# Build +make -j $parallel + +# Install +make install +cd $install_dir/bin +ln -s python$py_maj_min python + +# Clean-up +unset CC +unset CFLAGS +unset CXX +unset FC +unset F90 +unset FFLAGS +unset LD +unset LDFLAGS +unset CPPFLAGS + +echo -e "\n### Finishing Python installation...\n" + + +################################################################################ +# +# Initialising for installing Python packages +# + +echo -e "\n### Initialising for installing Python packages...\n" + +/usr/bin/mkdir -p "$install_dir/lib/python$py_maj_min/site-packages" +cd $install_dir +/usr/bin/ln -s lib lib64 +PYTHONPATH="$install_dir/lib/python$py_maj_min/site-packages" + + +################################################################################ +# +# Install wheel +# + +echo -e "\n### Installing wheel...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptools_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$wheel_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptoolsscm_file + +echo -e "\n### Finishing wheel installation...\n" + + 
+################################################################################ +# +# Some other Python packages that are needed +# + +echo -e "\n### Installing additional Python packages...\n" + +cd $build_dir + +# attrs is needed for Cython and the optional pytest. +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$attrs_file +# pybind11 is needed for scipy +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pybind11_file + +echo -e "\n### Finishing additional Python packages installation...\n" + + +################################################################################ +# +# Optional: Install pytest and its dependencies to test NumPy and SciPy with +# import numpy +# numpy.test() +# import scipy +# scipy.text() +# We don't care about version numbers here as it is not important for the +# reproducibility of the benchmarks. +# + +echo -e "\n### Installing pytest...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$toml_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pyparsing_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$py_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$iniconfig_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$packaging_file +#pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pluggy_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pytest_file + +echo -e "\n### Finishing pytest installation...\n" + + +################################################################################ +# +# Install Cython +# + +echo -e "\n### Installing Cython...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$cython_file + +echo -e "\n### Finishing Cython installation...\n" + + +################################################################################ +# +# Install NumPy +# + +echo -e "\n### Installing NumPy...\n" + +cd $build_dir + +# Uncompress +unzip $UEABS_DOWNLOADS/$numpy_file + +cd numpy-$numpy_version + +cat >site.cfg <siteconfig.py <$packageID.lua <fftw.patch <&5 + $as_echo_n "checking whether C compiler accepts -no-gcc... 
" >&6; } + +EOF + +cd fftw-$fftw_version +patch -p1 <../fftw.patch + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +export MPICC=mpicc +export F77=ifort +export F77FLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-sse --disable-sse2 --disable-avx --enable-avx2 --disable-avx512 --disable-avx-128-fma --disable-kcvi \ + --enable-fma \ + --disable-openmp --disable-threads \ + --enable-mpi \ + -disable-static --enable-shared --disable-fortran + +# Build +make -j $parallel + +# Install +make install + +# Clean-up +unset CC +unset CFLAGS +unset MPICC +unset F77 +unset F77FLAGS + +echo -e "\n### Finishing FFTW installation...\n" + + +################################################################################ +# +# Install libxc +# + +echo -e "\n### Installing libxc...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$libxc_file + +cd libxc-$libxc_version + +# Configure +autoreconf -i + +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-static --enable-shared --disable-fortran +# Build +make -j $parallel + +# Install +make -j install + +# Clean-up +unset CC +unset CFLAGS + +echo -e "\n### Finishing libxc installation...\n" + + +################################################################################ +# +# Install Python +# + +echo -e "\n### Installing Python...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$python_file + +cd Python-$python_version + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC -diag-disable=1678,10148,111,169,188,3438,2650,3175,1890" +export CXX=icpc +export FC=ifort +export F90=$FC +export FFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC" +export LD=icc +export LDFLAGS="-L$install_dir/lib" +export CPPFLAGS="-I$install_dir/include" +./configure --prefix="$install_dir" \ + --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \ + --enable-shared --disable-ipv6 \ + --enable-optimizations \ + --with-ensurepip=upgrade +# --with-icc \ + +# Build +make -j $parallel + +# Install +make install +cd $install_dir/bin +ln -s python$py_maj_min python + +# Clean-up +unset CC +unset CFLAGS +unset CXX +unset FC +unset F90 +unset FFLAGS +unset LD +unset LDFLAGS +unset CPPFLAGS + +echo -e "\n### Finishing Python installation...\n" + + +################################################################################ +# +# Initialising for installing Python packages +# + +echo -e "\n### Initialising for installing Python packages...\n" + +/usr/bin/mkdir -p "$install_dir/lib/python$py_maj_min/site-packages" +cd $install_dir +/usr/bin/ln -s lib lib64 +PYTHONPATH="$install_dir/lib/python$py_maj_min/site-packages" + +echo -e "\n### Finishing additional Python packages installation...\n" + + +################################################################################ +# +# Install wheel +# + +echo -e "\n### Installing wheel...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptools_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$wheel_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptoolsscm_file + +echo -e "\n### Finishing wheel installation...\n" + + 
+################################################################################ +# +# Some other Python packages that are needed +# + +echo -e "\n### Installing additional Python packages...\n" + +cd $build_dir + +# attrs is needed for Cython and the optional pytest. +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$attrs_file +# pybind11 is needed for scipy +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pybind11_file + +echo -e "\n### Finishing additional Python packages installation...\n" + + +################################################################################ +# +# Optional: Install pytest and its dependencies to test NumPy and SciPy with +# import numpy +# numpy.test() +# import scipy +# scipy.text() +# We don't care about version numbers here as it is not important for the +# reproducibility of the benchmarks. +# + +echo -e "\n### Installing pytest...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$toml_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pyparsing_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$py_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$iniconfig_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$packaging_file +#pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pluggy_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pytest_file + +echo -e "\n### Finishing pytest installation...\n" + + +################################################################################ +# +# Install Cython +# + +echo -e "\n### Installing Cython...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$cython_file + +echo -e "\n### Finishing Cython installation...\n" + + +################################################################################ +# +# Install NumPy +# + +echo -e "\n### Installing NumPy...\n" + +cd $build_dir + +# Uncompress +unzip $UEABS_DOWNLOADS/$numpy_file + +cd numpy-$numpy_version + +cat >site.cfg <siteconfig.py <$packageID.lua <fftw.patch <&5 + $as_echo_n "checking whether C compiler accepts -no-gcc... 
" >&6; } + +EOF + +cd fftw-$fftw_version +patch -p1 <../fftw.patch + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +export MPICC=mpicc +export F77=ifort +export F77FLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-sse --disable-sse2 --disable-avx --enable-avx2 --disable-avx512 --disable-avx-128-fma --disable-kcvi \ + --enable-fma \ + --disable-openmp --disable-threads \ + --enable-mpi \ + -disable-static --enable-shared --disable-fortran + +# Build +make -j $parallel + +# Install +make install + +# Clean-up +unset CC +unset CFLAGS +unset MPICC +unset F77 +unset F77FLAGS + +echo -e "\n### Finishing FFTW installation...\n" + + +################################################################################ +# +# Install libxc +# + +echo -e "\n### Installing libxc...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$libxc_file + +cd libxc-$libxc_version + +# Configure +autoreconf -i + +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-static --enable-shared --disable-fortran +# Build +make -j $parallel + +# Install +make -j install + +# Clean-up +unset CC +unset CFLAGS + +echo -e "\n### Finishing libxc installation...\n" + + +################################################################################ +# +# Install Python +# + +echo -e "\n### Installing Python...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$python_file + +cd Python-$python_version + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC -diag-disable=1678,10148,111,169,188,3438,2650,3175,1890" +export CXX=icpc +export FC=ifort +export F90=$FC +export FFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC" +export LD=icc +export LDFLAGS="-L$install_dir/lib" +export CPPFLAGS="-I$install_dir/include" +./configure --prefix="$install_dir" \ + --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \ + --enable-shared --disable-ipv6 \ + --enable-optimizations \ + --with-ensurepip=upgrade +# --with-icc \ + +# Build +make -j $parallel + +# Install +make install +cd $install_dir/bin +ln -s python$py_maj_min python + +# Clean-up +unset CC +unset CFLAGS +unset CXX +unset FC +unset F90 +unset FFLAGS +unset LD +unset LDFLAGS +unset CPPFLAGS + +echo -e "\n### Finishing Python installation...\n" + + +################################################################################ +# +# Initialising for installing Python packages +# + +echo -e "\n### Initialising for installing Python packages...\n" + +/usr/bin/mkdir -p "$install_dir/lib/python$py_maj_min/site-packages" +cd $install_dir +/usr/bin/ln -s lib lib64 +PYTHONPATH="$install_dir/lib/python$py_maj_min/site-packages" + +echo -e "\n### Finishing additional Python packages installation...\n" + + +################################################################################ +# +# Install wheel +# + +echo -e "\n### Installing wheel...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptools_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$wheel_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptoolsscm_file + +echo -e "\n### Finishing wheel installation...\n" + + 
+################################################################################
+#
+# Some other Python packages that are needed
+#
+
+echo -e "\n### Installing additional Python packages...\n"
+
+cd $build_dir
+
+# attrs is needed for Cython and the optional pytest.
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$attrs_file
+# pybind11 is needed for scipy
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pybind11_file
+
+echo -e "\n### Finishing additional Python packages installation...\n"
+
+
+################################################################################
+#
+# Optional: Install pytest and its dependencies to test NumPy and SciPy with
+# import numpy
+# numpy.test()
+# import scipy
+# scipy.test()
+# We don't care about version numbers here as they are not important for the
+# reproducibility of the benchmarks.
+#
+
+echo -e "\n### Installing pytest...\n"
+
+cd $build_dir
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$toml_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pyparsing_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$py_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$iniconfig_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$packaging_file
+#pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pluggy_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pytest_file
+
+echo -e "\n### Finishing pytest installation...\n"
+
+
+################################################################################
+#
+# Install Cython
+#
+
+echo -e "\n### Installing Cython...\n"
+
+cd $build_dir
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$cython_file
+
+echo -e "\n### Finishing Cython installation...\n"
+
+
+################################################################################
+#
+# Install NumPy
+#
+
+echo -e "\n### Installing NumPy...\n"
+
+cd $build_dir
+
+# Uncompress
+unzip $UEABS_DOWNLOADS/$numpy_file
+
+cd numpy-$numpy_version
+
+cat >site.cfg <siteconfig.py <$packageID.lua <fftw.patch <&5 + $as_echo_n "checking whether C compiler accepts -no-gcc... 
" >&6; } + +EOF + +cd fftw-$fftw_version +patch -p1 <../fftw.patch + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +export MPICC=mpiicc +export F77=ifort +export F77FLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-sse --disable-sse2 --disable-avx --enable-avx2 --disable-avx512 --disable-avx-128-fma --disable-kcvi \ + --enable-fma \ + --disable-openmp --disable-threads \ + --enable-mpi \ + -disable-static --enable-shared --disable-fortran + +# Build +make -j $parallel + +# Install +make install + +# Clean-up +unset CC +unset CFLAGS +unset MPICC +unset F77 +unset F77FLAGS + + +################################################################################ +# +# Install libxc +# + +echo -e "\n### Installing libxc...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$libxc_file + +cd libxc-$libxc_version + +# Configure +autoreconf -i + +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-static --enable-shared --disable-fortran +# Build +make -j $parallel + +# Install +make -j install + +# Clean-up +unset CC +unset CFLAGS + + +################################################################################ +# +# Install Python +# + +echo -e "\n### Installing Python...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$python_file + +cd Python-$python_version + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC -diag-disable=1678,10148,111,169,188,3438,2650,3175,1890" +export CXX=icpc +export FC=ifort +export F90=$FC +export FFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC" +export LD=icc +export LDFLAGS="-L$install_dir/lib" +export CPPFLAGS="-I$install_dir/include" +./configure --prefix="$install_dir" \ + --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \ + --enable-shared --disable-ipv6 \ + --enable-optimizations \ + --with-ensurepip=upgrade +# --with-icc \ + +# Build +make -j $parallel + +# Install +make install +cd $install_dir/bin +ln -s python$py_maj_min python + +# Clean-up +unset CC +unset CFLAGS +unset CXX +unset FC +unset F90 +unset FFLAGS +unset LD +unset LDFLAGS +unset CPPFLAGS + + +################################################################################ +# +# Initialising for installing Python packages +# + +echo -e "\n### Initialising for installing Python packages...\n" + +/usr/bin/mkdir -p "$install_dir/lib/python$py_maj_min/site-packages" +cd $install_dir +/usr/bin/ln -s lib lib64 +PYTHONPATH="$install_dir/lib/python$py_maj_min/site-packages" + + +################################################################################ +# +# Install wheel +# + +echo -e "\n### Installing wheel...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptools_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$wheel_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptoolsscm_file + + +################################################################################ +# +# Some other Python packages that are needed +# + +echo -e "\n### Installing additional Python packages...\n" + +cd $build_dir + +# attrs is needed for Cython and the optional pytest. 
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$attrs_file
+# pybind11 is needed for scipy
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pybind11_file
+
+
+################################################################################
+#
+# Optional: Install pytest and its dependencies to test NumPy and SciPy with
+# import numpy
+# numpy.test()
+# import scipy
+# scipy.test()
+# We don't care about version numbers here as they are not important for the
+# reproducibility of the benchmarks.
+#
+
+echo -e "\n### Installing pytest...\n"
+
+cd $build_dir
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$toml_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pyparsing_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$py_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$iniconfig_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$packaging_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pytest_file
+
+
+################################################################################
+#
+# Install Cython
+#
+
+echo -e "\n### Installing Cython...\n"
+
+cd $build_dir
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$cython_file
+
+
+################################################################################
+#
+# Install NumPy
+#
+
+echo -e "\n### Installing NumPy...\n"
+
+cd $build_dir
+
+# Uncompress
+unzip $UEABS_DOWNLOADS/$numpy_file
+
+cd numpy-$numpy_version
+
+cat >site.cfg <siteconfig.py <$modules_dir/$packageID.lua <fftw.patch <&5 + $as_echo_n "checking whether C compiler accepts -no-gcc... 
" >&6; } + +EOF + +cd fftw-$fftw_version +patch -p1 <../fftw.patch + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +export MPICC=mpiicc +export F77=ifort +export F77FLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-sse --disable-sse2 --disable-avx --enable-avx2 --disable-avx512 --disable-avx-128-fma --disable-kcvi \ + --enable-fma \ + --disable-openmp --disable-threads \ + --enable-mpi \ + -disable-static --enable-shared --disable-fortran + +# Build +make -j $parallel + +# Install +make install + +# Clean-up +unset CC +unset CFLAGS +unset MPICC +unset F77 +unset F77FLAGS + + +################################################################################ +# +# Install libxc +# + +echo -e "\n### Installing libxc...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$libxc_file + +cd libxc-$libxc_version + +# Configure +autoreconf -i + +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-static --enable-shared --disable-fortran +# Build +make -j $parallel + +# Install +make -j install + +# Clean-up +unset CC +unset CFLAGS + + +################################################################################ +# +# Install Python +# + +echo -e "\n### Installing Python...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$python_file + +cd Python-$python_version + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC -diag-disable=1678,10148,111,169,188,3438,2650,3175,1890" +export CXX=icpc +export FC=ifort +export F90=$FC +export FFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC" +export LD=icc +export LDFLAGS="-L$install_dir/lib" +export CPPFLAGS="-I$install_dir/include" +./configure --prefix="$install_dir" \ + --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \ + --enable-shared --disable-ipv6 \ + --enable-optimizations \ + --with-ensurepip=upgrade +# --with-icc \ + +# Build +make -j $parallel + +# Install +make install +cd $install_dir/bin +ln -s python$py_maj_min python + +# Clean-up +unset CC +unset CFLAGS +unset CXX +unset FC +unset F90 +unset FFLAGS +unset LD +unset LDFLAGS +unset CPPFLAGS + + +################################################################################ +# +# Initialising for installing Python packages +# + +echo -e "\n### Initialising for installing Python packages...\n" + +/usr/bin/mkdir -p "$install_dir/lib/python$py_maj_min/site-packages" +cd $install_dir +/usr/bin/ln -s lib lib64 +PYTHONPATH="$install_dir/lib/python$py_maj_min/site-packages" + + +################################################################################ +# +# Install wheel +# + +echo -e "\n### Installing wheel...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptools_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$wheel_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptoolsscm_file + + +################################################################################ +# +# Some other Python packages that are needed +# + +echo -e "\n### Installing additional Python packages...\n" + +cd $build_dir + +# attrs is needed for Cython and the optional pytest. 
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$attrs_file
+# pybind11 is needed for scipy
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pybind11_file
+
+
+################################################################################
+#
+# Optional: Install pytest and its dependencies to test NumPy and SciPy with
+# import numpy
+# numpy.test()
+# import scipy
+# scipy.test()
+# We don't care about version numbers here as they are not important for the
+# reproducibility of the benchmarks.
+#
+
+echo -e "\n### Installing pytest...\n"
+
+cd $build_dir
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$toml_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pyparsing_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$py_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$iniconfig_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$packaging_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pytest_file
+
+
+################################################################################
+#
+# Install Cython
+#
+
+echo -e "\n### Installing Cython...\n"
+
+cd $build_dir
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$cython_file
+
+
+################################################################################
+#
+# Install NumPy
+#
+
+echo -e "\n### Installing NumPy...\n"
+
+cd $build_dir
+
+# Uncompress
+unzip $UEABS_DOWNLOADS/$numpy_file
+
+cd numpy-$numpy_version
+
+cat >site.cfg <siteconfig.py <$modules_dir/$packageID.lua <fftw.patch <&5 + $as_echo_n "checking whether C compiler accepts -no-gcc... 
" >&6; } + +EOF + +cd fftw-$fftw_version +patch -p1 <../fftw.patch + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +export MPICC=mpiicc +export F77=ifort +export F77FLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-sse --disable-sse2 --disable-avx --enable-avx2 --disable-avx512 --disable-avx-128-fma --disable-kcvi \ + --enable-fma \ + --disable-openmp --disable-threads \ + --enable-mpi \ + -disable-static --enable-shared --disable-fortran + +# Build +make -j $parallel + +# Install +make install + +# Clean-up +unset CC +unset CFLAGS +unset MPICC +unset F77 +unset F77FLAGS + +echo -e "\n### Finishing FFTW installation...\n" + + +################################################################################ +# +# Install libxc +# + +echo -e "\n### Installing libxc...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$libxc_file + +cd libxc-$libxc_version + +# Configure +autoreconf -i + +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-static --enable-shared --disable-fortran +# Build +make -j $parallel + +# Install +make -j install + +# Clean-up +unset CC +unset CFLAGS + +echo -e "\n### Finishing libxc installation...\n" + + +################################################################################ +# +# Install Python +# + +echo -e "\n### Installing Python...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$python_file + +cd Python-$python_version + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC -diag-disable=1678,10148,111,169,188,3438,2650,3175,1890" +export CXX=icpc +export FC=ifort +export F90=$FC +export FFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC" +export LD=icc +export LDFLAGS="-L$install_dir/lib" +export CPPFLAGS="-I$install_dir/include" +./configure --prefix="$install_dir" \ + --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \ + --enable-shared --disable-ipv6 \ + --enable-optimizations \ + --with-ensurepip=upgrade +# --with-icc \ + +# Build +make -j 32 + +# Install +make install +cd $install_dir/bin +ln -s python$py_maj_min python + +# Clean-up +unset CC +unset CFLAGS +unset CXX +unset FC +unset F90 +unset FFLAGS +unset LD +unset LDFLAGS +unset CPPFLAGS + +echo -e "\n### Finishing Python installation...\n" + + +################################################################################ +# +# Initialising for installing Python packages +# + +echo -e "\nInitialising for installing Python packages...\n" + +/usr/bin/mkdir -p "$install_dir/lib/python$py_maj_min/site-packages" +cd $install_dir +/usr/bin/ln -s lib lib64 +PATH="$install_dir/bin:$PATH" +LD_LIBRARY_PATH="$install_dir/lib:$LD_LIBRARY_PATH" +LIBRARY_PATH="$install_dir/lib:$LIBRARY_PATH" +PYTHONPATH="$install_dir/lib/python$py_maj_min/site-packages" + + +################################################################################ +# +# Install setuptools and wheel +# + +echo -e "\n### Installing wheel...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptools_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$wheel_file +# setuptools_scm seems to be needed to install py. 
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptoolsscm_file
+
+echo -e "\n### Finishing wheel installation...\n"
+
+
+################################################################################
+#
+# Some other Python packages that are needed
+#
+
+echo -e "\n### Installing additional Python packages...\n"
+
+cd $build_dir
+
+# attrs is needed for Cython and the optional pytest.
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$attrs_file
+# pybind11 is needed for scipy
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pybind11_file
+
+echo -e "\n### Finishing additional Python packages installation...\n"
+
+
+################################################################################
+#
+# Optional: Install pytest and its dependencies to test NumPy and SciPy with
+# import numpy
+# numpy.test()
+# import scipy
+# scipy.test()
+# We do need to care about version numbers because the LRZ system doesn't allow
+# downloading from the compute nodes, so we have to do everything manually in
+# advance.
+#
+
+echo -e "\n### Installing pytest...\n"
+
+cd $build_dir
+
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$toml_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pyparsing_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$py_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$iniconfig_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$packaging_file
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pytest_file
+
+echo -e "\n### Finishing pytest installation...\n"
+
+
+################################################################################
+#
+# Install Cython
+#
+
+echo -e "\n### Installing Cython...\n"
+
+cd $build_dir
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$cython_file
+
+echo -e "\n### Finishing Cython installation...\n"
+
+
+################################################################################
+#
+# Install NumPy
+#
+
+echo -e "\n### Installing NumPy...\n"
+
+cd $build_dir
+
+# Uncompress
+unzip $UEABS_DOWNLOADS/$numpy_file
+
+cd numpy-$numpy_version
+
+cat >site.cfg <siteconfig.py <$modules_dir/$packageID <fftw.patch <&5 + $as_echo_n "checking whether C compiler accepts -no-gcc... 
" >&6; } + +EOF + +cd fftw-$fftw_version +patch -p1 <../fftw.patch + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +export MPICC=mpiicc +export F77=ifort +export F77FLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-sse --disable-sse2 --disable-avx --enable-avx2 --disable-avx512 --disable-avx-128-fma --disable-kcvi \ + --enable-fma \ + --disable-openmp --disable-threads \ + --enable-mpi \ + -disable-static --enable-shared --disable-fortran + +# Build +make -j $parallel + +# Install +make install + +# Clean-up +unset CC +unset CFLAGS +unset MPICC +unset F77 +unset F77FLAGS + +echo -e "\n### Finishing FFTW installation...\n" + + +################################################################################ +# +# Install libxc +# + +echo -e "\n### Installing libxc...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$libxc_file + +cd libxc-$libxc_version + +# Configure +autoreconf -i + +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-static --enable-shared --disable-fortran +# Build +make -j $parallel + +# Install +make -j install + +# Clean-up +unset CC +unset CFLAGS + +echo -e "\n### Finishing libxc installation...\n" + + +################################################################################ +# +# Install Python +# + +echo -e "\n### Installing Python...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$python_file + +cd Python-$python_version + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC -diag-disable=1678,10148,111,169,188,3438,2650,3175,1890" +export CXX=icpc +export FC=ifort +export F90=$FC +export FFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC" +export LD=icc +export LDFLAGS="-L$install_dir/lib" +export CPPFLAGS="-I$install_dir/include" +./configure --prefix="$install_dir" \ + --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \ + --enable-shared --disable-ipv6 \ + --enable-optimizations \ + --with-ensurepip=upgrade +# --with-icc \ + +# Build +make -j 32 + +# Install +make install +cd $install_dir/bin +ln -s python$py_maj_min python + +# Clean-up +unset CC +unset CFLAGS +unset CXX +unset FC +unset F90 +unset FFLAGS +unset LD +unset LDFLAGS +unset CPPFLAGS + +echo -e "\n### Finishing Python installation...\n" + + +################################################################################ +# +# Initialising for installing Python packages +# + +echo -e "\nInitialising for installing Python packages...\n" + +/usr/bin/mkdir -p "$install_dir/lib/python$py_maj_min/site-packages" +cd $install_dir +/usr/bin/ln -s lib lib64 +PATH="$install_dir/bin:$PATH" +LD_LIBRARY_PATH="$install_dir/lib:$LD_LIBRARY_PATH" +LIBRARY_PATH="$install_dir/lib:$LIBRARY_PATH" +PYTHONPATH="$install_dir/lib/python$py_maj_min/site-packages" + + +################################################################################ +# +# Install setuptools and wheel +# + +echo -e "\n### Installing wheel...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptools_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$wheel_file +# setuptools_scm seems to be needed to install py. 
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptoolsscm_file + +echo -e "\n### Finishing wheel installation...\n" + + +################################################################################ +# +# Some other Python packages that are needed +# + +echo -e "\n### Installing additional Python packages...\n" + +cd $build_dir + +# attrs is needed for Cython and the optional pytest. +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$attrs_file +# pybind11 is needed for scipy +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pybind11_file + +echo -e "\n### Finishing additional Python packages installation...\n" + + +################################################################################ +# +# Optional: Install pytest and its dependencies to test NumPy and SciPy with +# import numpy +# numpy.test() +# import scipy +# scipy.text() +# We do need to care about version numbers because the LRZ system doesn't allow +# downloading from the compute nodes, so we have to do everything manually in +# advance. +# + +echo -e "\n### Installing pytest...\n" + +cd $build_dir + +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$toml_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pyparsing_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$py_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$iniconfig_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$packaging_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pytest_file + +echo -e "\n### Finishing pytest installation...\n" + + +################################################################################ +# +# Install Cython +# + +echo -e "\n### Installing Cython...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$cython_file + +echo -e "\n### Finishing Cython installation...\n" + + +################################################################################ +# +# Install NumPy +# + +echo -e "\n### Installing NumPy...\n" + +cd $build_dir + +# Uncompress +unzip $UEABS_DOWNLOADS/$numpy_file + +cd numpy-$numpy_version + +cat >site.cfg <siteconfig.py <$modules_dir/$packageID <fftw.patch <&5 + $as_echo_n "checking whether C compiler accepts -no-gcc... 
" >&6; } + +EOF + +cd fftw-$fftw_version +patch -p1 <../fftw.patch + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +export MPICC=mpicc +export F77=ifort +export F77FLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-sse --disable-sse2 --disable-avx --enable-avx2 --disable-avx512 --disable-avx-128-fma --disable-kcvi \ + --enable-fma \ + --disable-openmp --disable-threads \ + --enable-mpi \ + -disable-static --enable-shared --disable-fortran + +# Build +make -j $parallel + +# Install +make install + +# Clean-up +unset CC +unset CFLAGS +unset MPICC +unset F77 +unset F77FLAGS + +echo -e "\n### Finishing FFTW installation...\n" + + +################################################################################ +# +# Install libxc +# + +echo -e "\n### Installing libxc...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$libxc_file + +cd libxc-$libxc_version + +# Configure +autoreconf -i + +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-static --enable-shared --disable-fortran +# Build +make -j $parallel + +# Install +make -j install + +# Clean-up +unset CC +unset CFLAGS + +echo -e "\n### Finishing libxc installation...\n" + + +################################################################################ +# +# Install Python +# + +echo -e "\n### Installing Python...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$python_file + +cd Python-$python_version + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC -diag-disable=1678,10148,111,169,188,3438,2650,3175,1890" +export CXX=icpc +export FC=ifort +export F90=$FC +export FFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC" +export LD=icc +export LDFLAGS="-L$install_dir/lib" +export CPPFLAGS="-I$install_dir/include" +./configure --prefix="$install_dir" \ + --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \ + --enable-shared --disable-ipv6 \ + --enable-optimizations \ + --with-ensurepip=upgrade +# --with-icc \ + +# Build +make -j 32 + +# Install +make install +cd $install_dir/bin +ln -s python$py_maj_min python + +# Clean-up +unset CC +unset CFLAGS +unset CXX +unset FC +unset F90 +unset FFLAGS +unset LD +unset LDFLAGS +unset CPPFLAGS + +echo -e "\n### Finishing Python installation...\n" + + +################################################################################ +# +# Initialising for installing Python packages +# + +echo -e "\n### Initialising for installing Python packages...\n" + +/usr/bin/mkdir -p "$install_dir/lib/python$py_maj_min/site-packages" +cd $install_dir +/usr/bin/ln -s lib lib64 +export PYTHONPATH="$install_dir/lib/python$py_maj_min/site-packages" + + +################################################################################ +# +# Install setuptools and wheel +# + +echo -e "\n### Installing wheel...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptools_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$wheel_file +# setuptools_scm seems to be needed to install py. 
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptoolsscm_file + +echo -e "\n### Finishing wheel installation...\n" + + +################################################################################ +# +# Some other Python packages that are needed +# + +echo -e "\n### Installing additional Python packages...\n" + +cd $build_dir + +# attrs is needed for Cython and the optional pytest. +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$attrs_file +# pybind11 is needed for scipy +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pybind11_file + +echo -e "\n### Finishing additional Python packages installation...\n" + + +################################################################################ +# +# Optional: Install pytest and its dependencies to test NumPy and SciPy with +# import numpy +# numpy.test() +# import scipy +# scipy.text() +# We do need to care about version numbers because the LRZ system doesn't allow +# downloading from the compute nodes, so we have to do everything manually in +# advance. +# + +echo -e "\n### Installing pytest...\n" + +cd $build_dir + +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$toml_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pyparsing_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$py_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$iniconfig_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$packaging_file +#pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pluggy_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pytest_file + +echo -e "\n### Finishing pytest installation...\n" + + +################################################################################ +# +# Install Cython +# + +echo -e "\n### Installing Cython...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$cython_file + +echo -e "\n### Finishing Cython installation...\n" + + +################################################################################ +# +# Install NumPy +# + +echo -e "\n### Installing NumPy...\n" + +cd $build_dir + +# Uncompress +unzip $UEABS_DOWNLOADS/$numpy_file + +cd numpy-$numpy_version + +cat >site.cfg <siteconfig.py <$modules_dir/$packageID <fftw.patch <&5 + $as_echo_n "checking whether C compiler accepts -no-gcc... 
" >&6; } + +EOF + +cd fftw-$fftw_version +patch -p1 <../fftw.patch + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +export MPICC=mpicc +export F77=ifort +export F77FLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-sse --disable-sse2 --disable-avx --enable-avx2 --disable-avx512 --disable-avx-128-fma --disable-kcvi \ + --enable-fma \ + --disable-openmp --disable-threads \ + --enable-mpi \ + -disable-static --enable-shared --disable-fortran + +# Build +make -j $parallel + +# Install +make install + +# Clean-up +unset CC +unset CFLAGS +unset MPICC +unset F77 +unset F77FLAGS + +echo -e "\n### Finishing FFTW installation...\n" + + +################################################################################ +# +# Install libxc +# + +echo -e "\n### Installing libxc...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$libxc_file + +cd libxc-$libxc_version + +# Configure +autoreconf -i + +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fPIC" +./configure --prefix="$install_dir" \ + --disable-static --enable-shared --disable-fortran +# Build +make -j $parallel + +# Install +make -j install + +# Clean-up +unset CC +unset CFLAGS + +echo -e "\n### Finishing libxc installation...\n" + + +################################################################################ +# +# Install Python +# + +echo -e "\n### Installing Python...\n" + +cd $build_dir + +# Uncompress +tar -xf $UEABS_DOWNLOADS/$python_file + +cd Python-$python_version + +# Configure +export CC=icc +export CFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC -diag-disable=1678,10148,111,169,188,3438,2650,3175,1890" +export CXX=icpc +export FC=ifort +export F90=$FC +export FFLAGS="$opt_level $proc_opt_flags $fp_opt_flags -fwrapv -fPIC" +export LD=icc +export LDFLAGS="-L$install_dir/lib" +export CPPFLAGS="-I$install_dir/include" +./configure --prefix="$install_dir" \ + --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \ + --enable-shared --disable-ipv6 \ + --enable-optimizations \ + --with-ensurepip=upgrade +# --with-icc \ + +# Build +make -j 32 + +# Install +make install +cd $install_dir/bin +ln -s python$py_maj_min python + +# Clean-up +unset CC +unset CFLAGS +unset CXX +unset FC +unset F90 +unset FFLAGS +unset LD +unset LDFLAGS +unset CPPFLAGS + +echo -e "\n### Finishing Python installation...\n" + + +################################################################################ +# +# Initialising for installing Python packages +# + +echo -e "\n### Initialising for installing Python packages...\n" + +/usr/bin/mkdir -p "$install_dir/lib/python$py_maj_min/site-packages" +cd $install_dir +/usr/bin/ln -s lib lib64 +export PYTHONPATH="$install_dir/lib/python$py_maj_min/site-packages" + + +################################################################################ +# +# Install setuptools and wheel +# + +echo -e "\n### Installing wheel...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptools_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$wheel_file +# setuptools_scm seems to be needed to install py. 
+pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$setuptoolsscm_file + +echo -e "\n### Finishing wheel installation...\n" + + +################################################################################ +# +# Some other Python packages that are needed +# + +echo -e "\n### Installing additional Python packages...\n" + +cd $build_dir + +# attrs is needed for Cython and the optional pytest. +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$attrs_file +# pybind11 is needed for scipy +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pybind11_file + +echo -e "\n### Finishing additional Python packages installation...\n" + + +################################################################################ +# +# Optional: Install pytest and its dependencies to test NumPy and SciPy with +# import numpy +# numpy.test() +# import scipy +# scipy.text() +# We do need to care about version numbers because the LRZ system doesn't allow +# downloading from the compute nodes, so we have to do everything manually in +# advance. +# + +echo -e "\n### Installing pytest...\n" + +cd $build_dir + +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$toml_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pyparsing_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$py_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$iniconfig_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$packaging_file +#pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pluggy_file +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$pytest_file + +echo -e "\n### Finishing pytest installation...\n" + + +################################################################################ +# +# Install Cython +# + +echo -e "\n### Installing Cython...\n" + +cd $build_dir +pip$py_maj_min install --prefix="$install_dir" --no-deps --ignore-installed --find-links=$UEABS_DOWNLOADS $UEABS_DOWNLOADS/$cython_file + +echo -e "\n### Finishing Cython installation...\n" + + +################################################################################ +# +# Install NumPy +# + +echo -e "\n### Installing NumPy...\n" + +cd $build_dir + +# Uncompress +unzip $UEABS_DOWNLOADS/$numpy_file + +cd numpy-$numpy_version + +cat >site.cfg <siteconfig.py <$modules_dir/$packageID <&1 | tee loki-inst -cd .. 
-sed -e "s||$tgt|g" -e "s||$PYTHONHOME|" setup/load-gpaw.sh > $tgt/load.sh - -# fix permissions -chmod -R g+rwX $tgt -chmod -R o+rX $tgt diff --git a/gpaw/build/examples/davide/gpaw-cpu/setup/customize-power8.py b/gpaw/build/examples/davide/gpaw-cpu/setup/customize-power8.py deleted file mode 100644 index a098bfb54a08be9c3c8474feb72ecf98f83ecb24..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/gpaw-cpu/setup/customize-power8.py +++ /dev/null @@ -1,39 +0,0 @@ -# Setup customisation for gpaw/cuda -import os - -# compiler and linker -compiler = 'gcc' -mpicompiler = 'mpicc' -mpilinker = 'mpicc' -extra_compile_args = ['-std=c99', '-mcpu=power8'] - -# libraries -libraries = ['z'] - -# openblas -library_dirs += [os.environ['OPENBLAS_ROOT'] + '/lib'] -include_dirs += [os.environ['OPENBLAS_ROOT'] + '/include'] -libraries += ['openblas'] - -# scalapack -library_dirs += [os.environ['SCALAPACK_ROOT'] + '/lib'] -libraries += ['scalapack'] - -# libxc -library_dirs += [os.environ['LIBXCDIR'] + '/lib'] -include_dirs += [os.environ['LIBXCDIR'] + '/include'] -libraries += ['xc'] - -# GPAW defines -define_macros += [('GPAW_NO_UNDERSCORE_CBLACS', '1')] -define_macros += [('GPAW_NO_UNDERSCORE_CSCALAPACK', '1')] -define_macros += [("GPAW_ASYNC",1)] -define_macros += [("GPAW_MPI2",1)] -define_macros += [('GPAW_CUDA', '1')] - -# ScaLAPACK -scalapack = True - -# HDF5 -hdf5 = False - diff --git a/gpaw/build/examples/davide/gpaw-cpu/setup/load-gpaw.sh b/gpaw/build/examples/davide/gpaw-cpu/setup/load-gpaw.sh deleted file mode 100644 index f7f01edf5454c2b9854ec37ecd9ee32330b06bb0..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/gpaw-cpu/setup/load-gpaw.sh +++ /dev/null @@ -1,11 +0,0 @@ -#!/bin/bash -module load cuda/9.2.88 -module load cudnn/7.1.4--cuda--9.2.88 -module load gnu/6.4.0 -module load openmpi/3.1.0--gnu--6.4.0 -source $CINECA_SCRATCH/lib/openblas-0.3.4-openmp/load.sh -source /load.sh -source $CINECA_SCRATCH/lib/scalapack-2.0.2/load.sh -export GPAW_SETUP_PATH=$CINECA_SCRATCH/lib/gpaw-setups-0.9.11271 -export PATH=/bin:$PATH -export PYTHONPATH=/lib/python2.7/site-packages:$PYTHONPATH diff --git a/gpaw/build/examples/davide/gpaw/build.sh b/gpaw/build/examples/davide/gpaw/build.sh deleted file mode 100644 index 56ff5e88e4a6c25a2b5a54d32add958f764b5540..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/gpaw/build.sh +++ /dev/null @@ -1,36 +0,0 @@ -### GPAW installation script for D.A.V.I.D.E - -# version numbers (modify if needed) -gpaw_version=cuda - -# installation directory (modify!) -tgt=$CINECA_SCRATCH/lib/gpaw-${gpaw_version} - -# setup build environment -module load cuda/9.2.88 -module load cudnn/7.1.4--cuda--9.2.88 -module load gnu/6.4.0 -module load openmpi/3.1.0--gnu--6.4.0 -source $CINECA_SCRATCH/lib/openblas-0.3.4-openmp/load.sh -source $CINECA_SCRATCH/lib/python-2018-12-cuda/load.sh 2016-06 -source $CINECA_SCRATCH/lib/scalapack-2.0.2/load.sh -export GPAW_SETUP_PATH=$CINECA_SCRATCH/lib/gpaw-setups-0.9.11271 -export CFLAGS="" - -# gpaw -git clone https://gitlab.com/mlouhivu/gpaw.git gpaw-$gpaw_version -cd gpaw-$gpaw_version -git checkout $gpaw_version -patch gpaw/eigensolvers/rmm_diis.py ../setup/patch-rmmdiis.diff -cp ../setup/customize-cuda.py . -cd c/cuda -cp ../../../setup/make.inc . -make 2>&1 | tee loki-make -cd - -python setup.py install --customize=customize-cuda.py --prefix=$tgt 2>&1 | tee loki-inst -cd .. 
-sed -e "s||$tgt|g" -e "s||$PYTHONHOME|" setup/load-gpaw.sh > $tgt/load.sh - -# fix permissions -chmod -R g+rwX $tgt -chmod -R o+rX $tgt diff --git a/gpaw/build/examples/davide/gpaw/setup/customize-cuda.py b/gpaw/build/examples/davide/gpaw/setup/customize-cuda.py deleted file mode 100644 index d2aae9409832d719e06a66450032d8588e769149..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/gpaw/setup/customize-cuda.py +++ /dev/null @@ -1,44 +0,0 @@ -# Setup customisation for gpaw/cuda -import os - -# compiler and linker -compiler = 'gcc' -mpicompiler = 'mpicc' -mpilinker = 'mpicc' -extra_compile_args = ['-std=c99', '-mcpu=power8'] - -# libraries -libraries = ['z'] - -# cuda -library_dirs += [os.environ['CUDA_LIB'], './c/cuda'] -include_dirs += [os.environ['CUDA_HOME'] + '/include'] -libraries += ['gpaw-cuda', 'cublas', 'cudart', 'stdc++'] - -# openblas -library_dirs += [os.environ['OPENBLAS_ROOT'] + '/lib'] -include_dirs += [os.environ['OPENBLAS_ROOT'] + '/include'] -libraries += ['openblas'] - -# scalapack -library_dirs += [os.environ['SCALAPACK_ROOT'] + '/lib'] -libraries += ['scalapack'] - -# libxc -library_dirs += [os.environ['LIBXCDIR'] + '/lib'] -include_dirs += [os.environ['LIBXCDIR'] + '/include'] -libraries += ['xc'] - -# GPAW defines -define_macros += [('GPAW_NO_UNDERSCORE_CBLACS', '1')] -define_macros += [('GPAW_NO_UNDERSCORE_CSCALAPACK', '1')] -define_macros += [("GPAW_ASYNC",1)] -define_macros += [("GPAW_MPI2",1)] -define_macros += [('GPAW_CUDA', '1')] - -# ScaLAPACK -scalapack = True - -# HDF5 -hdf5 = False - diff --git a/gpaw/build/examples/davide/gpaw/setup/load-gpaw.sh b/gpaw/build/examples/davide/gpaw/setup/load-gpaw.sh deleted file mode 100644 index f7f01edf5454c2b9854ec37ecd9ee32330b06bb0..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/gpaw/setup/load-gpaw.sh +++ /dev/null @@ -1,11 +0,0 @@ -#!/bin/bash -module load cuda/9.2.88 -module load cudnn/7.1.4--cuda--9.2.88 -module load gnu/6.4.0 -module load openmpi/3.1.0--gnu--6.4.0 -source $CINECA_SCRATCH/lib/openblas-0.3.4-openmp/load.sh -source /load.sh -source $CINECA_SCRATCH/lib/scalapack-2.0.2/load.sh -export GPAW_SETUP_PATH=$CINECA_SCRATCH/lib/gpaw-setups-0.9.11271 -export PATH=/bin:$PATH -export PYTHONPATH=/lib/python2.7/site-packages:$PYTHONPATH diff --git a/gpaw/build/examples/davide/gpaw/setup/make.inc b/gpaw/build/examples/davide/gpaw/setup/make.inc deleted file mode 100644 index 2fa2d24c2ffc12959844333af0e3df70a07b0f71..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/gpaw/setup/make.inc +++ /dev/null @@ -1,57 +0,0 @@ -#################################################################### -# make include file. # -#################################################################### -# -SHELL = /bin/sh - -# ---------------------------------------------------------------------- -# - gpaw-cuda Directory Structure / gpaw-cuda library -------------------- -# ---------------------------------------------------------------------- -# -TOPdir = . 
-INCdir = $(TOPdir) -PYTHONINCdir ?= $(PYTHONHOME)/include/python2.7/ -PYTHONLIBdir ?= $(PYTHONHOME)/lib/ -NUMPYINCdir ?= `python -c "import numpy; print numpy.get_include()"` -MPIINCdir ?= $(OPENMPI_HOME)/include -LIBdir = $(TOPdir) -CUGPAWLIB = $(LIBdir)/libgpaw-cuda.a - -# -# ---------------------------------------------------------------------- -# - NVIDIA CUDA includes / libraries / specifics ----------------------- -# ---------------------------------------------------------------------- -CUDAINCdir = $(CUDADIR)/include -CUDALIBdir = $(CUDADIR)/lib64 -CUDA_OPTS = - -# -# ---------------------------------------------------------------------- -# - gpaw-cuda includes / libraries / specifics ------------------------------- -# ---------------------------------------------------------------------- -# - -CUGPAW_INCLUDES = -I$(INCdir) -I$(CUDAINCdir) -I$(MPIINCdir) -I$(NUMPYINCdir) -I$(PYTHONINCdir) -CUGPAW_OPTS = -DPARALLEL=1 -DGPAW_CUDA=1 - -# -# ---------------------------------------------------------------------- -# - -CUGPAW_DEFS = $(CUGPAW_OPTS) $(CUDA_OPTS) $(CUGPAW_INCLUDES) - -# -# ---------------------------------------------------------------------- -# - Compilers / linkers - Optimization flags --------------------------- -# ---------------------------------------------------------------------- -CC = gcc -CCNOOPT = $(CUGPAW_DEFS) -CCFLAGS = $(CUGPAW_DEFS) -g -fPIC -std=c99 -m64 -O3 - -NVCC = nvcc -NVCCFLAGS = $(CUGPAW_DEFS) -O3 -g -gencode arch=compute_60,code=sm_60 -m64 --compiler-options '-O3 -g -std=c99 -fPIC' - -ARCH = ar -ARCHFLAGS= cr -RANLIB = ranlib - diff --git a/gpaw/build/examples/davide/gpaw/setup/patch-rmmdiis.diff b/gpaw/build/examples/davide/gpaw/setup/patch-rmmdiis.diff deleted file mode 100644 index adefa27c31dd72c87df2d3e35b8fcb56a7224dcf..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/gpaw/setup/patch-rmmdiis.diff +++ /dev/null @@ -1,34 +0,0 @@ -commit 761cba649d58e2d2f24c0a1e2fdad917b5929679 -Author: Martti Louhivuori -Date: Thu May 18 10:56:07 2017 +0300 - - Remove obsolete error calculation from DIIS step - -diff --git a/gpaw/eigensolvers/rmmdiis.py b/gpaw/eigensolvers/rmmdiis.py -index 7d60553..d182713 100644 ---- a/gpaw/eigensolvers/rmmdiis.py -+++ b/gpaw/eigensolvers/rmmdiis.py -@@ -299,23 +299,6 @@ class RMMDIIS(Eigensolver): - P_axi, kpt.eps_n[n_x], R_xG, n_x, - calculate_change=True) - self.timer.stop('Calculate residuals') -- self.timer.start('Calculate errors') -- errors_new_x = np.zeros(B) -- # errors_x[:] = 0.0 -- for n in range(n1, n2): -- if kpt.f_n is None: -- weight = kpt.weight -- else: -- weight = kpt.f_n[n] -- if self.nbands_converge != 'occupied': -- if wfs.bd.global_index(n) < self.nbands_converge: -- weight = kpt.weight -- else: -- weight = 0.0 -- errors_new_x[n-n1] += weight * integrate(R_xG[n - n1], -- R_xG[n - n1]) -- comm.sum(errors_x) -- self.timer.stop('Calculate errors') - - self.timer.stop('DIIS step') - # Final trial step diff --git a/gpaw/build/examples/davide/openblas/build.sh b/gpaw/build/examples/davide/openblas/build.sh deleted file mode 100644 index e108514a230e476ff391e327e9bbd5bac0f5902a..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/openblas/build.sh +++ /dev/null @@ -1,32 +0,0 @@ -### OpenBLAS installation script for D.A.V.I.D.E - -# version numbers (modify if needed) -openblas_version=0.3.4 - -# installation directory (modify!) 
-tgt=$CINECA_SCRATCH/lib/openblas-${openblas_version}-openmp - -# setup build environment -module load cuda/9.2.88 -module load cudnn/7.1.4--cuda--9.2.88 -module load gnu/6.4.0 -module load openmpi/3.1.0--gnu--6.4.0 -export CC=gcc -export CFLAGS='-mcpu=power8 -O3' -export CXX=g++ -export CXXFLAGS='-mcpu=power8 -O3' -export FC=gfortran -export FFLAGS='-mcpu=power8 -O3' - -# openblas -git clone https://github.com/xianyi/OpenBLAS -cd OpenBLAS -git checkout v$openblas_version -make TARGET=POWER8 USE_OPENMP=1 2>&1 | tee loki-make -make install PREFIX=$tgt 2>&1 | tee loki-install -sed -e "s||$tgt|g" ../setup/load-openblas.sh > $tgt/load.sh -cd .. - -# fix permissions -chmod -R g+rwX $tgt -chmod -R o+rX $tgt diff --git a/gpaw/build/examples/davide/openblas/setup/load-openblas.sh b/gpaw/build/examples/davide/openblas/setup/load-openblas.sh deleted file mode 100644 index eb99f2f9d7c089cd3a372a288516b7798b5ec32b..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/openblas/setup/load-openblas.sh +++ /dev/null @@ -1,3 +0,0 @@ -#!/bin/bash -export OPENBLAS_ROOT= -export LD_LIBRARY_PATH=$OPENBLAS_ROOT/lib:$LD_LIBRARY_PATH diff --git a/gpaw/build/examples/davide/python/build-extra.sh b/gpaw/build/examples/davide/python/build-extra.sh deleted file mode 100644 index 1b17596db68b62ce328e48e067d4fc89d5d02a4b..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/python/build-extra.sh +++ /dev/null @@ -1,84 +0,0 @@ -### Python (extra) modules installation script for D.A.V.I.D.E -### uses PYTHONUSERBASE to bundle all modules into a separate location -### away from the base python installation - -# load Python -source $PYTHONHOME/load.sh - -# bundle ID (e.g. time of release) (modify if needed) -bundle=2016-06 - -# version numbers (modify if needed) -numpy_version=1.10.4 -scipy_version=0.17.1 -ase_version=3.11.0 -pycuda_version=2017.1.1 -libxc_version=2.1.3 - -# installation directory (modify!) -tgt=$PYTHONHOME/bundle/$bundle - -# setup build environment -export CFLAGS="-fPIC $CFLAGS" -export FFLAGS="-fPIC $FFLAGS" - -# use --user to install modules -export PYTHONUSERBASE=$tgt -mkdir -p $PYTHONUSERBASE/lib/python2.7/site-packages - -# build in a separate directory -mkdir bundle-$bundle -cd bundle-$bundle - -# cython + mpi4py -pip install --user cython -pip install --user mpi4py - -# numpy -git clone git://github.com/numpy/numpy.git numpy-$numpy_version -cd numpy-$numpy_version -git checkout v$numpy_version -sed -e "s||$OPENBLAS_ROOT|g" ../../setup/davide-openblas.cfg > site.cfg -python setup.py build -j 4 install --user 2>&1 | tee loki-inst -cd .. - -# scipy -git clone git://github.com/scipy/scipy.git scipy-$scipy_version -cd scipy-$scipy_version -git checkout v$scipy_version -python setup.py build -j 4 install --user 2>&1 | tee loki-inst -cd .. - -# ase -git clone https://gitlab.com/ase/ase.git ase-$ase_version -cd ase-$ase_version -git checkout $ase_version -python setup.py install --user 2>&1 | tee loki-inst -cd .. - -# libxc -tar xvfz ~/src/libxc-${libxc_version}.tar.gz -cd libxc-$libxc_version -./configure --prefix=$PYTHONUSERBASE --enable-shared | tee loki-conf -make | tee loki-make -make install | tee loki-inst -export LD_LIBRARY_PATH=$PYTHONUSERBASE/lib:$LD_LIBRARY_PATH -cd .. - -# pycuda -pip install --user pycuda==$pycuda_version - -# go back to the main build directory -cd .. - -# if this is the first bundle, use it as default -if [ ! 
-e $PYTHONHOME/bundle/default ] -then - cd $PYTHONHOME/bundle - ln -s $bundle default - cd - -fi - -# fix permissions -chmod -R g+rwX $tgt -chmod -R o+rX $tgt diff --git a/gpaw/build/examples/davide/python/build.sh b/gpaw/build/examples/davide/python/build.sh deleted file mode 100644 index 6dbaeea61c402250233bbf8b62886f8718a3051d..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/python/build.sh +++ /dev/null @@ -1,40 +0,0 @@ -### Python installation script for D.A.V.I.D.E -### uses --prefix to set a custom installation directory - -# version numbers (modify if needed) -python_version=2.7.13 - -# installation directory (modify!) -tgt=$CINECA_SCRATCH/lib/python-2018-12-cuda - -# setup build environment -module load cuda/9.2.88 -module load cudnn/7.1.4--cuda--9.2.88 -module load gnu/6.4.0 -module load openmpi/3.1.0--gnu--6.4.0 -source $CINECA_SCRATCH/lib/openblas-0.3.4-openmp/load.sh -export CC=gcc -export CFLAGS='-mcpu=power8 -O3' -export CXX=g++ -export CXXFLAGS='-mcpu=power8 -O3' -export F77=gfortran -export FFLAGS='-mcpu=power8 -O3' - -# python -git clone https://github.com/python/cpython.git python-$python_version -cd python-$python_version -git checkout v$python_version -./configure --prefix=$tgt --enable-shared --disable-ipv6 --enable-unicode=ucs4 2>&1 | tee loki-conf -make 2>&1 | tee loki-make -make install 2>&1 | tee loki-inst -cd .. -sed -e "s||$tgt|g" setup/load-python.sh > $tgt/load.sh - -# install pip -source $tgt/load.sh -python -m ensurepip -pip install --upgrade pip - -# fix permissions -chmod -R g+rwX $tgt -chmod -R o+rX $tgt diff --git a/gpaw/build/examples/davide/python/setup/davide-openblas.cfg b/gpaw/build/examples/davide/python/setup/davide-openblas.cfg deleted file mode 100644 index ba027c31a76734ca1d69c30f38c0c4832d35c59a..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/python/setup/davide-openblas.cfg +++ /dev/null @@ -1,4 +0,0 @@ -[openblas] -libraries = openblas -library_dirs = /lib -include_dirs = /include diff --git a/gpaw/build/examples/davide/python/setup/load-python.sh b/gpaw/build/examples/davide/python/setup/load-python.sh deleted file mode 100644 index 69b2033864e3bbf9f756946f78eb24f71cc4ddaa..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/python/setup/load-python.sh +++ /dev/null @@ -1,22 +0,0 @@ -#!/bin/bash -export PYTHONHOME= -export PYTHONPATH=$PYTHONHOME/lib -export PATH=$PYTHONHOME/bin:$PATH -export MANPATH=$PYTHONHOME/share/man:$MANPATH -export LD_LIBRARY_PATH=$PYTHONHOME/lib:$LD_LIBRARY_PATH - -if [[ $# -gt 0 ]] -then - export PYTHONUSERBASE=$PYTHONHOME/bundle/$1 - export PATH=$PYTHONUSERBASE/bin:$PATH - export LD_LIBRARY_PATH=$PYTHONUSERBASE/lib:$LD_LIBRARY_PATH -elif [[ -e $PYTHONHOME/bundle/default ]] -then - export PYTHONUSERBASE=$PYTHONHOME/bundle/default - export PATH=$PYTHONUSERBASE/bin:$PATH - export LD_LIBRARY_PATH=$PYTHONUSERBASE/lib:$LD_LIBRARY_PATH -fi -if [[ -e $PYTHONUSERBASE/include/xc.h ]] -then - export LIBXCDIR=$PYTHONUSERBASE -fi diff --git a/gpaw/build/examples/davide/scalapack/build.sh b/gpaw/build/examples/davide/scalapack/build.sh deleted file mode 100644 index 52a7bb31071ea15a099ee970a951042a5c2242c1..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/scalapack/build.sh +++ /dev/null @@ -1,32 +0,0 @@ -### ScaLAPACK installation script for D.A.V.I.D.E - -# version numbers (modify if needed) -scalapack_version=2.0.2 - -# installation directory (modify!) 
-tgt=$CINECA_SCRATCH/lib/scalapack-${scalapack_version} - -# setup build environment -module load cuda/9.2.88 -module load cudnn/7.1.4--cuda--9.2.88 -module load gnu/6.4.0 -module load openmpi/3.1.0--gnu--6.4.0 -source $CINECA_SCRATCH/lib/openblas-0.3.4-openmp/load.sh -export CFLAGS="-mcpu=power8 -O3" -export FFLAGS="-mcpu=power8 -O3" - -# scalapack -tar xvfz ~/scalapack-${scalapack_version}.tgz -cd scalapack-${scalapack_version} -cp ../setup/SLmake.inc . -mkdir build -cd build -cmake -DBLAS_LIBRARIES=$OPENBLAS_ROOT/lib/libopenblas.so -DLAPACK_LIBRARIES=$OPENBLAS_ROOT/lib/libopenblas.so -DBUILD_SHARED_LIBS=ON -DCMAKE_INSTALL_PREFIX=$tgt .. -make 2>&1 | tee loki-make -make install 2>&1 | tee loki-install -sed -e "s||$tgt|g" ../../setup/load-scalapack.sh > $tgt/load.sh -cd ../.. - -# fix permissions -chmod -R g+rwX $tgt -chmod -R o+rX $tgt diff --git a/gpaw/build/examples/davide/scalapack/setup/SLmake.inc b/gpaw/build/examples/davide/scalapack/setup/SLmake.inc deleted file mode 100644 index 4d5f6aa38cc020b0ae04f9714a16febf21216b59..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/scalapack/setup/SLmake.inc +++ /dev/null @@ -1,61 +0,0 @@ -############################################################################ -# -# Program: ScaLAPACK -# -# Module: SLmake.inc -# -# Purpose: Top-level Definitions -# -# Creation date: February 15, 2000 -# -# Modified: October 13, 2011 -# -# Send bug reports, comments or suggestions to scalapack@cs.utk.edu -# -############################################################################ -# -# C preprocessor definitions: set CDEFS to one of the following: -# -# -DNoChange (fortran subprogram names are lower case without any suffix) -# -DUpCase (fortran subprogram names are upper case without any suffix) -# -DAdd_ (fortran subprogram names are lower case with "_" appended) - -CDEFS = -DAdd_ - -# -# The fortran and C compilers, loaders, and their flags -# - -FC = mpif90 -CC = mpicc -NOOPT = -mcpu=power8 -O0 -FCFLAGS = -mcpu=power8 -O3 -shared -fPIC -CCFLAGS = -mcpu=power8 -O3 -shared -fPIC -FCLOADER = $(FC) -CCLOADER = $(CC) -FCLOADFLAGS = $(FCFLAGS) -CCLOADFLAGS = $(CCFLAGS) - -# -# The archiver and the flag(s) to use when building archive (library) -# Also the ranlib routine. If your system has no ranlib, set RANLIB = echo -# - -ARCH = ar -ARCHFLAGS = cr -RANLIB = ranlib - -# -# The name of the ScaLAPACK library to be created -# - -SCALAPACKLIB = libscalapack.so - -# -# BLAS, LAPACK (and possibly other) libraries needed for linking test programs -# - -BLASLIB = -lopenblas -L$(OPENBLAS_ROOT)/lib -I$(OPENBLAS_ROOT)/include -LAPACKLIB = -LIBS = $(LAPACKLIB) $(BLASLIB) - diff --git a/gpaw/build/examples/davide/scalapack/setup/load-scalapack.sh b/gpaw/build/examples/davide/scalapack/setup/load-scalapack.sh deleted file mode 100644 index c37538574e2d073453dba51efd66a5d7599649cb..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/davide/scalapack/setup/load-scalapack.sh +++ /dev/null @@ -1,3 +0,0 @@ -#!/bin/bash -export SCALAPACK_ROOT= -export LD_LIBRARY_PATH=$SCALAPACK_ROOT/lib:$LD_LIBRARY_PATH diff --git a/gpaw/build/examples/juwels/gpaw/build.sh b/gpaw/build/examples/juwels/gpaw/build.sh deleted file mode 100644 index 5206b59856d7179d009f82c34a70e2d27a3a362c..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/juwels/gpaw/build.sh +++ /dev/null @@ -1,30 +0,0 @@ -### GPAW installation script for JUWELS - -# version numbers (modify if needed) -gpaw_version=1.1.0 - -# installation directory (modify!) 
-tgt=$HOME/lib/gpaw-${gpaw_version} - -# setup build environment -module load CUDA/9.2.88 -module load Intel/2019.0.117-GCC-7.3.0 -module load IntelMPI/2018.4.274 -module load imkl/2019.0.117 -source $HOME/lib/python-2019-01/load.sh 2016-06 -export GPAW_SETUP_PATH=$HOME/lib/gpaw-setups-0.9.11271 -export CFLAGS="" - -# gpaw -git clone https://gitlab.com/mlouhivu/gpaw.git gpaw-$gpaw_version -cd gpaw-$gpaw_version -git checkout $gpaw_version -patch gpaw/eigensolvers/rmm_diis.py ../setup/patch-rmmdiis.diff -cp ../setup/customize-juwels.py . -python setup.py install --customize=customize-juwels.py --prefix=$tgt 2>&1 | tee loki-inst -cd .. -sed -e "s||$tgt|g" -e "s||$PYTHONHOME|" setup/load-gpaw.sh > $tgt/load.sh - -# fix permissions -chmod -R g+rwX $tgt -chmod -R o+rX $tgt diff --git a/gpaw/build/examples/juwels/gpaw/setup/customize-juwels.py b/gpaw/build/examples/juwels/gpaw/setup/customize-juwels.py deleted file mode 100644 index 7d7457df750f04cefd55523a6e1a6fb0cb2a646a..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/juwels/gpaw/setup/customize-juwels.py +++ /dev/null @@ -1,34 +0,0 @@ -# Setup customisation for gpaw/cuda -import os - -# compiler and linker -compiler = 'icc' -mpicompiler = 'mpicc' -mpilinker = 'mpicc' -extra_compile_args = ['-std=c99'] - -# libraries -libraries = ['z'] - -# use MKL -library_dirs += [os.environ['MKLROOT'] + '/lib/intel64/'] -libraries = ['mkl_intel_lp64', 'mkl_sequential', 'mkl_core'] -mpi_libraries += ['mkl_scalapack_lp64', 'mkl_blacs_intelmpi_lp64'] - -# libxc -library_dirs += [os.environ['LIBXCDIR'] + '/lib'] -include_dirs += [os.environ['LIBXCDIR'] + '/include'] -libraries += ['xc'] - -# GPAW defines -define_macros += [('GPAW_NO_UNDERSCORE_CBLACS', '1')] -define_macros += [('GPAW_NO_UNDERSCORE_CSCALAPACK', '1')] -define_macros += [("GPAW_ASYNC",1)] -define_macros += [("GPAW_MPI2",1)] - -# ScaLAPACK -scalapack = True - -# HDF5 -hdf5 = False - diff --git a/gpaw/build/examples/juwels/gpaw/setup/load-gpaw.sh b/gpaw/build/examples/juwels/gpaw/setup/load-gpaw.sh deleted file mode 100644 index 2bb4fe410fe5c1494b2994999ffd3c4f190aaf3d..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/juwels/gpaw/setup/load-gpaw.sh +++ /dev/null @@ -1,9 +0,0 @@ -#!/bin/bash -module load CUDA/9.2.88 -module load Intel/2019.0.117-GCC-7.3.0 -module load IntelMPI/2018.4.274 -module load imkl/2019.0.117 -source $HOME/lib/python-2019-01/load.sh 2016-06 -export GPAW_SETUP_PATH=$HOME/lib/gpaw-setups-0.9.11271 -export PATH=/bin:$PATH -export PYTHONPATH=/lib/python2.7/site-packages:$PYTHONPATH diff --git a/gpaw/build/examples/juwels/gpaw/setup/patch-rmmdiis.diff b/gpaw/build/examples/juwels/gpaw/setup/patch-rmmdiis.diff deleted file mode 100644 index adefa27c31dd72c87df2d3e35b8fcb56a7224dcf..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/juwels/gpaw/setup/patch-rmmdiis.diff +++ /dev/null @@ -1,34 +0,0 @@ -commit 761cba649d58e2d2f24c0a1e2fdad917b5929679 -Author: Martti Louhivuori -Date: Thu May 18 10:56:07 2017 +0300 - - Remove obsolete error calculation from DIIS step - -diff --git a/gpaw/eigensolvers/rmmdiis.py b/gpaw/eigensolvers/rmmdiis.py -index 7d60553..d182713 100644 ---- a/gpaw/eigensolvers/rmmdiis.py -+++ b/gpaw/eigensolvers/rmmdiis.py -@@ -299,23 +299,6 @@ class RMMDIIS(Eigensolver): - P_axi, kpt.eps_n[n_x], R_xG, n_x, - calculate_change=True) - self.timer.stop('Calculate residuals') -- self.timer.start('Calculate errors') -- errors_new_x = np.zeros(B) -- # errors_x[:] = 0.0 -- for n in range(n1, n2): -- if kpt.f_n is None: -- 
weight = kpt.weight -- else: -- weight = kpt.f_n[n] -- if self.nbands_converge != 'occupied': -- if wfs.bd.global_index(n) < self.nbands_converge: -- weight = kpt.weight -- else: -- weight = 0.0 -- errors_new_x[n-n1] += weight * integrate(R_xG[n - n1], -- R_xG[n - n1]) -- comm.sum(errors_x) -- self.timer.stop('Calculate errors') - - self.timer.stop('DIIS step') - # Final trial step diff --git a/gpaw/build/examples/juwels/python/build-extra.sh b/gpaw/build/examples/juwels/python/build-extra.sh deleted file mode 100644 index 8ab157293cc7e8b382f41d235d1ad62d87ee6b5b..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/juwels/python/build-extra.sh +++ /dev/null @@ -1,88 +0,0 @@ -### Python (extra) modules installation script for JUWELS -### uses PYTHONUSERBASE to bundle all modules into a separate location -### away from the base python installation - -# load Python -source $PYTHONHOME/load.sh - -# bundle ID (e.g. time of release) (modify if needed) -bundle=2016-06 - -# version numbers (modify if needed) -numpy_version=1.10.4 -scipy_version=0.17.1 -ase_version=3.11.0 -pycuda_version=2017.1.1 -libxc_version=2.1.3 - -# installation directory (modify!) -tgt=$PYTHONHOME/bundle/$bundle - -# setup build environment -export CFLAGS="-fPIC $CFLAGS" -export FFLAGS="-fPIC $FFLAGS" - -# use --user to install modules -export PYTHONUSERBASE=$tgt -mkdir -p $PYTHONUSERBASE/lib/python2.7/site-packages - -# build in a separate directory -mkdir bundle-$bundle -cd bundle-$bundle - -# cython + mpi4py -pip install --user cython -pip install --user mpi4py - -# numpy -git clone git://github.com/numpy/numpy.git numpy-$numpy_version -cd numpy-$numpy_version -git checkout v$numpy_version -sed -e "s||$MKLROOT|g" ../../setup/juwels-mkl.cfg > site.cfg -sed -e "s||$FFLAGS|g" ../../setup/patch-intel-fcompiler.diff > patch-intel-fcompiler.diff -sed -e "s||$CFLAGS|g" ../../setup/patch-intel-ccompiler.diff > patch-intel-ccompiler.diff -patch numpy/distutils/fcompiler/intel.py patch-intel-fcompiler.diff -patch numpy/distutils/intelccompiler.py patch-intel-ccompiler.diff -python setup.py build -j 4 install --user 2>&1 | tee loki-inst -cd .. - -# scipy -git clone git://github.com/scipy/scipy.git scipy-$scipy_version -cd scipy-$scipy_version -git checkout v$scipy_version -python setup.py build -j 4 install --user 2>&1 | tee loki-inst -cd .. - -# ase -git clone https://gitlab.com/ase/ase.git ase-$ase_version -cd ase-$ase_version -git checkout $ase_version -python setup.py install --user 2>&1 | tee loki-inst -cd .. - -# libxc -tar xvfz ~/src/libxc-${libxc_version}.tar.gz -cd libxc-$libxc_version -./configure --prefix=$PYTHONUSERBASE --enable-shared | tee loki-conf -make | tee loki-make -make install | tee loki-inst -export LD_LIBRARY_PATH=$PYTHONUSERBASE/lib:$LD_LIBRARY_PATH -cd .. - -# pycuda -pip install --user pycuda==$pycuda_version - -# go back to the main build directory -cd .. - -# if this is the first bundle, use it as default -if [ ! 
-e $PYTHONHOME/bundle/default ] -then - cd $PYTHONHOME/bundle - ln -s $bundle default - cd - -fi - -# fix permissions -chmod -R g+rwX $tgt -chmod -R o+rX $tgt diff --git a/gpaw/build/examples/juwels/python/build.sh b/gpaw/build/examples/juwels/python/build.sh deleted file mode 100644 index 252f31e4dda62c25173c28995d925d520c69b6ba..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/juwels/python/build.sh +++ /dev/null @@ -1,39 +0,0 @@ -### Python installation script for JUWELS -### uses --prefix to set a custom installation directory - -# version numbers (modify if needed) -python_version=2.7.13 - -# installation directory (modify!) -tgt=$HOME/lib/python-2019-01 - -# setup build environment -module load CUDA/9.2.88 -module load Intel/2019.0.117-GCC-7.3.0 -module load IntelMPI/2018.4.274 -module load imkl/2019.0.117 -export CC=icc -export CFLAGS='-O2 -xAVX2 -axCORE-AVX512' -export CXX=icpc -export FC=ifort -export F90=$FC -export FFLAGS=$CFLAGS - -# python -git clone https://github.com/python/cpython.git python-$python_version -cd python-$python_version -git checkout v$python_version -./configure --prefix=$tgt --enable-shared --disable-ipv6 --enable-unicode=ucs4 2>&1 | tee loki-conf -make 2>&1 | tee loki-make -make install 2>&1 | tee loki-inst -cd .. -sed -e "s||$tgt|g" setup/load-python.sh > $tgt/load.sh - -# install pip -source $tgt/load.sh -python -m ensurepip -pip install --upgrade pip - -# fix permissions -chmod -R g+rwX $tgt -chmod -R o+rX $tgt diff --git a/gpaw/build/examples/juwels/python/setup/juwels-mkl.cfg b/gpaw/build/examples/juwels/python/setup/juwels-mkl.cfg deleted file mode 100644 index 3e2d3e8c391129507b6fbec06c00e3965863af55..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/juwels/python/setup/juwels-mkl.cfg +++ /dev/null @@ -1,5 +0,0 @@ -[mkl] -library_dirs = /lib/intel64 -include_dirs = /include -lapack_libs = -mkl_libs = mkl_rt diff --git a/gpaw/build/examples/juwels/python/setup/load-python.sh b/gpaw/build/examples/juwels/python/setup/load-python.sh deleted file mode 100644 index 69b2033864e3bbf9f756946f78eb24f71cc4ddaa..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/juwels/python/setup/load-python.sh +++ /dev/null @@ -1,22 +0,0 @@ -#!/bin/bash -export PYTHONHOME= -export PYTHONPATH=$PYTHONHOME/lib -export PATH=$PYTHONHOME/bin:$PATH -export MANPATH=$PYTHONHOME/share/man:$MANPATH -export LD_LIBRARY_PATH=$PYTHONHOME/lib:$LD_LIBRARY_PATH - -if [[ $# -gt 0 ]] -then - export PYTHONUSERBASE=$PYTHONHOME/bundle/$1 - export PATH=$PYTHONUSERBASE/bin:$PATH - export LD_LIBRARY_PATH=$PYTHONUSERBASE/lib:$LD_LIBRARY_PATH -elif [[ -e $PYTHONHOME/bundle/default ]] -then - export PYTHONUSERBASE=$PYTHONHOME/bundle/default - export PATH=$PYTHONUSERBASE/bin:$PATH - export LD_LIBRARY_PATH=$PYTHONUSERBASE/lib:$LD_LIBRARY_PATH -fi -if [[ -e $PYTHONUSERBASE/include/xc.h ]] -then - export LIBXCDIR=$PYTHONUSERBASE -fi diff --git a/gpaw/build/examples/juwels/python/setup/patch-intel-ccompiler.diff b/gpaw/build/examples/juwels/python/setup/patch-intel-ccompiler.diff deleted file mode 100644 index 17266bdf36acd62c3109c25b5a99ee3adde2e8a4..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/juwels/python/setup/patch-intel-ccompiler.diff +++ /dev/null @@ -1,21 +0,0 @@ -diff --git a/numpy/distutils/intelccompiler.py b/numpy/distutils/intelccompiler.py -index 20c6d2b..7bdc3d0 100644 ---- a/numpy/distutils/intelccompiler.py -+++ b/numpy/distutils/intelccompiler.py -@@ -48,13 +48,12 @@ class IntelEM64TCCompiler(UnixCCompiler): - A 
modified Intel x86_64 compiler compatible with a 64bit GCC-built Python. - """ - compiler_type = 'intelem' -- cc_exe = 'icc -m64' -- cc_args = '-fPIC' -+ cc_exe = 'icc' -+ cc_args = '-fPIC ' - - def __init__(self, verbose=0, dry_run=0, force=0): - UnixCCompiler.__init__(self, verbose, dry_run, force) -- self.cc_exe = ('icc -m64 -fPIC -fp-model strict -O3 ' -- '-fomit-frame-pointer -openmp -xSSE4.2') -+ self.cc_exe = ('icc -fPIC ') - compiler = self.cc_exe - if platform.system() == 'Darwin': - shared_flag = '-Wl,-undefined,dynamic_lookup' diff --git a/gpaw/build/examples/juwels/python/setup/patch-intel-fcompiler.diff b/gpaw/build/examples/juwels/python/setup/patch-intel-fcompiler.diff deleted file mode 100644 index a9dd7b07d5d89c4867fa5ef26903ffa7fc76b3dd..0000000000000000000000000000000000000000 --- a/gpaw/build/examples/juwels/python/setup/patch-intel-fcompiler.diff +++ /dev/null @@ -1,17 +0,0 @@ -diff --git a/numpy/distutils/fcompiler/intel.py b/numpy/distutils/fcompiler/intel.py -index 2dd08e7..1df2a99 100644 ---- a/numpy/distutils/fcompiler/intel.py -+++ b/numpy/distutils/fcompiler/intel.py -@@ -120,10 +120,10 @@ class IntelEM64TFCompiler(IntelFCompiler): - return ['-fPIC'] - - def get_flags_opt(self): -- return ['-openmp -fp-model strict'] -+ return ['-fPIC '] - - def get_flags_arch(self): -- return ['-xSSE4.2'] -+ return [''] - - # Is there no difference in the version string between the above compilers - # and the Visual compilers? diff --git a/gpaw/build/patch-rmmdiis.diff b/gpaw/build/patch-rmmdiis.diff deleted file mode 100644 index adefa27c31dd72c87df2d3e35b8fcb56a7224dcf..0000000000000000000000000000000000000000 --- a/gpaw/build/patch-rmmdiis.diff +++ /dev/null @@ -1,34 +0,0 @@ -commit 761cba649d58e2d2f24c0a1e2fdad917b5929679 -Author: Martti Louhivuori -Date: Thu May 18 10:56:07 2017 +0300 - - Remove obsolete error calculation from DIIS step - -diff --git a/gpaw/eigensolvers/rmmdiis.py b/gpaw/eigensolvers/rmmdiis.py -index 7d60553..d182713 100644 ---- a/gpaw/eigensolvers/rmmdiis.py -+++ b/gpaw/eigensolvers/rmmdiis.py -@@ -299,23 +299,6 @@ class RMMDIIS(Eigensolver): - P_axi, kpt.eps_n[n_x], R_xG, n_x, - calculate_change=True) - self.timer.stop('Calculate residuals') -- self.timer.start('Calculate errors') -- errors_new_x = np.zeros(B) -- # errors_x[:] = 0.0 -- for n in range(n1, n2): -- if kpt.f_n is None: -- weight = kpt.weight -- else: -- weight = kpt.f_n[n] -- if self.nbands_converge != 'occupied': -- if wfs.bd.global_index(n) < self.nbands_converge: -- weight = kpt.weight -- else: -- weight = 0.0 -- errors_new_x[n-n1] += weight * integrate(R_xG[n - n1], -- R_xG[n - n1]) -- comm.sum(errors_x) -- self.timer.stop('Calculate errors') - - self.timer.stop('DIIS step') - # Final trial step diff --git a/gpaw/build/examples/davide/gpaw-cpu/setup/patch-rmmdiis.diff b/gpaw/build/patches/patch-rmmdiis.diff similarity index 100% rename from gpaw/build/examples/davide/gpaw-cpu/setup/patch-rmmdiis.diff rename to gpaw/build/patches/patch-rmmdiis.diff diff --git a/gpaw/scripts/affinity-wrapper.sh b/gpaw/scripts/affinity-wrapper.sh deleted file mode 100755 index 49105069191456d0fad6068862a24f250464b60b..0000000000000000000000000000000000000000 --- a/gpaw/scripts/affinity-wrapper.sh +++ /dev/null @@ -1,47 +0,0 @@ -#!/bin/bash -# An affinity wrapper for KNCs. -# As arguments, it expects first the number of processes per node -# followed by the command to run (and any possible arguments to it). 
- -# get some information about the job -ppn=$1 -shift -rank=$PMI_RANK -nmpi=$PMI_SIZE - -# echo "RANK", $PMI_RANK - -# number of devices in the system -ndev=2 - -# number of cores per device -nphcores=61 -nphcores=$((nphcores - 1)) - -# number of threads per physical core -tpc=4 - -# ranks per device -rpd=$((ppn / ndev)) -if [ "$rpd" == "0" ]; then - rpd=1 -fi - -# physical cores per device -ncores=$((nphcores / rpd)) - -# partition number of the current rank on its device -partition=$((rank % rpd)) - -# offset for the current rank -offset=$((ncores * partition)) - -# build core selection string -select="${ncores}c,${tpc}t,${offset}o" - -# fire up the actual run -log="affinity-`printf %03d $rank`.log" -rm -f $log -echo "host `hostname` rank `printf %03d $rank` - $select " |& tee -a $log -env | grep PYMIC |& tee -a $log -PYMIC_KMP_AFFINITY=compact,verbose PYMIC_KMP_PLACE_THREADS=$select $@ |& tee -a $log diff --git a/gpaw/scripts/job-BSC-MareNostrum4-skylake.slurm b/gpaw/scripts/job-BSC-MareNostrum4-skylake.slurm new file mode 100644 index 0000000000000000000000000000000000000000..435e5c595d8003c0d2ab5079e41b47f44cca322d --- /dev/null +++ b/gpaw/scripts/job-BSC-MareNostrum4-skylake.slurm @@ -0,0 +1,125 @@ +#! /bin/bash -l +#SBATCH -N 21 +#SBATCH -n 1000 -c 1 +#SBATCH --time 2:00:00 +#SBATCH -J GPAWbench +#SBATCH --qos prace +# + +inputfile=../input.py +benchmark_size='large' +csv_summary="BSC_${benchmark_size}_${SLURM_JOB_ID}_${SLURM_NTASKS}.csv" +module_20_1=GPAW-UEABS/20.1.0-Python38-FFTW-icc +module_20_10=GPAW-UEABS/20.10.0-Python39-FFTW-icc + +# The next one is not a system module but one that was used to do the settings +# for the project. +module load UEABS/2.2 + +srun_options='' + +echo -e "\nWorking in: $(pwd)\n" +echo -e "Modules loaded:\n" +module list +echo -e "Slurm environment:\n$(env | grep SLURM_)\n" +echo -e "\nJob script:\n" +cat $0 +echo -e "\n\n" + +# +# Check the results +# +function print_header { + + echo '"Module", "python/gpaw/ase/numpy/scipy", "tasks", "time", "iterations", "dipole", "fermi", "energy", "check", "Job ID"' >$1 + +} + +function print_results { + + output=$1 + summary=$2 + module=$3 + + . ../bounds.sh + + python_version=$(python -V | awk '{print $2}') + gpaw_version=$(srun -n 1 -c 1 python -c "import gpaw ; print( gpaw.__version__ )") + ase_version=$(python -c "import ase ; print( ase.__version__ )") + numpy_version=$(python -c "import numpy ; print( numpy.__version__ )") + scipy_version=$(python -c "import scipy ; print( scipy.__version__ )") + + # Extract some data to report from the output file.
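+ # The grep patterns below rely on lines GPAW prints in its text output: the total wall time ('Total:'), the SCF iteration count ('Converged after'), the third component of the dipole moment ('Dipole'), the Fermi level ('Fermi level:'), and the extrapolated total energy ('Extrapolated:').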
+ bmtime=$(grep "Total:" $output | sed -e 's/Total: *//' | cut -d " " -f 1) + iterations=$(grep "Converged after" $output | cut -d " " -f 3) + dipole=$(grep "Dipole" $output | cut -d " " -f 5 | sed -e 's/)//') + fermi=$(grep "Fermi level:" $output | cut -d ":" -f 2 | sed -e 's/ //g') + energy=$(grep "Extrapolated: " $output | cut -d ":" -f 2 | sed -e 's/ //g') + # Check the bounds + if (( $(bc -l <<< "(($iterations-0) >= $lower_iterations) && (($iterations-0) <= $upper_iterations)") == 1 )); then iterations_ok="OK"; else iterations_ok="not OK"; fi + if (( $(bc -l <<< "(($dipole-0) >= $lower_dipole) && (($dipole-0) <= $upper_dipole)") == 1 )); then dipole_ok="OK"; else dipole_ok="not OK"; fi + if (( $(bc -l <<< "(($fermi-0) >= $lower_fermi) && (($fermi-0) <= $upper_fermi)") == 1 )); then fermi_ok="OK"; else fermi_ok="not OK"; fi + if (( $(bc -l <<< "(($energy-0) >= $lower_energy) && (($energy-0) <= $upper_energy)") == 1 )); then energy_ok="OK"; else energy_ok="not OK"; fi + compare="" + compare+="(($iterations-0) >= $lower_iterations) && (($iterations-0) <= $upper_iterations) && " + compare+="(($dipole-0) >= $lower_dipole) && (($dipole-0) <= $upper_dipole) && " + compare+="(($fermi-0) >= $lower_fermi) && (($fermi-0) <= $upper_fermi) && " + compare+="(($energy-0) >= $lower_energy) && (($energy-0) <= $upper_energy)" + if (( $(bc -l <<< "$compare") )); + then + bounds_check="OK"; + else + bounds_check="not OK" + fi + # Output to the slurm.out file + echo -e "\nResult information:\n" \ + " * Time: $bmtime s\n" \ + " * Number of iterations: $iterations (lower: $lower_iterations, upper: $upper_iterations, $iterations_ok)\n" \ + " * Dipole (3rd component): $dipole (lower: $lower_dipole, upper: $upper_dipole, $dipole_ok)\n" \ + " * Fermi level: $fermi (lower: $lower_fermi, upper: $upper_fermi, $fermi_ok)\n" \ + " * Extrapolated energy: $energy (lower: $lower_energy, upper: $upper_energy, $energy_ok)\n" \ + " * Boundary check: $bounds_check" + # Output to the summary spreadsheet + echo "\"$module\", \"$python_version/$gpaw_version/$ase_version/$numpy_version/$scipy_version\"," \ + "\"$SLURM_NTASKS\", \"$bmtime\", \"$iterations\", \"$dipole\", \"$fermi\", \"$energy\", \"$bounds_check\", \"$SLURM_JOB_ID\"" >> $summary + +} + +# +# Running with GPAW 20.1.0 +# + +print_header $csv_summary + +module load $module_20_1 + +echo -e "\n\nStarting GPAW for $module_20_1\n" + +srun $srun_options gpaw python $inputfile + +echo -e "\n\nGPAW terminated\n" + +print_results output.txt $csv_summary $module_20_1 + +mv output.txt BSC_${benchmark_size}_${SLURM_JOB_ID}_${SLURM_NTASKS}_20.1.0.txt + +# +# Running with GPAW 20.10.0 +# + +module load $module_20_10 + +echo -e "\n\n" + +echo -e "\n\nStarting GPAW for $module_20_10\n" + +srun $srun_options gpaw python $inputfile + +echo -e "\n\nGPAW terminated\n" + +print_results output.txt $csv_summary $module_20_10 + +mv output.txt BSC_${benchmark_size}_${SLURM_JOB_ID}_${SLURM_NTASKS}_20.10.0.txt + + + diff --git a/gpaw/scripts/job-HLRS-Hawk-rome.pbs b/gpaw/scripts/job-HLRS-Hawk-rome.pbs new file mode 100644 index 0000000000000000000000000000000000000000..2d1349a73cc7c89d2b66bc3d735704efc8b09d31 --- /dev/null +++ b/gpaw/scripts/job-HLRS-Hawk-rome.pbs @@ -0,0 +1,142 @@ +#! 
/bin/bash -l +#PBS -l select=8:node_type=rome:mpiprocs=128 +#PBS -l walltime=1:00:00 +#PBS -N GPAWbench + +numranks=1000 +inputfile=../../input.py +benchmark_size='large' +csv_summary="HLRS_${benchmark_size}_${PBS_JOBID}_${numranks}.csv" +module_20_1=GPAW-UEABS/20.1.0-Python38-FFTW-icc +module_20_10=GPAW-UEABS/20.10.0-Python39-FFTW-icc + +cd $PBS_O_WORKDIR + +compiler_module='intel/19.1.3' +mpi_module='mpt/2.23' +math_module='mkl/19.1.0' + +bounds="$(dirname ${inputfile})/bounds.sh" + +# The first module is not a system module but one that was used to do the settings +# for the project. +module load UEABS +module load $compiler_module +module load $mpi_module +module load $math_module + +export MKL_DEBUG_CPU_TYPE=5 +export OMP_NUM_THREADS=1 + +echo -e "\nWorking in: $(pwd)\n" +echo -e "Modules loaded:\n" +module list +echo -e "PBS environment:\n$(env | grep PBS_)\n" +echo -e "\nJob script:\n" +cat $0 +echo -e "\n\n" + +# +# Check the results +# +function print_header { + + echo '"Module", "python/gpaw/ase/numpy/scipy", "tasks", "time", "iterations", "dipole", "fermi", "energy", "check", "Job ID"' >$1 + +} + +function print_results { + + output=$1 + summary=$2 + module=$3 + bounds=$4 + + #source ${bounds} + source ../../bounds.sh + + python_version=$(python -V | awk '{print $2}') + gpaw_version=$(python -c "import gpaw ; print( gpaw.__version__ )") + ase_version=$(python -c "import ase ; print( ase.__version__ )") + numpy_version=$(python -c "import numpy ; print( numpy.__version__ )") + scipy_version=$(python -c "import scipy ; print( scipy.__version__ )") + + # Extract some data to report from the output file. + bmtime=$(grep "Total:" $output | sed -e 's/Total: *//' | cut -d " " -f 1) + iterations=$(grep "Converged after" $output | cut -d " " -f 3) + dipole=$(grep "Dipole" $output | cut -d " " -f 5 | sed -e 's/)//') + fermi=$(grep "Fermi level:" $output | cut -d ":" -f 2 | sed -e 's/ //g') + energy=$(grep "Extrapolated: " $output | cut -d ":" -f 2 | sed -e 's/ //g') + # Check the bounds + if (( $(bc -l <<< "(($iterations-0) >= $lower_iterations) && (($iterations-0) <= $upper_iterations)") == 1 )); then iterations_ok="OK"; else iterations_ok="not OK"; fi + if (( $(bc -l <<< "(($dipole-0) >= $lower_dipole) && (($dipole-0) <= $upper_dipole)") == 1 )); then dipole_ok="OK"; else dipole_ok="not OK"; fi + if (( $(bc -l <<< "(($fermi-0) >= $lower_fermi) && (($fermi-0) <= $upper_fermi)") == 1 )); then fermi_ok="OK"; else fermi_ok="not OK"; fi + if (( $(bc -l <<< "(($energy-0) >= $lower_energy) && (($energy-0) <= $upper_energy)") == 1 )); then energy_ok="OK"; else energy_ok="not OK"; fi + compare="" + compare+="(($iterations-0) >= $lower_iterations) && (($iterations-0) <= $upper_iterations) && " + compare+="(($dipole-0) >= $lower_dipole) && (($dipole-0) <= $upper_dipole) && " + compare+="(($fermi-0) >= $lower_fermi) && (($fermi-0) <= $upper_fermi) && " + compare+="(($energy-0) >= $lower_energy) && (($energy-0) <= $upper_energy)" + if (( $(bc -l <<< "$compare") )); + then + bounds_check="OK"; + else + bounds_check="not OK" + fi + # Output to the PBS output file + echo -e "\nResult information:\n" \ + " * Time: $bmtime s\n" \ + " * Number of iterations: $iterations (lower: $lower_iterations, upper: $upper_iterations, $iterations_ok)\n" \ + " * Dipole (3rd component): $dipole (lower: $lower_dipole, upper: $upper_dipole, $dipole_ok)\n" \ + " * Fermi level: $fermi (lower: $lower_fermi, upper: $upper_fermi, $fermi_ok)\n" \ + " * Extrapolated energy: $energy (lower: $lower_energy, upper: $upper_energy, 
$energy_ok)\n" \ + " * Boundary check: $bounds_check" + # Output to the summary spreadsheet + echo "\"$module\", \"$python_version/$gpaw_version/$ase_version/$numpy_version/$scipy_version\"," \ + "\"$numranks\", \"$bmtime\", \"$iterations\", \"$dipole\", \"$fermi\", \"$energy\", \"$bounds_check\", \"$PBS_JOBID\"" >> $summary + +} + +# +# Running with GPAW 20.1.0 +# + +print_header $csv_summary + +module purge +module load UEABS/2.2 +module load $module_20_1 +echo -e "\n\nGPAW run with $module_20_1\nModules loaded:\n" +module list 2>&1 + +echo -e "\n\nStarting GPAW...\n" + +mpirun -n $numranks gpaw python $inputfile + +echo -e "\nGPAW ended, checking results...\n" + +print_results output.txt $csv_summary $module_20_1 + +mv output.txt HLRS_${benchmark_size}_${PBS_JOBID}_${numranks}_20.1.0.txt + +# +# Running with GPAW 20.10.0 +# + +module purge +module load UEABS/2.2 +module load $module_20_10 +echo -e "\n\nGPAW run with $module_20_10\nModules loaded:\n" +module list 2>&1 + +echo -e "\n\nStarting GPAW...\n" +echo -e "\n\n" + +mpirun -n $numranks gpaw python $inputfile $bounds + +echo -e "\nGPAW ended, checking results...\n" + +print_results output.txt $csv_summary $module_20_10 $bounds + +mv output.txt HLRS_${benchmark_size}_${PBS_JOBID}_${numranks}_20.10.0.txt + diff --git a/gpaw/scripts/job-JSC-JUWELS-skylake.slurm b/gpaw/scripts/job-JSC-JUWELS-skylake.slurm new file mode 100644 index 0000000000000000000000000000000000000000..bf58089c216b764176bf1d20501b43634e52c86f --- /dev/null +++ b/gpaw/scripts/job-JSC-JUWELS-skylake.slurm @@ -0,0 +1,127 @@ +#! /bin/bash -l +#SBATCH -A prpb101 +#SBATCH -N 21 +#SBATCH -n 1000 -c 1 +#SBATCH --time 1:00:00 +#SBATCH -J GPAWbench_large +#SBATCH -p batch +#SBATCH --hint=nomultithread +#SBATCH -o %x-%j.out +# + +inputfile=../input.py +benchmark_size='large' +csv_summary="JSC_${benchmark_size}_${SLURM_JOB_ID}_${SLURM_NTASKS}.csv" +module_20_1=GPAW-UEABS/20.1.0-Python38-FFTW-icc +module_20_10=GPAW-UEABS/20.10.0-Python39-FFTW-icc + +module load Intel/2021.2.0-GCC-10.3.0 +module load IntelMPI/2021.2.0 + +echo -e "\nWorking in: $(pwd)\n" +echo -e "Modules loaded:\n" +module list +echo -e "Slurm environment:\n$(env | grep SLURM_)\n" +echo -e "\nJob script:\n" +cat $0 +echo -e "\n\n" + + +# +# Check the results +# +function print_header { + + echo '"Module", "python/gpaw/ase/numpy/scipy", "tasks", "time", "iterations", "dipole", "fermi", "energy", "check", "Job ID"' >$1 + +} + +function print_results { + + output=$1 + summary=$2 + module=$3 + + . ../bounds.sh + + python_version=$(python -V | awk '{print $2}') + gpaw_version=$(srun -n 1 -c 1 python -c "import gpaw ; print( gpaw.__version__ )") + ase_version=$(python -c "import ase ; print( ase.__version__ )") + numpy_version=$(python -c "import numpy ; print( numpy.__version__ )") + scipy_version=$(python -c "import scipy ; print( scipy.__version__ )") + + # Extract some data to report from the output file.
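+ # Same extraction as in the other site job scripts: pull the run metrics from the GPAW output file; they are compared below against the reference intervals defined in the sourced bounds.sh.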
+ bmtime=$(grep "Total:" $output | sed -e 's/Total: *//' | cut -d " " -f 1) + iterations=$(grep "Converged after" $output | cut -d " " -f 3) + dipole=$(grep "Dipole" $output | cut -d " " -f 5 | sed -e 's/)//') + fermi=$(grep "Fermi level:" $output | cut -d ":" -f 2 | sed -e 's/ //g') + energy=$(grep "Extrapolated: " $output | cut -d ":" -f 2 | sed -e 's/ //g') + # Check the bounds + if (( $(bc -l <<< "(($iterations-0) >= $lower_iterations) && (($iterations-0) <= $upper_iterations)") == 1 )); then iterations_ok="OK"; else iterations_ok="not OK"; fi + if (( $(bc -l <<< "(($dipole-0) >= $lower_dipole) && (($dipole-0) <= $upper_dipole)") == 1 )); then dipole_ok="OK"; else dipole_ok="not OK"; fi + if (( $(bc -l <<< "(($fermi-0) >= $lower_fermi) && (($fermi-0) <= $upper_fermi)") == 1 )); then fermi_ok="OK"; else fermi_ok="not OK"; fi + if (( $(bc -l <<< "(($energy-0) >= $lower_energy) && (($energy-0) <= $upper_energy)") == 1 )); then energy_ok="OK"; else energy_ok="not OK"; fi + compare="" + compare+="(($iterations-0) >= $lower_iterations) && (($iterations-0) <= $upper_iterations) && " + compare+="(($dipole-0) >= $lower_dipole) && (($dipole-0) <= $upper_dipole) && " + compare+="(($fermi-0) >= $lower_fermi) && (($fermi-0) <= $upper_fermi) && " + compare+="(($energy-0) >= $lower_energy) && (($energy-0) <= $upper_energy)" + if (( $(bc -l <<< "$compare") )); + then + bounds_check="OK"; + else + bounds_check="not OK" + fi + # Output to the slurm.out file + echo -e "\nResult information:\n" \ + " * Time: $bmtime s\n" \ + " * Number of iterations: $iterations (lower: $lower_iterations, upper: $upper_iterations, $iterations_ok)\n" \ + " * Dipole (3rd component): $dipole (lower: $lower_dipole, upper: $upper_dipole, $dipole_ok)\n" \ + " * Fermi level: $fermi (lower: $lower_fermi, upper: $upper_fermi, $fermi_ok)\n" \ + " * Extrapolated energy: $energy (lower: $lower_energy, upper: $upper_energy, $energy_ok)\n" \ + " * Boundary check: $bounds_check" + # Output to the summary spreadsheet + echo "\"$module\", \"$python_version/$gpaw_version/$ase_version/$numpy_version/$scipy_version\"," \ + "\"$SLURM_NTASKS\", \"$bmtime\", \"$iterations\", \"$dipole\", \"$fermi\", \"$energy\", \"$bounds_check\", \"$SLURM_JOB_ID\"" >> $summary + +} + +# +# Running with GPAW 20.1.0 +# + +print_header $csv_summary + +module purge +# UEABS/2.2 is not a system module but one of our own to basically do some settings +# needed for the project (and to adjust the MODULEPATH). It is not needed if +# $module_20_1 (generated by the build scripts) is in the MODULEPATH +module load UEABS/2.2 +module load $module_20_1 + +echo -e "\n\n" + +srun gpaw python $inputfile + +print_results output.txt $csv_summary $module_20_1 + +mv output.txt JSC_${benchmark_size}_${SLURM_JOB_ID}_${SLURM_NTASKS}_20.1.0.txt + +# +# Running with GPAW 20.10.0 +# + +module purge +module load UEABS/2.2 +module load $module_20_10 + +echo -e "\n\n" + +srun gpaw python $inputfile + +print_results output.txt $csv_summary $module_20_10 + +mv output.txt JSC_${benchmark_size}_${SLURM_JOB_ID}_${SLURM_NTASKS}_20.10.0.txt + + + diff --git a/gpaw/scripts/job-LRZ-SuperMUCng-skylake.slurm b/gpaw/scripts/job-LRZ-SuperMUCng-skylake.slurm new file mode 100644 index 0000000000000000000000000000000000000000..d1e284d9c7f29993b86e3f72f15a4b778beb94f5 --- /dev/null +++ b/gpaw/scripts/job-LRZ-SuperMUCng-skylake.slurm @@ -0,0 +1,143 @@ +#! 
/bin/bash -l +#SBATCH -A pn73ye +#SBATCH -N 21 +#SBATCH -n 1000 -c 1 +#SBATCH --time 40:00 +#SBATCH -J GPAWbench +#SBATCH -p general +#SBATCH --hint=nomultithread +#SBATCH -o %x-%j.out +# + +inputfile=../input.py +benchmark_size='large' +csv_summary="LRZ_${benchmark_size}_${SLURM_JOB_ID}_${SLURM_NTASKS}.csv" +module_20_1=GPAW-UEABS/20.1.0-Python38-FFTW-icc +module_20_10=GPAW-UEABS/20.10.0-Python39-FFTW-icc + +module unload intel-mkl +module unload intel-mpi +module unload intel + +# UEABS/2.2 is not a system module but one of our own to basically do some settings +# needed for the project (and to adjust the MODULEPATH). It is not needed if +# $module_20_1 (generated by the build scripts) is in the MODULEPATH +module load UEABS/2.2 + +echo -e "\nWorking in: $(pwd)\n" +echo -e "Modules loaded:\n" +module list +echo -e "Slurm environment:\n$(env | grep SLURM_)\n" +echo -e "\nJob script:\n" +cat $0 +echo -e "\n\n" +echo -e "\nCalled script:\n" +cat ../../LRZ_run_GPAW.slurm +echo -e "\n\n" + +if (( $SLURM_NTASKS <= 512 )) +then + #srun --distribution=block:block /hppfs/work/pn73ye/di46ras/UEABS/Run/TEST-JOB/mpi_hello.exe + srun /hppfs/work/pn73ye/di46ras/UEABS/Run/TEST-JOB/mpi_hello.exe +fi + + +# +# Check the results +# +function print_header { + + echo '"Module", "python/gpaw/ase/numpy/scipy", "tasks", "time", "iterations", "dipole", "fermi", "energy", "check", "Job ID"' >$1 + +} + +function print_results { + + output=$1 + summary=$2 + module=$3 + + . ../bounds.sh + + python_version=$(python -V | awk '{print $2}') + gpaw_version=$(srun -n 1 -c 1 python -c "import gpaw ; print( gpaw.__version__ )") + ase_version=$(python -c "import ase ; print( ase.__version__ )") + numpy_version=$(python -c "import numpy ; print( numpy.__version__ )") + scipy_version=$(python -c "import scipy ; print( scipy.__version__ )") + + # Extract some data to report from the output file.
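+ # Parse the benchmark metrics from the GPAW output; the lower/upper reference values used in the checks come from the bounds.sh file sourced above.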
+ bmtime=$(grep "Total:" $output | sed -e 's/Total: *//' | cut -d " " -f 1) + iterations=$(grep "Converged after" $output | cut -d " " -f 3) + dipole=$(grep "Dipole" $output | cut -d " " -f 5 | sed -e 's/)//') + fermi=$(grep "Fermi level:" $output | cut -d ":" -f 2 | sed -e 's/ //g') + energy=$(grep "Extrapolated: " $output | cut -d ":" -f 2 | sed -e 's/ //g') + # Check the bounds + if (( $(bc -l <<< "(($iterations-0) >= $lower_iterations) && (($iterations-0) <= $upper_iterations)") == 1 )); then iterations_ok="OK"; else iterations_ok="not OK"; fi + if (( $(bc -l <<< "(($dipole-0) >= $lower_dipole) && (($dipole-0) <= $upper_dipole)") == 1 )); then dipole_ok="OK"; else dipole_ok="not OK"; fi + if (( $(bc -l <<< "(($fermi-0) >= $lower_fermi) && (($fermi-0) <= $upper_fermi)") == 1 )); then fermi_ok="OK"; else fermi_ok="not OK"; fi + if (( $(bc -l <<< "(($energy-0) >= $lower_energy) && (($energy-0) <= $upper_energy)") == 1 )); then energy_ok="OK"; else energy_ok="not OK"; fi + compare="" + compare+="(($iterations-0) >= $lower_iterations) && (($iterations-0) <= $upper_iterations) && " + compare+="(($dipole-0) >= $lower_dipole) && (($dipole-0) <= $upper_dipole) && " + compare+="(($fermi-0) >= $lower_fermi) && (($fermi-0) <= $upper_fermi) && " + compare+="(($energy-0) >= $lower_energy) && (($energy-0) <= $upper_energy)" + if (( $(bc -l <<< "$compare") )); + then + bounds_check="OK"; + else + bounds_check="not OK" + fi + # Output to the slurm.out file + echo -e "\nResult information:\n" \ + " * Time: $bmtime s\n" \ + " * Number of iterations: $iterations (lower: $lower_iterations, upper: $upper_iterations, $iterations_ok)\n" \ + " * Dipole (3rd component): $dipole (lower: $lower_dipole, upper: $upper_dipole, $dipole_ok)\n" \ + " * Fermi level: $fermi (lower: $lower_fermi, upper: $upper_fermi, $fermi_ok)\n" \ + " * Extrapolated energy: $energy (lower: $lower_energy, upper: $upper_energy, $energy_ok)\n" \ + " * Boundary check: $bounds_check" + # Output to the summary spreadsheet + echo "\"$module\", \"$python_version/$gpaw_version/$ase_version/$numpy_version/$scipy_version\"," \ + "\"$SLURM_NTASKS\", \"$bmtime\", \"$iterations\", \"$dipole\", \"$fermi\", \"$energy\", \"$bounds_check\", \"$SLURM_JOB_ID\"" >> $summary + +} + +# +# Running with GPAW 20.1.0 +# + +print_header $csv_summary + +module load $module_20_1 + +echo -e "\n\nStarting GPAW for $module_20_1\n" + +#srun --distribution=block:block gpaw python $inputfile +srun gpaw python $inputfile + +echo -e "\n\nGPAW terminated\n" + +print_results output.txt $csv_summary $module_20_1 + +mv output.txt LRZ_${benchmark_size}_${SLURM_JOB_ID}_${SLURM_NTASKS}_20.1.0.txt + +# +# Running with GPAW 20.10.0 +# + +module load $module_20_10 + +echo -e "\n\n" + +echo -e "\n\nStarting GPAW for $module_20_10\n" + +#srun --distribution=block:block gpaw python $inputfile +srun gpaw python $inputfile + +echo -e "\n\nGPAW terminated\n" + +print_results output.txt $csv_summary $module_20_10 + +mv output.txt LRZ_${benchmark_size}_${SLURM_JOB_ID}_${SLURM_NTASKS}_20.10.0.txt + + + diff --git a/gpaw/scripts/job-TGCC-irene-rome.slurm b/gpaw/scripts/job-TGCC-irene-rome.slurm new file mode 100644 index 0000000000000000000000000000000000000000..ede666a4995eef3991cddf9c93a098b52a5c80f8 --- /dev/null +++ b/gpaw/scripts/job-TGCC-irene-rome.slurm @@ -0,0 +1,130 @@ +#! 
/bin/bash -l +#MSUB -A pa5772 +#MSUB -q rome +#MSUB -Q normal +#MSUB -n 1024 +#MSUB -c 1 +#MSUB -T 5000 +#MSUB -r GPAWbench +# + +inputfile=../input.py +boundsfile=../bounds.sh +benchmark_size='large' +csv_summary="TGCC_${benchmark_size}_${SLURM_JOB_ID}_${SLURM_NTASKS}.csv" +module_20_1=GPAW-UEABS/20.1.0-Python38-FFTW-icc +module_20_10=GPAW-UEABS/20.10.0-Python39-FFTW-icc + +cd ${BRIDGE_MSUB_PWD} + +module purge +# UEABS/2.2 is not a system module but one of our own to basically do some settings +# needed for the project (and to adjust the MODULEPATH). It is not needed if +# $module_20_1 (generated by the build scripts) is in the MODULEPATH +module load UEABS/2.2 + +echo -e "\nWorking in: $(pwd)\n" +echo -e "Modules loaded:\n" +module list +echo -e "Slurm environment:\n$(env | grep SLURM_)\n" +echo -e "\nJob script:\n" +cat $0 +echo -e "\n\n" + +# +# Check the results +# +function print_header { + + echo '"Module", "python/gpaw/ase/numpy/scipy", "tasks", "time", "iterations", "dipole", "fermi", "energy", "check", "Job ID"' >$1 + +} + +function print_results { + + output="$1" + summary="$2" + module="$3" + bounds="$4" + + . $bounds + + python_version=$(python -V | awk '{print $2}') + gpaw_version=$(srun -n 1 -c 1 python -c "import gpaw ; print( gpaw.__version__ )") + ase_version=$(python -c "import ase ; print( ase.__version__ )") + numpy_version=$(python -c "import numpy ; print( numpy.__version__ )") + scipy_version=$(python -c "import scipy ; print( scipy.__version__ )") + + # Extract some data to report from the output file. + bmtime=$(grep "Total:" $output | sed -e 's/Total: *//' | cut -d " " -f 1) + iterations=$(grep "Converged after" $output | cut -d " " -f 3) + dipole=$(grep "Dipole" $output | cut -d " " -f 5 | sed -e 's/)//') + fermi=$(grep "Fermi level:" $output | cut -d ":" -f 2 | sed -e 's/ //g') + energy=$(grep "Extrapolated: " $output | cut -d ":" -f 2 | sed -e 's/ //g') + # Check the bounds + if (( $(bc -l <<< "(($iterations-0) >= $lower_iterations) && (($iterations-0) <= $upper_iterations)") == 1 )); then iterations_ok="OK"; else iterations_ok="not OK"; fi + if (( $(bc -l <<< "(($dipole-0) >= $lower_dipole) && (($dipole-0) <= $upper_dipole)") == 1 )); then dipole_ok="OK"; else dipole_ok="not OK"; fi + if (( $(bc -l <<< "(($fermi-0) >= $lower_fermi) && (($fermi-0) <= $upper_fermi)") == 1 )); then fermi_ok="OK"; else fermi_ok="not OK"; fi + if (( $(bc -l <<< "(($energy-0) >= $lower_energy) && (($energy-0) <= $upper_energy)") == 1 )); then energy_ok="OK"; else energy_ok="not OK"; fi + compare="" + compare+="(($iterations-0) >= $lower_iterations) && (($iterations-0) <= $upper_iterations) && " + compare+="(($dipole-0) >= $lower_dipole) && (($dipole-0) <= $upper_dipole) && " + compare+="(($fermi-0) >= $lower_fermi) && (($fermi-0) <= $upper_fermi) && " + compare+="(($energy-0) >= $lower_energy) && (($energy-0) <= $upper_energy)" + if (( $(bc -l <<< "$compare") )); + then + bounds_check="OK"; + else + bounds_check="not OK" + fi + # Output to the slurm.out file + echo -e "\nResult information:\n" \ + " * Time: $bmtime s\n" \ + " * Number of iterations: $iterations (lower: $lower_iterations, upper: $upper_iterations, $iterations_ok)\n" \ + " * Dipole (3rd component): $dipole (lower: $lower_dipole, upper: $upper_dipole, $dipole_ok)\n" \ + " * Fermi level: $fermi (lower: $lower_fermi, upper: $upper_fermi, $fermi_ok)\n" \ + " * Extrapolated energy: $energy (lower: $lower_energy, upper: $upper_energy, $energy_ok)\n" \ + " * Boundary check: $bounds_check" + # Output to the summary 
spreadsheet + echo "\"$module\", \"$python_version/$gpaw_version/$ase_version/$numpy_version/$scipy_version\"," \ + "\"$SLURM_NTASKS\", \"$bmtime\", \"$iterations\", \"$dipole\", \"$fermi\", \"$energy\", \"$bounds_check\", \"$SLURM_JOB_ID\"" >> $summary + +} + +# +# Running with GPAW 20.1.0 +# + +print_header $csv_summary + +module load $module_20_1 + +echo -e "\n\nStarting GPAW for $module_20_1\n" + +ccc_mprun gpaw python $inputfile + +echo -e "\n\nGPAW terminated\n" + +print_results output.txt $csv_summary $module_20_1 $boundsfile + +mv output.txt TGCC_${benchmark_size}_${SLURM_JOB_ID}_${SLURM_NTASKS}_20.1.0.txt + +# +# Running with GPAW 20.10.0 +# + +module load $module_20_10 + +echo -e "\n\n" + +echo -e "\n\nStarting GPAW for $module_20_10\n" + +ccc_mprun gpaw python $inputfile + +echo -e "\n\nGPAW terminated\n" + +print_results output.txt $csv_summary $module_20_10 $boundsfile + +mv output.txt TGCC_${benchmark_size}_${SLURM_JOB_ID}_${SLURM_NTASKS}_20.10.0.txt + + diff --git a/gpaw/scripts/job-davide.sh b/gpaw/scripts/job-davide.sh deleted file mode 100644 index 5d145983f9f78765fadc2e74076b732a053bd9b5..0000000000000000000000000000000000000000 --- a/gpaw/scripts/job-davide.sh +++ /dev/null @@ -1,17 +0,0 @@ -#!/bin/bash -#SBATCH -J 4x4 -#SBATCH -p dvd_usr_prod -#SBATCH --nodes=4 -#SBATCH --ntasks-per-node=4 -#SBATCH --cpus-per-task=4 -#SBATCH --time=00:30:00 -#SBATCH --account= -#SBATCH --gres=gpu:tesla:4 -#SBATCH --exclusive - -cd $SLURM_SUBMIT_DIR - -source $CINECA_SCRATCH/lib/gpaw-cuda/load.sh - -srun gpaw-python input.py - diff --git a/gpaw/scripts/job-juwels.sh b/gpaw/scripts/job-juwels.sh deleted file mode 100644 index ff42a63d7c3ea428f0400a745f6e5e91198ab713..0000000000000000000000000000000000000000 --- a/gpaw/scripts/job-juwels.sh +++ /dev/null @@ -1,12 +0,0 @@ -#!/bin/bash -x -#SBATCH -J 4x48 -#SBATCH --account= -#SBATCH --nodes=4 -#SBATCH --ntasks-per-node=48 -#SBATCH --time=00:30:00 -#SBATCH --partition=batch - -source $HOME/lib/gpaw-1.1.0/load.sh - -srun gpaw-python input.py - diff --git a/gpaw/scripts/job-marenostrum.sh b/gpaw/scripts/job-marenostrum.sh deleted file mode 100644 index 98303237358880fd9fa28bbbd16889b1136f63d8..0000000000000000000000000000000000000000 --- a/gpaw/scripts/job-marenostrum.sh +++ /dev/null @@ -1,11 +0,0 @@ -#!/bin/bash -x -#SBATCH -J 4x48 -#SBATCH --nodes=4 -#SBATCH --ntasks-per-node=48 -#SBATCH --time=00:30:00 -#SBATCH --qos=prace - -source $HOME/project/lib/gpaw-1.1.0/load.sh - -srun gpaw-python input.py - diff --git a/gromacs/README.md b/gromacs/README.md index 73db922e03712fe45cf859dc8c5c7ec426b6eb4d..3089f4ec97cdb8edf9e9897d83c694a9dfca4079 100644 --- a/gromacs/README.md +++ b/gromacs/README.md @@ -127,23 +127,24 @@ shoud be allocated on the same socket. **A) `GluCl Ion Channel`** -The ion channel system is the membrane protein GluCl, which is a pentameric chloride channel embedded in a lipid bilayer. The GluCl ion channel was embedded in a DOPC membrane and solvated in TIP3P water. This system contains 142k atoms, and is a quite challenging parallelisation case due to the small size. However, it is likely one of the most wanted target sizes for biomolecular simulations due to the importance of these proteins for pharmaceutical applications. It is particularly challenging due to a highly inhomogeneous and anisotropic environment in the membrane, which poses hard challenges for load balancing with domain decomposition. +The ion channel system is the membrane protein GluCl, which is a pentameric chloride channel embedded in a lipid bilayer. 
The GluCl ion channel was embedded in a DOPC membrane and solvated in TIP3P water. This system contains 142k atoms, and is a quite challenging parallelisation case due to the small size. However, it is likely one of the most wanted target sizes for biomolecular simulations due to the importance of these proteins for pharmaceutical applications. It is particularly challenging due to a highly inhomogeneous and anisotropic environment in the membrane, which poses hard challenges for load balancing with domain decomposition. This test case was used as the “Small” test case in previous PRACE-2IP-5IP projects. It is reported to scale efficiently up to 300 - 1000 cores on recent x86 based systems. Download test Case A [https://repository.prace-ri.eu/ueabs/GROMACS/2.2/GROMACS_TestCaseA.tar.xz](https://repository.prace-ri.eu/ueabs/GROMACS/2.2/GROMACS_TestCaseA.tar.xz) + +**B) `Lignocellulose`** A model of cellulose and lignocellulosic biomass in an aqueous solution [http://pubs.acs.org/doi/abs/10.1021/bm400442n](http://pubs.acs.org/doi/abs/10.1021/bm400442n). This system of 3.3 million atoms is inhomogeneous. -Reaction-field electrostatics are used instead of PME and therefore scales well. +Reaction-field electrostatics are used instead of PME, and the case therefore scales well. This test case was used as the “Large” test case in previous PRACE-2IP-5IP projects. It is reported in previous PRACE projects to scale efficiently on 10000+ recent x86 cores. Download test Case B [https://repository.prace-ri.eu/ueabs/GROMACS/2.2/GROMACS_TestCaseB.tar.xz](https://repository.prace-ri.eu/ueabs/GROMACS/2.2/GROMACS_TestCaseB.tar.xz) -**C) `STMV 8M` ** - -This is a `2 x 2 x 2` replica of the STMV (Satellite Tobacco Mosaic Virus). It is a converted to GROMACS case of -the corresponding NAMD benchmark. It contains 8.5 million atoms and uses PME for electrostatics. +**C) `STMV 28M`** + +This is a `3 x 3 x 3` replica of the STMV (Satellite Tobacco Mosaic Virus). It is the corresponding NAMD benchmark +converted to a GROMACS case. It contains 28 million atoms and uses PME for electrostatics. It is reported to scale efficiently on more than 10000 recent x86 cores. Download test Case C [https://repository.prace-ri.eu/ueabs/GROMACS/2.2/GROMACS_TestCaseC.tar.xz](https://repository.prace-ri.eu/ueabs/GROMACS/2.2/GROMACS_TestCaseC.tar.xz) diff --git a/nemo/README.md b/nemo/README.md index 499e68131f0e1dc01ae6ea7863c62ad768018751..9d9bb326a04631d56f1a7fa2cb1123a036c22949 100644 --- a/nemo/README.md +++ b/nemo/README.md @@ -24,7 +24,7 @@ The model is implemented in Fortran 90, with pre-processing (C-pre-processor). I ``` 2. There are available known architectures which can be seen with the following command: ``` - ./make_xios –-avail + ./make_xios --avail ``` If target architecture is a known one, it can be built by the following command: @@ -35,7 +35,7 @@ The model is implemented in Fortran 90, with pre-processing (C-pre-processor). I ``` ./make_xios --arch local ``` - Files for the PRACE Tier-0 systems are available under [architecture_files](architecture_files) folder. + Files for the PRACE Tier-0 systems are available under the [architecture_files](architecture_files) folder. These files should be used as a starting point; updates might be required after system upgrades, etc. Note that XIOS requires `Netcdf4`. 
Please load the appropriate `HDF5` and `NetCD ``` svn co https://forge.ipsl.jussieu.fr/nemo/svn/NEMO/releases/release-4.0 ``` -2. Copy and setup the appropriate architecture file in the arch folder. Files for the PRACE Tier-0 systems are available under [architecture_files](architecture_files) folder. The following changes are recommended for the GNU compilers: +2. Copy and set up the appropriate architecture file in the arch folder. Files for the PRACE Tier-0 systems are available under the [architecture_files](architecture_files) folder. These files should be used as a starting point; updates might be required after system upgrades, etc. The following changes are recommended for the GNU compilers: ``` a. add the `-lnetcdff` and `-lstdc++` flags to NetCDF flags b. using `mpif90` which is a MPI binding of `gfortran-4.9` diff --git a/nemo/architecture_files/NEMO/arch-M100.fcm b/nemo/architecture_files/NEMO/arch-M100.fcm index 063fb77b47a2d30540b5505c3b9388c2b90309d7..cef39235e5682cb4e290fa93e629edb1395a271b 100644 --- a/nemo/architecture_files/NEMO/arch-M100.fcm +++ b/nemo/architecture_files/NEMO/arch-M100.fcm @@ -31,12 +31,16 @@ # - fcm variables are starting with a % (and not a $) # -%NCDF_HOME /cineca/prod/opt/libraries/netcdf/4.7.3/gnu--8.4.0/ -%NCDF_HOME2 /cineca/prod/opt/libraries/netcdff/4.5.2/gnu--8.4.0/ -%HDF5_HOME /cineca/prod/opt/libraries/hdf5/1.12.0/gnu--8.4.0/ -%XIOS_HOME /m100_work/Ppp4x_5387/xios-2.5/ -%OASIS_HOME /not/defined +#%NCDF_HOME /cineca/prod/opt/libraries/netcdf/4.7.3--spectrum_mpi--10.4.0/hpc-sdk--2021--binary +#%NCDF_HOME2 /cineca/prod/opt/libraries/netcdff/4.5.2--spectrum_mpi--10.4.0/hpc-sdk--2021--binary +#%HDF5_HOME /cineca/prod/opt/libraries/hdf5/1.12.0--spectrum_mpi--10.3.1/pgi--19.10--binary + +%NCDF_HOME /cineca/prod/opt/libraries/netcdf/4.7.3/gnu--8.4.0 +%NCDF_HOME2 /cineca/prod/opt/libraries/netcdff/4.5.2/gnu--8.4.0 +%HDF5_HOME /cineca/prod/opt/libraries/hdf5/1.12.0/gnu--8.4.0 +%XIOS_HOME /m100/home/userexternal/mkarsavu/data/nemo_test/xios-2.5 +%OASIS_HOME /not/defined %HDF5_LIB -L%HDF5_HOME/lib -lhdf5_hl -lhdf5 %GCCLIB /cineca/prod/opt/compilers/gnu/8.4.0/none/lib64/ @@ -50,21 +54,22 @@ %OASIS_INC -I%OASIS_HOME/build/lib/mct -I%OASIS_HOME/build/lib/psmile.MPI1 %OASIS_LIB -L%OASIS_HOME/lib -lpsmile.MPI1 -lmct -lmpeu -lscrip -%CPP cpp -Dkey_nosignedzero -%FC mpif90 -%FCFLAGS -fdefault-real-8 -fno-second-underscore -O3 -funroll-all-loops -fcray-pointer -cpp -ffree-line-length-none -Dgfortran +%CPP cpp +%FC mpif90 +%FCFLAGS -fdefault-real-8 -O3 -funroll-all-loops -fcray-pointer -cpp -ffree-line-length-none -fno-second-underscore -Dgfortran + %FFLAGS %FCFLAGS %LD %FC %LDFLAGS %FPPFLAGS -P -C -traditional -x f77-cpp-input %AR ar -%ARFLAGS rs -%MK make -%USER_INC %XIOS_INC %OASIS_INC %NCDF_INC -%USER_LIB %XIOS_LIB %OASIS_LIB %NCDF_LIB - +%ARFLAGS -rs +%MK make +%USER_INC %XIOS_INC %OASIS_INC %NCDF_INC +%USER_LIB %XIOS_LIB %OASIS_LIB %NCDF_LIB %CC cc %CFLAGS -O0 + diff --git a/nemo/architecture_files/XIOS/arch-M100.env b/nemo/architecture_files/XIOS/arch-M100.env index 9cc14ed42bad480a3710af4ba7c557c31c48e611..29557b31f3743fb3f5c0e537a22cec870e33cd03 100644 --- a/nemo/architecture_files/XIOS/arch-M100.env +++ b/nemo/architecture_files/XIOS/arch-M100.env @@ -1,7 +1,13 @@ module load gnu module load szip module load zlib -module load spectrum_mpi -module load hdf5 -module load netcdf -module load netcdff +module load spectrum_mpi/10.3.1--binary +module load hdf5/1.12.0--spectrum_mpi--10.3.1--binary +module load netcdf/4.7.3--spectrum_mpi--10.3.1--binary 
+module load netcdff/4.5.2--spectrum_mpi--10.3.1--binary +export NETCDF_INC_DIR=/cineca/prod/opt/libraries/netcdf/4.7.3--spectrum_mpi--10.4.0/hpc-sdk--2021--binary/include/ +export NETCDFF_INC_DIR=/cineca/prod/opt/libraries/netcdff/4.5.2--spectrum_mpi--10.4.0/hpc-sdk--2021--binary/include/ +export NETCDF_LIB_DIR=/cineca/prod/opt/libraries/netcdf/4.7.3--spectrum_mpi--10.4.0/hpc-sdk--2021--binary/lib/ +export NETCDFF_LIB_DIR=/cineca/prod/opt/libraries/netcdff/4.5.2--spectrum_mpi--10.4.0/hpc-sdk--2021--binary/lib/ +export HDF5_INC_DIR=/cineca/prod/opt/libraries/hdf5/1.12.0--spectrum_mpi--10.3.1/pgi--19.10--binary/include/ +export HDF5_LIB_DIR=/cineca/prod/opt/libraries/hdf5/1.12.0--spectrum_mpi--10.3.1/pgi--19.10--binary/lib/ diff --git a/pfarm/PFARM_Build_Run_README.txt b/pfarm/PFARM_Build_Run_README.txt index 2dd9f876eb8f798a240c037bfec527ff84571aea..8b2863f3f9a933acb874ffbfa1fff98d788117b0 100644 --- a/pfarm/PFARM_Build_Run_README.txt +++ b/pfarm/PFARM_Build_Run_README.txt @@ -167,19 +167,30 @@ with the Wallclock time it takes to do each individual DSYEVD (eigensolver) call Performance is measured in Wallclock time and is displayed on the screen or output log at the end of the run. -For the atomic dataset, grep the output file for 'Sector 16:' -The output should match the values below. +** Validation of Results + +For the atomic dataset runs, from the results directory issue the command + +awk '/Sector 16/ && /eigenvalues/' <output file> + +replacing <output file> with the stdout file produced by the run. - Mesh 1, Sector 16: first five eigenvalues = -4329.72 -4170.91 -4157.31 -4100.98 -4082.11 - Mesh 1, Sector 16: final five eigenvalues = 4100.98 4157.31 4170.91 4329.72 4370.54 - Mesh 2, Sector 16: first five eigenvalues = -313.631 -301.010 -298.882 -293.393 -290.619 - Mesh 2, Sector 16: final five eigenvalues = 290.619 293.393 298.882 301.010 313.631 - -For the molecular dataset, `grep` the output file for `'Sector 64:'` The output should match the values below. - Mesh 1, Sector 64: first five eigenvalues = -3850.84 -3593.98 -3483.83 -3466.73 -3465.72 - Mesh 1, Sector 64: final five eigenvalues = 3465.72 3466.73 3483.83 3593.99 3850.84 + Mesh 1, Sector 16: first five eigenvalues = -4329.7161 -4170.9100 -4157.3112 -4100.9751 -4082.1108 + Mesh 1, Sector 16: final five eigenvalues = 4100.9751 4157.3114 4170.9125 4329.7178 4370.5405 + Mesh 2, Sector 16: first five eigenvalues = -313.6307 -301.0096 -298.8824 -293.3929 -290.6190 + Mesh 2, Sector 16: final five eigenvalues = 290.6190 293.3929 298.8824 301.0102 313.6307 + +For the molecular dataset runs, from the results directory issue the command + +awk '/Sector 64/ && /eigenvalues/' <output file> + +replacing <output file> with the stdout file produced by the run. + +The output should match the values below. 
+Mesh 1, Sector 64: first five eigenvalues = -3850.8443 -3593.9843 -3483.8338 -3466.7307 -3465.7194 + Mesh 1, Sector 64: final five eigenvalues = 3465.7194 3466.7307 3483.8338 3593.9855 3850.8443 ---------------------------------------------------------------------------- diff --git a/pfarm/cpu/bin/.gitkeep b/pfarm/cpu/bin/.gitkeep new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/pfarm/cpu/src_mpi_omp_atom/amp.f90 b/pfarm/cpu/src_mpi_omp_atom/amp.f90 index 61eba43760e250df6884a3afd92b1ee4a9f6342b..eb428a308b6a1a82adca9df75ca1ea76ba8cdba9 100644 --- a/pfarm/cpu/src_mpi_omp_atom/amp.f90 +++ b/pfarm/cpu/src_mpi_omp_atom/amp.f90 @@ -15,7 +15,7 @@ module rw_amplitudes contains - subroutine start_sa_file (sqno, st, sect) + subroutine start_sa_file (sqno, st, sect, imesh) ! open XDR file to hold sector reduced width amplitudes use xdr_files, only: open_xdr, xdr_io use energy_grid, only: fine_mesh @@ -24,6 +24,7 @@ contains type(qno), intent(in) :: sqno ! scattering q#s integer, intent(in) :: st ! second target spin integer, intent(in) :: sect ! sector id + integer, intent(in) :: imesh ! mesh id character(len=11) :: hname character(len=2) :: sp, l character(len=1) :: p @@ -74,12 +75,13 @@ contains write (ixdro) nh, nsect end if if (xdr) then - write (fo,'(a)') 'XDR Representation Amplitude file: ' // & - &TRIM(hname) + write (fo,'(/,a,i0,a,i0,a,i0,a)') ' Mesh ', imesh, ', Sector ', sect, ': Opening & + &XDR Representation Amplitude file: ' // TRIM(hname) else - write (fo,'(a)') 'Native Representation Amplitude file: ' // & - &TRIM(hname) + write (fo,'(/,a,i0,a,i0,a,i0,a)') ' Mesh ', imesh, ', Sector ', sect, ': Opening & + &Native Representation Amplitude file: ' // TRIM(hname) end if + call flush(fo) end subroutine start_sa_file subroutine start_sa_file_id (sqno, st) diff --git a/pfarm/cpu/src_mpi_omp_atom/dist_mat.f90 b/pfarm/cpu/src_mpi_omp_atom/dist_mat.f90 index b2b2c27cd7383f5efdf4dcb60d745c81d0549f5e..cbe50073f5061fd86bf929140d3f07e1164e9305 100644 --- a/pfarm/cpu/src_mpi_omp_atom/dist_mat.f90 +++ b/pfarm/cpu/src_mpi_omp_atom/dist_mat.f90 @@ -2,7 +2,7 @@ module dist_mat ! Form diestributed sector Hamiltonian matrix and diagonalize ! Time-stamp: "2003-09-23 17:08:23 cjn" use io_units, only: fo - use precisn, only: wp + use precisn, only: wp, ip_long ! use blacs, only: ctxt, myrow, mycol, p, q, nblock, mynumrows,& ! mynumcols, p_error, dlen_, nb_, csrc_ use error_out, only: error_check @@ -75,8 +75,9 @@ contains end subroutine A_fill - subroutine sh_diag + subroutine sh_diag(imesh, sect) ! diagonalize the matrix a + integer, intent(in) :: imesh, sect real(wp), allocatable :: work(:) integer, allocatable :: iwork(:) integer :: lwork, liwork @@ -89,7 +90,7 @@ contains integer :: indxg2p, lwork1, ip, iq, imyrow, imycol integer :: rows, cols real(wp) :: t0, t1 - integer :: c0, c1, cr + integer(ip_long) :: c0, c1, cr real(wp), external :: dnrm2 ! initialize eigenvector distributed array, z. 
@@ -108,24 +109,32 @@ contains allocate (work(lwork), iwork(liwork), evals(ny), stat=status) call error_check (status, 'sh_diag: allocation error') - write (fo,'(/,a,/)') 'start of DSYEVD' + write (fo,'(/,a,i0,a,i0,a)') ' Mesh ',imesh,', Sector ',sect,': Starting DSYEVD ' + call flush(fo) + call cpu_time (t0) call system_clock (count=c0) call dsyevd (jobz, uplo, ny, a, ny, evals, work, lwork, iwork, liwork, info) - write (fo,'(/,a,/)') 'end of DSYEVD' + write (fo,'(/,a,i0,a,i0,a)') ' Mesh ',imesh,', Sector ',sect,': Ended DSYEVD ' call cpu_time (t1) - write (fo,'(a,f16.4,a)') 'DSYEVD CPU time = ', t1 - t0, ' secs' call system_clock (count=c1, count_rate=cr) - write (fo,'(a,f16.4,a)') 'DSYEVD Elapsed time = ', REAL(c1-c0,wp) / & - REAL(cr,wp), ' secs' + + write (fo,'(/,a,i0,a,i0,a,f14.3,a)') ' Mesh ',imesh,', Sector ',sect,& + ': DSYEVD CPU time = ', t1 - t0, ' secs' + write (fo,'(a,i0,a,i0,a,f14.3,a,/)') ' Mesh ',imesh,', Sector ',sect,& + ': DSYEVD Elapsed time = ', REAL(c1-c0,wp) / REAL(cr,wp), ' secs' call error_check (info, 'sh_diag: pdsyevd error') deallocate (work, iwork, stat=status) call error_check (status, 'sh_diag: deallocation error') + write (fo,'(/,a,i0,a,i0,a,5f14.4)') ' Mesh ',imesh,', Sector ',sect,': first five eigenvalues = ', evals(1:5) + write (fo,'(a,i0,a,i0,a,5f14.4,/)') ' Mesh ',imesh,', Sector ',sect,': final five eigenvalues = ', evals(ny-4:ny) + call flush(fo) + z = a evecs: do ii = 1, ny ! normalize eigenvectors ! indxg2p computes process coord which posseses entry of a diff --git a/pfarm/cpu/src_mpi_omp_atom/pdg_ctl.f90 b/pfarm/cpu/src_mpi_omp_atom/pdg_ctl.f90 index 273e67bf8c5a280590e321c9363baf431bc03b26..8852d3151e21c036653fe928be567eebadac87ac 100644 --- a/pfarm/cpu/src_mpi_omp_atom/pdg_ctl.f90 +++ b/pfarm/cpu/src_mpi_omp_atom/pdg_ctl.f90 @@ -61,7 +61,7 @@ contains nh = nc * nl if ((bug9 > 0).and.io_processor) then write (fo,'(a,i4)') 'Total Number of Sectors = ', nsect - write (fo,'(a,i6)') 'Partition Hamiltonian dimension = ', nh + write (fo,'(a,i4)') 'Partition Hamiltonian dimension = ', nh end if call flush(fo) @@ -93,23 +93,23 @@ contains if(numtasks.gt.1) then if (MOD(sect,numtasks)==MOD(mesh_taskid+1,numtasks)) then if(imesh==1) then - write(fo,'(a,2i4,a,i4)')'FINE Region MPI Task, Mesh Task ', taskid, mesh_taskid, ' calculating sector ', sect + write(fo,'(a,i0,a,i0,a,i0)')' FINE Region MPI Task ', taskid, ', Mesh Task ', mesh_taskid, ' calculating sector ', sect else - write(fo,'(a,2i4,a,i4)')'COARSE Region MPI Task, Mesh Task ', taskid, mesh_taskid, ' calculating sector ', sect + write(fo,'(a,i0,a,i0,a,i0)')' COARSE Region MPI Task ', taskid, ', Mesh Task ', mesh_taskid, ' calculating sector ', sect end if call flush(fo) else cycle end if end if - call start_sa_file (sqno, spins(1), sect) + call start_sa_file (sqno, spins(1), sect, imesh) ral = asect(sect) rar = asect(sect+1) r = 0.5_wp * ((rar - ral) * xi + rar + ral) call def_rb_vals (ral, rar) call potl (nc, nx, r) call A_fill (nh) ! form sector H - call sh_diag ! diagonalize sector H + call sh_diag (imesh, sect) ! 
diagonalize sector H call ampltd (ral, rar, nc, nl, nh) call kill_az call close_sa_file() diff --git a/pfarm/cpu/src_mpi_omp_atom/precisn.f90 b/pfarm/cpu/src_mpi_omp_atom/precisn.f90 index fbf7c279b0841750195486d0962e0c651dd331ea..8cba7004f6bb0cbb0c1e5b77dc4b540dca8e6b19 100644 --- a/pfarm/cpu/src_mpi_omp_atom/precisn.f90 +++ b/pfarm/cpu/src_mpi_omp_atom/precisn.f90 @@ -9,6 +9,7 @@ module precisn integer, parameter :: sp=selected_real_kind(6) integer, parameter :: wp=selected_real_kind(12) integer, parameter :: ep1=selected_real_kind(19) + integer, parameter :: ip_long = selected_int_kind(18) integer, parameter :: ep = MAX(ep1, wp) ! real(wp), parameter :: acc8 = epsilon(1.0_wp) ! real(wp), parameter :: acc16 = epsilon(1.0_ep) diff --git a/pfarm/cpu/src_mpi_omp_atom/rmx1.f90 b/pfarm/cpu/src_mpi_omp_atom/rmx1.f90 index 7c0f9e16c5304a8268a843cb1dc58be50f235131..6ada883a2a1abe18869ad57ce9f4ccf8d87b70c6 100644 --- a/pfarm/cpu/src_mpi_omp_atom/rmx1.f90 +++ b/pfarm/cpu/src_mpi_omp_atom/rmx1.f90 @@ -2,7 +2,7 @@ program rmx1 ! Electron-atom R-matrix external region calculation - STAGE 1 ! Time-stamp: "2003-09-23 17:06:43 cjn" ! modified for jpi_coupling april to dec 2006 vmb - use precisn, only: wp + use precisn, only: wp, ip_long use io_units, only: fo use slp, only: qno, def_qno use rmx1_in, only: targ_low_sce, targ_high_sce, rd_nmlst, split_prop,& @@ -34,7 +34,7 @@ program rmx1 integer :: nmesh, imesh integer :: imycol, imyrow, ip, iq integer :: BLACS_PNUM - integer :: c0, c1, cr + integer(ip_long) :: c0, c1, cr integer, allocatable :: map(:,:) integer ( kind = 4 ) :: errcode, ierr @@ -52,6 +52,7 @@ program rmx1 write (fo,'(15x,a,//)') '=============' write (fo,'(a)') 'R-Matrix External Region: Sector Diagonalization' write (fo,'(a,i6)') 'Number of MPI Tasks = ',numtasks + call flush(fo) end if call rd_nmlst ! read input namelist @@ -96,12 +97,17 @@ program rmx1 if (split_prop .and. st2 /= -999) then ! there is channel splitting call reorder_chls nc2 = nc - nc1 - if ((bug1 > 0).and.io_processor) then + if (io_processor) then write (fo,'(/a/)') 'Channel Splitting Used' write (fo,'(a,i6)') 'Number of spin 1 Channels = ', nc1 write (fo,'(a,i6/)') 'Number of Spin 2 channels = ', nc2 end if else ! no channel_splitting + if (io_processor) then + write (fo,'(/a/)') 'No Channel Splitting Specified' + write (fo,'(a,i6)') 'Number of Channels = ', nc + end if + nc1 = nc split_prop = .false. end if @@ -134,6 +140,7 @@ program rmx1 call diag (nc1, ncp, (/st1,st2/), sqno, imesh) if (nc1 /= nc) then ! initialize 2nd split partition data + if(io_processor) write(fo,'(a,2i0)') ' Initialize 2nd split partition data ' nc2 = nc - nc1 ncp = nc1 + 1 call diag (nc2, ncp, (/st2, -999/), sqno, imesh) @@ -154,4 +161,5 @@ program rmx1 REAL(cr,wp), ' secs' end if call MPI_FINALIZE(ierr) + end program rmx1 diff --git a/pfarm/cpu/src_mpi_omp_mol/amp.f90 b/pfarm/cpu/src_mpi_omp_mol/amp.f90 index 1e109e14ba86d6761c17df17cf84c2da012d130b..61b27d79bd8af05c496d5a264dcfff327ede9ffc 100644 --- a/pfarm/cpu/src_mpi_omp_mol/amp.f90 +++ b/pfarm/cpu/src_mpi_omp_mol/amp.f90 @@ -125,9 +125,6 @@ contains write (s_mpi, '(i2.2)') taskid hname = stem // sp // l // p // isect // s_mpi - write (fo,'(a,5i4)') ' INT sp,l,p,sect,taskid = ', qn(2),abs(qn(1)),qn(3),sect,taskid - write (fo,'(a,5a4)') ' CHAR sp,l,p,sect,taskid = ', sp,l,p,isect,s_mpi - if (xdr_amp_out) then ! 
open XDR output file for Hamiltonian ixdro = open_xdr (file=TRIM(hname), action='write') call xdr_io (ixdro, nh) diff --git a/pfarm/cpu/src_mpi_omp_mol/dist_mat.f90 b/pfarm/cpu/src_mpi_omp_mol/dist_mat.f90 index a33956c7c07b6f90eeab849a9936a609a95e9be1..8e4d7bc5854c7914c0f21e15bf70028f4abdeb19 100644 --- a/pfarm/cpu/src_mpi_omp_mol/dist_mat.f90 +++ b/pfarm/cpu/src_mpi_omp_mol/dist_mat.f90 @@ -2,7 +2,7 @@ module dist_mat ! Form diestributed sector Hamiltonian matrix and diagonalize ! Time-stamp: "2003-09-23 17:08:23 cjn" use io_units, only: fo - use precisn, only: wp + use precisn, only: wp, ip_long ! use blacs, only: ctxt, myrow, mycol, p, q, nblock, mynumrows,& ! mynumcols, p_error, dlen_, nb_, csrc_ use error_out, only: error_check @@ -75,8 +75,9 @@ contains end subroutine A_fill - subroutine sh_diag -! diagonalize the matrix a + subroutine sh_diag(imesh, sect) + ! diagonalize the matrix a + integer, intent(in) :: imesh, sect real(wp), allocatable :: work(:) integer, allocatable :: iwork(:) integer :: lwork, liwork @@ -89,7 +90,7 @@ contains integer :: indxg2p, lwork1, ip, iq, imyrow, imycol integer :: rows, cols real(wp) :: t0, t1 - integer :: c0, c1, cr + integer(ip_long) :: c0, c1, cr real(wp), external :: dnrm2 ! initialize eigenvector distributed array, z. @@ -108,28 +109,36 @@ contains allocate (work(lwork), iwork(liwork), evals(ny), stat=status) call error_check (status, 'sh_diag: allocation error') - write (fo,'(/,a,/)') 'start of DSYEVD' + write (fo,'(/,a,i0,a,i0,a)') ' Mesh ',imesh,', Sector ',sect,': Starting DSYEVD ' + call flush(fo) + call cpu_time (t0) call system_clock (count=c0) call dsyevd (jobz, uplo, ny, a, ny, evals, work, lwork, iwork, liwork, info) - write (fo,'(/,a,/)') 'end of DSYEVD' + write (fo,'(/,a,i0,a,i0,a)') ' Mesh ',imesh,', Sector ',sect,': Ended DSYEVD ' + call cpu_time (t1) - write (fo,'(a,f16.4,a)') 'DSYEVD CPU time = ', t1 - t0, ' secs' call system_clock (count=c1, count_rate=cr) - write (fo,'(a,f16.4,a)') 'DSYEVD Elapsed time = ', REAL(c1-c0,wp) / & - REAL(cr,wp), ' secs' + + write (fo,'(/,a,i0,a,i0,a,f14.3,a)') ' Mesh ',imesh,', Sector ',sect,& + ': DSYEVD CPU time = ', t1 - t0, ' secs' + write (fo,'(a,i0,a,i0,a,f14.3,a,/)') ' Mesh ',imesh,', Sector ',sect,& + ': DSYEVD Elapsed time = ', REAL(c1-c0,wp) / REAL(cr,wp), ' secs' call error_check (info, 'sh_diag: pdsyevd error') deallocate (work, iwork, stat=status) call error_check (status, 'sh_diag: deallocation error') - write (fo,'(/,a,4g12.4,/)') ' *** debug *** evals(1:4) = ', evals(1:4) - write (fo,'(/,a,4g12.4,/)') ' *** debug *** evals(n-3:n) = ', evals(ny-3:ny) + write (fo,'(/,a,i0,a,i0,a,5f14.4)') ' Mesh ',imesh,', Sector ',sect,': first five eigenvalues = ', evals(1:5) + write (fo,'(a,i0,a,i0,a,5f14.4,/)') ' Mesh ',imesh,', Sector ',sect,': final five eigenvalues = ', evals(ny-4:ny) + + call flush(fo) z = a + evecs: do ii = 1, ny ! normalize eigenvectors ! indxg2p computes process coord which posseses entry of a ! distributed matrix specified by a global index INDXGLOB. diff --git a/pfarm/cpu/src_mpi_omp_mol/pdg_ctl.f90 b/pfarm/cpu/src_mpi_omp_mol/pdg_ctl.f90 index c62638d434525d4a0e7d43de893da5e1a6258e74..edca65e8a8c2b39013cecfd461e56cc3d7bd39d4 100644 --- a/pfarm/cpu/src_mpi_omp_mol/pdg_ctl.f90 +++ b/pfarm/cpu/src_mpi_omp_mol/pdg_ctl.f90 @@ -52,22 +52,22 @@ contains call def_ncq (ncp) ! pass ethr offset in h_el ! generate asymptotic potential coefficients - if (inc_lrp_prop) then - if (.not. packed_cf) then - if (.not. 
jpi_coupling) then - call cfs (nc, tchl(ncp:), lchl(ncp:), sqno, spins) - else - call cfsj (nc, tchl(ncp:), lchl(ncp:), kschl(ncp:), sqno, spins) - end if - end if - end if + if (inc_lrp_prop) then + if (.not. packed_cf) then + if (.not. jpi_coupling) then + call cfs (nc, tchl(ncp:), lchl(ncp:), sqno, spins) + else + call cfsj (nc, tchl(ncp:), lchl(ncp:), kschl(ncp:), sqno, spins) + end if + end if + end if allocate (r(nx), stat=status) call error_check (status, 'pdiag: allocation r') nh = nc * nl - write (fo,'(a,i4)') 'Total Number of Sectors = ', nsect - write (fo,'(a,i6)') 'Partition Hamiltonian dimension = ', nh + write (fo,'(a,i0)') 'Total Number of Sectors = ', nsect + write (fo,'(a,i0)') 'Partition Hamiltonian dimension = ', nh ! Allow tasks to cycle through both meshes ensuring better load-balancing ! Ideally this would be achieved with a shared counter @@ -102,7 +102,7 @@ contains call def_rb_vals (ral, rar) call potl (nc, nx, r) call A_fill (nh) ! form distributed sector H - call sh_diag ! diagonalize sector H + call sh_diag (imesh, sect) ! diagonalize sector H call ampltd (ral, rar, nc, nl, nh) call kill_az call close_sa_file() @@ -111,7 +111,7 @@ contains call h_reset call reset_sec call reset_potl - if (inc_lrp_prop) call del_cf + if ((inc_lrp_prop).and.(.not. packed_cf)) call del_cf deallocate (r, stat=status) call error_check (status, 'pdiag: deallocation') diff --git a/pfarm/cpu/src_mpi_omp_mol/precisn.f90 b/pfarm/cpu/src_mpi_omp_mol/precisn.f90 index fbf7c279b0841750195486d0962e0c651dd331ea..8cba7004f6bb0cbb0c1e5b77dc4b540dca8e6b19 100644 --- a/pfarm/cpu/src_mpi_omp_mol/precisn.f90 +++ b/pfarm/cpu/src_mpi_omp_mol/precisn.f90 @@ -9,6 +9,7 @@ module precisn integer, parameter :: sp=selected_real_kind(6) integer, parameter :: wp=selected_real_kind(12) integer, parameter :: ep1=selected_real_kind(19) + integer, parameter :: ip_long = selected_int_kind(18) integer, parameter :: ep = MAX(ep1, wp) ! real(wp), parameter :: acc8 = epsilon(1.0_wp) ! real(wp), parameter :: acc16 = epsilon(1.0_ep) diff --git a/pfarm/cpu/src_mpi_omp_mol/rd_Hfile.f90 b/pfarm/cpu/src_mpi_omp_mol/rd_Hfile.f90 index f4dc437047a915423eb02add2a56b122fb71b647..d8d8389915f7d0b41e70f7dd121ca07adc3dfef8 100644 --- a/pfarm/cpu/src_mpi_omp_mol/rd_Hfile.f90 +++ b/pfarm/cpu/src_mpi_omp_mol/rd_Hfile.f90 @@ -75,7 +75,7 @@ contains subroutine readh1 ! Read part of asymptotic file that is independent of lrgl, spin, parity use xdr_files, only: open_xdr, xdr_io - use rmx1_in, only: filh, xdr_H_in, bug2, farm_format, molecule_format + use rmx1_in, only: filh, xdr_H_in, bug1, bug2, farm_format, molecule_format use scaling, only: set_charge, scale_radius, scale_etarg integer :: ihbuf(5) real(wp) :: rhbuf(2), etgr @@ -188,13 +188,17 @@ contains call set_charge (nz, nelc) call scale_radius (rmatr) call scale_etarg (etarg) ! etarg now in scaled Ryd - write (fo,'(/,a,/)') 'Target states' - write (fo,'(10x,a,5x,a,3x,a,8x,a,/,43x,a)') 'index', 'total l', & + if ((bug1 > 0).and.io_processor) then + write (fo,'(/,a,/)') 'Target states' + write (fo,'(10x,a,5x,a,3x,a,8x,a,/,43x,a)') 'index', 'total l', & '(2*s+1)', 'energy', 'scaled ryd' + end if e0 = etarg(1) do i = 1, ntarg etgr = (etarg(i) - e0) ! 
convert to Rydbergs - write (fo,'(3x,3i10,3x,f12.6)') i, ltarg(i), starg(i), etgr + if ((bug1 > 0).and.io_processor) then + write (fo,'(3x,3i10,3x,f12.6)') i, ltarg(i), starg(i), etgr + end if end do end subroutine readh1 diff --git a/pfarm/cpu/src_mpi_omp_mol/rmx1.f90 b/pfarm/cpu/src_mpi_omp_mol/rmx1.f90 index 35fb757ff3a640475805f4cb5a680b779b5c832c..58b22bd233f67c9d347f7d63e99809fa238e5028 100644 --- a/pfarm/cpu/src_mpi_omp_mol/rmx1.f90 +++ b/pfarm/cpu/src_mpi_omp_mol/rmx1.f90 @@ -2,7 +2,7 @@ program rmx1 ! Electron-atom R-matrix external region calculation - STAGE 1 ! Time-stamp: "2003-09-23 17:06:43 cjn" ! modified for jpi_coupling april to dec 2006 vmb - use precisn, only: wp + use precisn, only: wp, ip_long use io_units, only: fo use slp, only: qno, def_qno use rmx1_in, only: targ_low_sce, targ_high_sce, rd_nmlst, split_prop,& @@ -33,7 +33,7 @@ program rmx1 integer :: nmesh, imesh integer :: imycol, imyrow, ip, iq integer :: BLACS_PNUM - integer :: c0, c1, cr + integer(ip_long) :: c0, c1, cr integer, allocatable :: map(:,:) integer ( kind = 4 ) :: errcode, ierr diff --git a/pfarm/gpu/bin/.gitkeep b/pfarm/gpu/bin/.gitkeep new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/pfarm/gpu/src_mpi_gpu_atom/dist_mat.f90 b/pfarm/gpu/src_mpi_gpu_atom/dist_mat.f90 index 74a8a2f3605fbf8b28c66146394162e294f69cf6..5d62a9aa51e0a3006de5521a1de89fc5789673b1 100644 --- a/pfarm/gpu/src_mpi_gpu_atom/dist_mat.f90 +++ b/pfarm/gpu/src_mpi_gpu_atom/dist_mat.f90 @@ -174,8 +174,8 @@ contains deallocate (work, iwork, stat=status) call error_check (status, 'sh_diag: deallocation error') - write (fo,'(/,a,i0,a,i0,a,5g14.6)') ' Mesh ',imesh,', Sector ',sect,': first five eigenvalues = ', evals(1:5) - write (fo,'(a,i0,a,i0,a,5g14.6,/)') ' Mesh ',imesh,', Sector ',sect,': final five eigenvalues = ', evals(ny-4:ny) + write (fo,'(/,a,i0,a,i0,a,5f14.4)') ' Mesh ',imesh,', Sector ',sect,': first five eigenvalues = ', evals(1:5) + write (fo,'(a,i0,a,i0,a,5f14.4,/)') ' Mesh ',imesh,', Sector ',sect,': final five eigenvalues = ', evals(ny-4:ny) z = a evecs: do ii = 1, ny ! normalize eigenvectors diff --git a/pfarm/gpu/src_mpi_gpu_mol/dist_mat.f90 b/pfarm/gpu/src_mpi_gpu_mol/dist_mat.f90 index 74a8a2f3605fbf8b28c66146394162e294f69cf6..5d62a9aa51e0a3006de5521a1de89fc5789673b1 100644 --- a/pfarm/gpu/src_mpi_gpu_mol/dist_mat.f90 +++ b/pfarm/gpu/src_mpi_gpu_mol/dist_mat.f90 @@ -174,8 +174,8 @@ contains deallocate (work, iwork, stat=status) call error_check (status, 'sh_diag: deallocation error') - write (fo,'(/,a,i0,a,i0,a,5g14.6)') ' Mesh ',imesh,', Sector ',sect,': first five eigenvalues = ', evals(1:5) - write (fo,'(a,i0,a,i0,a,5g14.6,/)') ' Mesh ',imesh,', Sector ',sect,': final five eigenvalues = ', evals(ny-4:ny) + write (fo,'(/,a,i0,a,i0,a,5f14.4)') ' Mesh ',imesh,', Sector ',sect,': first five eigenvalues = ', evals(1:5) + write (fo,'(a,i0,a,i0,a,5f14.4,/)') ' Mesh ',imesh,', Sector ',sect,': final five eigenvalues = ', evals(ny-4:ny) z = a evecs: do ii = 1, ny ! 
normalize eigenvectors
diff --git a/qcd/part_1/src/random.c b/qcd/part_1/src/random.c
index 734119e15c321e57c6dc9d29c892460aec7f143c..7fece63ac06dd5d9c72854fc9ffe4e138e94871b 100644
--- a/qcd/part_1/src/random.c
+++ b/qcd/part_1/src/random.c
@@ -4,6 +4,10 @@
 /* 64 bit MT PRNG from mt19937-64.c */
 #define NN 312
 
+#ifndef M_PI
+  #define M_PI 3.14159265358979323846
+#endif
+
 extern int mti;
 extern unsigned long long mt[NN];
 void init_genrand64(unsigned long long seed);
diff --git a/qcd/part_2/README.md b/qcd/part_2/README.md
index aa7c0842f7223f383136068a566760993b2ae31b..4e92de1288bdb5b0fbb6b43c8974b1275d4ce8cf 100644
--- a/qcd/part_2/README.md
+++ b/qcd/part_2/README.md
@@ -1,7 +1,7 @@
 # README - QCD UEABS Part 2
-**2017 - Jacob Finkenrath - CaSToRC - The Cyprus Institute (j.finkenrath@cyi.ac.cy)**
+**2021 - Jacob Finkenrath - CaSToRC - The Cyprus Institute (j.finkenrath@cyi.ac.cy)**
 
-Part 2 of the QCD kernels of the Unified European Applications Benchmark Suite (UEABS) http://www.prace-ri.eu/ueabs/ is developed in PRACE 4 IP under the task for developing an accelerator benchmark suite and is part of the UEABS kernel since PRACE 5IP under UEABS QCD part 2. Part 2 consists of two kernels, based on QUDA
+Part 2 of the QCD kernels of the Unified European Applications Benchmark Suite (UEABS) http://www.prace-ri.eu/ueabs/ was developed under the accelerator benchmark suite task within the 4th implementation phase of PRACE and has been part of the UEABS since PRACE-5IP as UEABS QCD part 2. Part 2 consists of two kernels, based on QUDA
 
 [^]: R. Babbich, M. Clark and B. Joo, “Parallelizing the QUDA Library for Multi-GPU Calculations
 
@@ -9,7 +9,7 @@
 and on the QPhiX library
 
 [^]: B. Joo, D. D. Kalamkar, K. Vaidyanathan, M. Smelyanskiy, K. Pamnany, V. W. Lee, P. Dubey,
 
-. The library QUDA is based on CUDA and optimize for running on NVIDIA GPUs (https://lattice.github.io/quda/). The QPhiX library consists of routines which are optimize to use Intel intrinsic functions of multiple vector length including AVX512, including optimized routines for KNC and KNL (http://jeffersonlab.github.io/qphix/). The benchmark kernel consists of the provided Conjugated Gradient benchmark functions of the libraries.
+. The library QUDA is based on CUDA and optimized for running on NVIDIA GPUs (https://lattice.github.io/quda/). Currently a HIP version and a generic version are under development, which can be used for AMD GPUs and, once ready, for CPU architectures such as ARM. The generic QUDA kernel might replace the computational kernel of QPhiX in the future. The QPhiX library consists of routines developed for the Intel Xeon Phi architecture that also perform well on x86 CPUs such as Intel Xeon and AMD EPYC. QPhiX is optimized to use Intel intrinsic functions of multiple vector lengths, including AVX512 (http://jeffersonlab.github.io/qphix/). In general, the benchmark kernels apply the Conjugate Gradient solver to the Wilson Dirac operator, a 4-dimensional stencil.
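+
+For orientation, the Wilson Dirac operator that the solver inverts couples each lattice site to its nearest neighbours in the four directions. In the textbook normalisation (added here purely for illustration; QUDA and QPhiX may use different conventions) it reads
+
+$$ D_W(x,y) = (m_0 + 4r)\,\delta_{x,y} - \frac{1}{2}\sum_{\mu=1}^{4}\left[ (r-\gamma_\mu)\,U_\mu(x)\,\delta_{x+\hat{\mu},y} + (r+\gamma_\mu)\,U_\mu^\dagger(x-\hat{\mu})\,\delta_{x-\hat{\mu},y} \right] $$
+
+which is the 4-dimensional stencil referred to above; the Conjugate Gradient solver is applied to the (symmetrised) linear systems built from this operator.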
 
 ## Table of Contents
 
@@ -21,45 +21,29 @@
 
 #### 1.1 Compile
 
-Download Cmake and Quda
+Clone `quda` via
 
-General information how to build QUDA with cmake can be found under:
-"https://github.com/lattice/quda/wiki/Building-QUDA-with-cmake"
-Here we just give a short overview:
-
-Build Cmake: (./QCD_Accelerator_Benchmarksuite_Part2/GPUs/src/cmake-3.7.0.tar.gz)
-
-Cmake can be downloaded from the source with the URL: https://cmake.org/download/
-In this guide the version cmake-3.7.0 is used. The build instruction can be found
-in the main directory under README.rst . Use the configure file `./configure` .
-Then run
-
-gmake`.
-
-Build Quda: (./QCD_Accelerator_Benchmarksuite_Part2/GPUs/src/quda.tar.gz)
-
-Download quda for example by using `git clone https://github.com/lattice/quda.git`.
-Create a build-folder. Execute the executable `cmake` in the build-folder which
-is located in the cmake/bin.
-Execute:
-
-``` shell
-./$PATH2CMAKE/cmake $PATH2QUDA -DQUDA_GPU_ARCH=sm_XX -DQUDA_DIRAC_WILSON=ON -DQUDA_DIRAC_TWISTED_MASS=OFF
--DQUDA_DIRACR_DOMAIN_WALL=OFF -DQUDA_HISQ_LINK=OFF -DQUDA_GAUGE_FORCE=OFF -DQUDA_HISQ_FORCE=OFF -DQUDA_MPI=ON
+```shell
+git clone https://github.com/lattice/quda.git
 ```
-with
+and build `quda` via
 
-``` shell
- PATH2CMAKE= path to the cmake-executable
- PAT2QUDA= path to the home dir of QUDA
+```
+cd quda
+mkdir build
+cd build
+cmake -DQUDA_DIRAC_WILSON=ON -DQUDA_DIRAC_TWISTED_MASS=OFF -DQUDA_DIRAC_DOMAIN_WALL=OFF -DQUDA_HISQ_LINK=OFF -DQUDA_GAUGE_FORCE=OFF -DQUDA_HISQ_FORCE=OFF -DQUDA_MPI=ON -DQUDA_GPU_ARCH=sm_XX ..
 ```
 
-Set `-DQUDA_GPU_ARCH=sm_XX` to the GPU Architecture (`sm_60` for Pascals, `sm_35` for Keplers)
+Set `-DQUDA_GPU_ARCH=sm_XX` to the GPU architecture (`sm_80` for Nvidia A100, `sm_70` for Nvidia V100, `sm_60` for Pascal, `sm_35` for Kepler, etc.)
 
-If Cmake or the compilation fails library paths and options can be set by the cmake provided function "ccmake".
-Use `./PATH2CMAKE/ccmake PATH2BUILD_DIR` to edit and to see the availble options.
+If cmake or the compilation fails, library paths and options can be set with the cmake-provided tool "ccmake".
+Use `./PATH2CMAKE/ccmake PATH2BUILD_DIR` to edit and to see the available options.
 
 Cmake generates the Makefiles. Run them using `make`.
-Now in the folder /test one can find the needed Quda executable "invert_".
+Now in the folder /test one can find the needed QUDA executable "invert_".
+
+Note that currently (status 07/21) the master branch of QUDA fails to download `Eigen`, due to an updated hash of the `Eigen` version. To circumvent this, `Eigen` can be provided externally by switching off the automatic download of `Eigen` via `-DQUDA_DOWNLOAD_EIGEN=OFF` and providing the external path via `-DEIGEN_INCLUDE_DIR=$PATH_TO_EIGEN`. Alternatively it is possible to update the hash line within the CMake file of QUDA. It is expected that this issue will be solved in the near future with a new release of QUDA.
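+
+As a concrete sketch, assuming Eigen has already been unpacked somewhere on the system (the path below is purely illustrative), the two options from the note above would be added to the cmake invocation like this:
+
+```shell
+cmake -DQUDA_DOWNLOAD_EIGEN=OFF -DEIGEN_INCLUDE_DIR=$HOME/src/eigen-3.3.9 <options as above> ..
+```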
 
 #### 1.2 Run
 
@@ -216,37 +200,35 @@ GPUs GFLOPS sec
 
 ## Xeon(Phi) Kernel
 
 ### 2. Compile and Run the Xeon(Phi)-Part
 
-Unpack the provided source tar-file located in `./QCD_Accelerator_Benchmarksuite_Part2/XeonPhi/src` or
-clone the actual git-hub branches of the code
-packages QMP:
-
-``` shell
-git clone https://github.com/usqcd-software/qmp
+QPhiX currently requires additional third-party libraries: the USQCD libraries `qmp`, `qdpxx`, `qio`, `xpath_reader`, `c-lime` and the XML library `libxml2`. The USQCD libraries can be found under
+
+```shell
+https://github.com/usqcd-software
 ```
-and for QPhix
+
+`libxml2` is available under
+
+```shell
+http://xmlsoft.org/
+```
+
+while the repository of QPhiX is hosted under the Jefferson Lab github account, see
 
 ``` shell
 git clone https://github.com/JeffersonLab/qphix
 ```
 
-Note that for running on Skylake chips it is recommended to utilize
-the branch develop of QPhix which needs additional packages
-like qdp++ (Status 04/2019).
-
 #### 2.1 Compile
 
-The QPhix library is based on QMP communication functions.
-For that QMP has to be setup first.
+The QPhiX library is based on QMP communication functions, which need to be provided. Note that you might have to regenerate the configure file using `autoreconf` from `Autotools`.
 
 QMP can be configured using:
 
 ``` shell
-./configure --prefix=$QMP_INSTALL_DIR CC=mpiicc CFLAGS=" -mmic/-xAVX512 -std=c99" --with-qmp-comms-type=MPI --host=x86_64-linux-gnu --build=none-none-none
+./configure --prefix=$QMP_INSTALL_DIR CC=mpiicc CFLAGS=" -xHOST -std=c99" --with-qmp-comms-type=MPI --host=x86_64-linux-gnu --build=none-none-none
 ```
 
-Create the Install folder and link with `$QMP_INSTALL_DIR` to it.
-Use the compilerflag `-mmic` for the compilation for KNC's
-while use `-xAVX512` for the compilation for KNL's.
+Create the install folder and point `$QMP_INSTALL_DIR` to it. Then use
 
 ``` shell
 make
 make install
 ```
 
 to compile and setup the necessary source files in `$QMP_INSTALL_DIR`.
 
-The QPhix executable can be compiled by using:
-for KNC's
+For the current master branch of `QPhiX` it is required to provide the package `qdp++`, which has sub-dependencies given by `qio`, `xpath_reader`, `c-lime` and `libxml2`. QDP++ can be configured using (here for a Skylake chip)
 
 ``` shell
-./configure --enable-parallel-arch=parscalar --enable-proc=MIC --enable-soalen=8 --enable-clover --enable-openmp --enable-cean --enable-mm-malloc CXXFLAGS="-openmp -mmic -vec-report -restrict -mGLOB_default_function_attrs=\"use_gather_scatter_hint=off\" -g -O2 -finline-functions -fno-alias -std=c++0x" CFLAGS="-mmic -vec-report -restrict -mGLOB_default_function_attrs=\"use_gather_scatter_hint=off\" -openmp -g -O2 -fno-alias -std=c9l9" CXX=mpiicpc CC=mpiicc --host=x86_64-linux-gnu --build=none-none-none --with-qmp=$QMP_INSTALL_DIR
+./configure --with-qmp=$QMP_INSTALL_DIR --enable-parallel-arch=parscalar CC=mpiicc CFLAGS="-xCORE-AVX512 -mtune=skylake-avx512 -std=c99" CXX=mpiicpc CXXFLAGS="-axCORE-AVX512 -mtune=skylake-avx512 -std=c++14 -qopenmp" --enable-openmp --host=x86_64-linux-gnu --build=none-none-none --prefix=$QDPXX_INSTALL_DIR --disable-filedb
 ```
-or for KNL's
+
+The configure script searches for additional USQCD libraries in the subfolder `./other_libs`. Clone the required library files, like
 
-``` shell
-./configure --enable-parallel-arch=parscalar --enable-proc=AVX512 --enable-soalen=8 --enable-clover --enable-openmp --enable-cean --enable-mm-malloc CXXFLAGS="-qopenmp -xMIC-AVX512 -g -O3 -std=c++14" CFLAGS="-xMIC-AVX512 -qopenmp -O3 -std=c99" CXX=mpiicpc CC=mpiicc --host=x86_64-linux-gnu --build=none-none-none --with-qmp=$QMP_INSTALL_DIR
+```shell
+git clone https://github.com/usqcd-software/qio.git
 ```
-by using the previous variable `QMP_INSTALL_DIR` which links to the install-folder
-of QMP. The executable `time_clov_noqdp` can be found now in the subfolder `./qphix/test`.
+
+and reconfigure. Note that on systems like JSC's JUWELS, `libxml2` has to be compiled additionally, and its path can be added to the configuration of `qdpxx` via
+
+```shell
+--with-libxml2=$LIBXML2_INSTALL_DIR
+```
+
+Now the QPhiX benchmark kernels can be compiled via `cmake`. Create a build folder and
-Note for the develop branch the package qdp++ has to be compiled.
-QDP++ can be configure using (here for skylake chip)
 
 ``` shell
-./configure --with-qmp=$QMP_INSTALL_DIR --enable-parallel-arch=parscalar CC=mpiicc CFLAGS="-xCORE-AVX512 -mtune=skylake-avx512 -std=c99" CXX=mpiicpc CXXFLAGS="-axCORE-AVX512 -mtune=skylake-avx512 -std=c++14 -qopenmp" --enable-openmp --host=x86_64-linux-gnu --build=none-none-none --prefix=$QDPXX_INSTALL_DIR
+mkdir build
+cd build
+cmake -DQDPXX_DIR=$QDP_INSTALL_DIR -DQMP_DIR=$QMP_INSTALL_DIR -Disa=avx512 -Dparallel_arch=parscalar -Dhost_cxx=mpiicpc -Dhost_cxxflags="-std=c++17 -O3 -axCORE-AVX512 -mtune=skylake-avx512" -Dtm_clover=ON -Dtwisted_mass=ON -Dtesting=ON -DCMAKE_CXX_COMPILER=mpiicpc -DCMAKE_CXX_FLAGS="-std=c++17 -O3 -axCORE-AVX512 -mtune=skylake-avx512" -DCMAKE_C_COMPILER=mpiicc -DCMAKE_C_FLAGS="-std=c99 -O3 -axCORE-AVX512 -mtune=skylake-avx512" ..
 ```
 
-Now QPhix executable can be compiled by using:
+The executable `time_clov_noqdp` can now be found in the sub-folder `tests`. Note that in the current version the compilation of other test kernels can fail. If that is the case, directly compile the needed executable via
+
+```
+cd tests
+make time_clov_noqdp
+```
+
+QPhiX was developed to utilize the computational potential of Intel's Xeon Phi architecture, which has been discontinued. Earlier versions of QPhiX for the Intel Xeon Phi architecture can be compiled without `qdpxx` using the configure-based build, namely for KNC's
 
 ``` shell
-cmake -DQDPXX_DIR=$QDP_INSTALL_DIR -DQMP_DIR=$QMP_INSTALL_DIR -Disa=avx512 -Dparallel_arch=parscalar -Dhost_cxx=mpiicpc -Dhost_cxxflags="-std=c++17 -O3 -axCORE-AVX512 -mtune=skylake-avx512" -Dtm_clover=ON -Dtwisted_mass=ON -Dtesting=ON -DCMAKE_CXX_COMPILER=mpiicpc -DCMAKE_CXX_FLAGS="-std=c++17 -O3 -axCORE-AVX512 -mtune=skylake-avx512" -DCMAKE_C_COMPILER=mpiicc -DCMAKE_C_FLAGS="-std=c99 -O3 -axCORE-AVX512 -mtune=skylake-avx512" ..
+./configure --enable-parallel-arch=parscalar --enable-proc=MIC --enable-soalen=8 --enable-clover --enable-openmp --enable-cean --enable-mm-malloc CXXFLAGS="-openmp -mmic -vec-report -restrict -mGLOB_default_function_attrs=\"use_gather_scatter_hint=off\" -g -O2 -finline-functions -fno-alias -std=c++0x" CFLAGS="-mmic -vec-report -restrict -mGLOB_default_function_attrs=\"use_gather_scatter_hint=off\" -openmp -g -O2 -fno-alias -std=c99" CXX=mpiicpc CC=mpiicc --host=x86_64-linux-gnu --build=none-none-none --with-qmp=$QMP_INSTALL_DIR
+```
+
+or for KNL's
+
+``` shell
+./configure --enable-parallel-arch=parscalar --enable-proc=AVX512 --enable-soalen=8 --enable-clover --enable-openmp --enable-cean --enable-mm-malloc CXXFLAGS="-qopenmp -xMIC-AVX512 -g -O3 -std=c++14" CFLAGS="-xMIC-AVX512 -qopenmp -O3 -std=c99" CXX=mpiicpc CC=mpiicc --host=x86_64-linux-gnu --build=none-none-none --with-qmp=$QMP_INSTALL_DIR
 ```
 
-The executable `time_clov_noqdp` can be found now in the subfolder `./qphix/test`.
+by using the previous variable `QMP_INSTALL_DIR`, which links to the install folder
+of QMP. The executable `time_clov_noqdp` can then be found in the subfolder `./qphix/test`.
 
 ##### 2.1.1 Example compilation on PRACE machines
 
 In the subsection we provide some example compilations on PRACE machines which were used to develop the QCD Benchmarksuite 2.
 
-###### 2.1.1.1 BSC - Marenostrum III Hybrid partitions
+###### 2.1.1.1 JSC - JUWELS
+
+JUWELS (Cluster Module) at Juelich Supercomputing Centre is equipped with Intel Skylake chips, namely 2× Intel Xeon Platinum 8168 CPUs (2× 24 cores, 2.7 GHz) per compute node.
+The compilation was done using the following software setup (status 07/21):
+
+```shell
+ml Intel/2020.2.254-GCC-9.3.0 IntelMPI/2019.8.254 imkl/2020.4.304 Autotools CMake
+```
+
+(`ml` is a shortcut for `module load`).
+
+`qmp` was built via
+
+```
+git clone https://github.com/usqcd-software/qmp.git
+cd qmp
+autoreconf
+./configure --prefix=${PWD}/install CC=mpiicc CFLAGS="-xHOST -std=c99 -O3" --with-qmp-comms-type=MPI --host=x86_64-linux-gnu
+make
+make install
+```
+
+`qdpxx` requires `libxml2`, which was built via
+
+```shell
+git clone https://gitlab.gnome.org/GNOME/libxml2.git
+cd libxml2
+./autogen.sh
+./configure --prefix=${PWD}/install
+make
+make install
+```
+
+For `qdpxx` the additionally required libs were added to the sub-folder `other_libs`, by
+
+```shell
+git clone https://github.com/usqcd-software/qdpxx.git
+cd qdpxx/other_libs
+git clone https://github.com/usqcd-software/xpath_reader.git
+cd xpath_reader/
+autoreconf
+cd ..
+git clone https://github.com/usqcd-software/qio.git
+cd qio
+autoreconf
+cd other_libs/
+git clone https://github.com/usqcd-software/c-lime.git
+# note that the path of c-lime is ./qdpxx/other_libs/qio/other_libs/c-lime
+cd c-lime/
+autoreconf
+```
+
+Now, `qdpxx` was compiled via
+
+```shell
+./configure --with-libxml2=${PWD}/../libxml2/install --with-qmp=/p/project/cecy00/ecy00a/src_bench/qdpxx/../qmp/install --enable-openmp --enable-parallel-arch=parscalar CC=mpiicc CFLAGS="-xHOST -std=c99 -qopenmp" CXX=mpiicpc CXXFLAGS="-xHOST -std=c++11 -qopenmp" --prefix=/p/project/cecy00/ecy00a/src_bench/qdpxx/install --disable-filedb
+make
+make install
+```
+
+Note that `configure` might require `xml2-config`, for which the path can be set via `export PATH+=:$PATH_2_LIBXML2`
+
+Finally, the computational kernels of `QPhiX` can be built via
+
+```shell
+git clone https://github.com/JeffersonLab/qphix.git
+cd qphix/
+mkdir build
+cd build
+cmake -DQDPXX_DIR=${PWD}/../../qdpxx/install -DQMP_DIR=${PWD}/../../qmp/install -Disa=avx512 -Dparallel_arch=parscalar -Dhost_cxx=icpc -Dhost_cxxflags="-std=c++17 -O3 -xHOST" -Dtm_clover=ON -Dtwisted_mass=ON -Dtesting=ON -DCMAKE_CXX_COMPILER=mpiicpc -DCMAKE_CXX_FLAGS="-std=c++17 -O3 -xHOST" -DCMAKE_C_COMPILER=mpiicc -DCMAKE_C_FLAGS="-std=c99 -O3 -xHOST" -DCMAKE_INSTALL_PREFIX=${PWD}/../install ..
+cd tests
+make time_clov_noqdp
+```
+
+###### 2.1.1.2 BSC - Marenostrum III Hybrid partitions
+
+The Hybrid partition on Marenostrum is equipped with KNC's (status 2016).
 First following modules were loaded
 
 ``` shell
@@ -326,9 +397,9 @@ Now the package QPhix is compilled with
 make
 ```
 
-###### 2.1.1.2 CINES - Frioul
+###### 2.1.1.3 CINES - Frioul
 
-On a test cluster at the CINES-side the Benchmarksuite was tested on KNL's.
+On a test cluster at the CINES site, the Benchmarksuite was tested on KNL's (status 2018).
 The steps are similar to BSC. First the libraries paths are set with
 
 ``` shell
@@ -383,17 +454,51 @@ of the target machine.
 
 #### 2.3 Example Benchmark Results
 
-The benchmark results for the XeonPhi benchmark suite are performed on
-Frioul, a test cluster at CINES, and the hybrid partion on MareNostrum III at BSC.
-Frioul has one KNL-card per node while the hybrid partion of MareNostrum III is
-equiped with two KNCs per node.
-The data on Frioul are generated by using
-the bash-scripts provided by the QCD-Accelerator Benchmarksute Part 2
-and are done for the two test cases "Strong-Scaling" with a lattice size
-of 32^3x96 and "Weak-scaling" with a local lattice size of 48^3x24 per
-card. In case of the data generated at MareNostrum, data for the "Strong-Scaling"
-mode on a 32^3x96 lattice are shown. The Benchmark is using a random gauge configuration and uses the
+Here, we show some benchmark results for the Xeon(Phi) benchmark suite, which were obtained on
+the JUWELS Cluster Module at JSC, on Frioul, a test cluster at CINES, and on the hybrid partition of MareNostrum III at BSC.
+Frioul has one KNL card per node, while the hybrid partition of MareNostrum III is equipped with two KNCs per node. The JUWELS Cluster Module is equipped with 2× Intel Xeon Platinum 8168 CPUs (Intel Xeon Skylake generation) per node. The data show "Strong-Scaling" with a lattice size
+of 32^3x96 for all three machines. The benchmark uses a random gauge configuration and applies the
 Conjugated Gradient solver to solve a linear equation involving the clover
 Wilson Dirac operator.
 
+```
+---------------------
+ JUWELS Cluster Module
+---------------------
+Strong - Scaling:
+global lattice size (32x32x32x96)
+Per node 8 MPI tasks with 6 openMP threads each
+
+precision: single
+
+Nodes  time to solution  GFLOPS
+1      6.38306           276.41
+2      3.45452           514.21
+4      1.91458           915.26
+8      1.02946           1725.51
+16     0.790274          2217.38
+32     0.338817          5313.64
+64     0.271364          7121.05
+128    0.261831          7426.16
+256    0.355913          6103.95
+512    0.532555          4079.35
+
+precision: double
+
+Nodes  time to solution  GFLOPS
+1      22.8705           132.26
+2      12.3162           245.60
+4      6.63012           456.23
+8      3.49709           864.96
+16     1.77833           1700.95
+32     0.94527           3199.98
+64     0.585362          5167.48
+128    0.379343          7973.9
+256    0.966274          3130.42
+512    0.884134          3421.25
+```
+
+
 
 ```
 ---------------------
 Frioul - KNLs
 ---------------------
@@ -417,27 +522,6 @@ KNLs GFLOPS
 2 616.467
 4 1047.79
 8 1616.37
-
-Weak - Scaling:
-local lattice size (48x48x48x24)
-
-precision: single
-
-KNLs GFLOPS
-1 348.304
-2 616.697
-4 1214.82
-8 2425.45
-16 4404.63
-
-precision: double
-
-KNLs GFLOPS
- 1 172.303
- 2 320.761
- 4 629.79
- 8 1228.77
-16 2310.63
 ```
 
 ```
@@ -465,4 +549,4 @@ KNCs GFLOPS
 16 368.196
 32 605.882
 64 847.566
-```
+```
\ No newline at end of file
diff --git a/specfem3d/README.md b/specfem3d/README.md
index 70681a01616866ae4ac98467f97f51afdfe0f6e1..d65f4fc087fa643bac73eaf6d1ed51d0261a15ed 100644
--- a/specfem3d/README.md
+++ b/specfem3d/README.md
@@ -2,7 +2,7 @@
 
 ## Summary Version
 
-1.1
+1.2
 
 ## General description
 The software package SPECFEM3D simulates three-dimensional global and regional seismic wave propagation based upon the spectral-element method (SEM). All SPECFEM3D_GLOBE software is written in Fortran90 with full portability in mind, and conforms strictly to the Fortran95 standard. It uses no obsolete or obsolescent features of Fortran77. The package uses parallel programming based upon the Message Passing Interface (MPI).
@@ -23,8 +23,7 @@ In many geological models in the context of seismic wave propagation studies (ex
 ## Purpose of Benchmark
 The software package SPECFEM3D_GLOBE simulates three-dimensional global and regional seismic wave propagation and performs full waveform imaging (FWI) or adjoint tomography based upon the spectral-element method (SEM). The test cases simulate the earthquake of June 1994 in Northern Bolivia at a global scale with the global shear-wave speed model named s362ani.
-Test Case A is designed to run on Tier-1 sized systems (up to around 1,000 x86 cores, or equivalent), Test Case B is designed to run on Tier-0 sized systems (up to around 10,000 x86 cores, or equivalent) and
-finally a small validation test case called "small_benchmark_run_to_test_more_complex_Earth" which is a native specfem3D_globe benchmark to validate the behavior of code designed to run on a 24 MPI processes(1 node)".
+Test Case A is designed to run on a system with up to about 1,000 x86 cores (or equivalent), and Test Case B is designed to run on systems with up to about 10,000 x86 cores (or equivalent). Finally, a small validation test case called "small_benchmark_run_to_test_more_complex_Earth", a native specfem3D_globe benchmark, is provided to validate the behaviour of the code; it is designed to run on 24 MPI processes (1 node).
 
 ## Mechanics of Building Benchmark
 
@@ -171,7 +170,7 @@ You can use or be inspired by the submission script template in the job_script f
 
 ## Validation
 
-To validate the quality of the compilation and the functioning of the code on your machine, we will use the small validation test case (warning, **Pyhton and numpy are required**).
+To validate the quality of the compilation and the functioning of the code on your machine, we will use the small validation test case (warning, **Python 2.7 and numpy are required**).
 You must first run the small validation test case and have results in the "OUTPUT_FILES" folder (i.e. the 387 seismograms produced).
 ```shell
 ls specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/*.ascii | wc -l
@@ -208,4 +207,15 @@ Using slurm, it is easy to gather as each `mpirun` or `srun` is interpreted as
 timed. So the command line `sacct -j ` allows you to catch the metric.
 The output of the mesher (“output_mesher.txt”) and of the solver (“output_solver.txt”) can be found in the OUTPUT_FILES directory. These files contain physical values
-and timing values that are more accurate than those collected by slurm.
\ No newline at end of file
+and timing values that are more accurate than those collected by slurm.
+
+
+## Reference results
+Below are some benchmark results from the PRACE project implementation phase 5 (PRACE-5IP), displayed here to give an idea of the performance on different systems.
+
+|Application|Problem|Size: number|Size: unit|Performance: metric|Performance: unit|Hazel Hen|Irene - KNL|Irene - SKL|JUWELS|Marconi - KNL|MareNostrum 4|Piz Daint - P100|DAVIDE|Frioul|SDV|Dibona|
+|-----------|-------|------------|----------|-------------------|-----------------|---------|-----------|-----------|------|-------------|-------------|----------------|------|------|---|------|
+| | | | | | |24 cores|68 cores|48 cores|48 cores|68 cores|48 cores|68 cores|240 cores|68 cores|64 cores|64 cores|
+|SPECFEM3D|Test Case A|24|nodes|time|s|2389.00|1639.00|734.00|658.00|1653.00|744.00|195.00| |1963.50| |3921.22|
+|SPECFEM3D|Test Case B|384|nodes|time|s| |330.00|169.00|193.00|1211.00|156.00|50.00| | | | |
+
diff --git a/specfem3d/SPECFEM3D_Build_README.txt b/specfem3d/SPECFEM3D_Build_README.txt
deleted file mode 100644
index 1d1667470535304a3a768c381d60824febed21ca..0000000000000000000000000000000000000000
--- a/specfem3d/SPECFEM3D_Build_README.txt
+++ /dev/null
@@ -1,16 +0,0 @@
-on CURIE :
-flags.guess :
-DEF_FFLAGS="-O3 -DFORCE_VECTORIZATION -check nobounds -xHost -ftz
--assume buffered_io -assume byterecl -align sequence -vec-report0 -std03
--diag-disable 6477 -implicitnone -warn truncated_source -warn
-argument_checking -warn unused -warn declarations -warn alignments -warn
-ignore_loc -warn usage -mcmodel=medium -shared-intel"
-
-configure command :
-./configure MPIFC=mpif90 FC=ifort CC=icc CFLAGS="-mcmodel=medium
--shared-intel" CPP=cpp
-
-in order to compile :
-make clean
-make all
-
diff --git a/specfem3d/compile.sh b/specfem3d/compile.sh
index 17d6db5887b853b618f00ccef4f90ba7f13e6876..437ba48c47699b99db03069e10b0c04f0522a0c3 100755
--- a/specfem3d/compile.sh
+++ b/specfem3d/compile.sh
@@ -7,14 +7,23 @@ echo " - daint-gpu "
 echo " - daint-cpu-only "
 echo " - davide "
 echo " - juwels"
+echo " - juwels-booster"
+echo " - juwels_2018"
 echo " - irene-skl "
 echo " - irene-knl "
 echo " - dibona "
 echo " - frioul "
 echo " - deepsdv "
 echo " - hazelhen "
+echo " - vega-gpu"
+echo " - vega-cpu"
+echo " - marconi100 "
+echo " - supermuc-ng "
+
 read machine
+export NEX_XI=384 #384 #416 #352 #320 #288 #256 #224 #192 #160 #128 #96 #288
+export NPROC_XI=4
 source ./env/env_${machine}
 
 Untar(){
@@ -26,14 +35,10 @@
    if [[ $code != "0" ]]; then
       echo "Git clone failed, try a hard copy:"
      Copy
-     break 1
    else
      cd specfem3d_globe/
-     # Checkout of 31 ocotbre 2017 version
-     git checkout b1d6ba966496f269611eff8c2cf1f22bcdac2bd9
-
-     # Checkout v7.0.2, last version : unstable : tested on differents architectures and all simulations failed
-     #git checkout v7.0.2
+     git checkout b1d6ba966496f269611eff8c2cf1f22bcdac2bd9 # Checkout of 31 October 2017 version
+     #git checkout v7.0.2 # Checkout v7.0.2, last version : unstable : tested on different architectures and all simulations failed
    fi
    cd $ueabs_dir
 }
@@ -69,6 +74,10 @@ Install(){
     mkdir -p $install_dir
     mv $install_dir/../specfem3d_globe $install_dir/.
     cp test_cases/SPECFEM3D_TestCaseA/* $install_dir/specfem3d_globe/DATA/.
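+    # The four sed substitutions below rewrite DATA/Par_file so that the mesh
+    # resolution (NEX_XI/NEX_ETA) and the MPI decomposition (NPROC_XI/NPROC_ETA)
+    # follow the NEX_XI and NPROC_XI values exported at the top of this script.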
+    sed -i s/"NEX_XI = 384"/"NEX_XI = $NEX_XI"/g $install_dir/specfem3d_globe/DATA/Par_file
+    sed -i s/"NEX_ETA = 384"/"NEX_ETA = $NEX_XI"/g $install_dir/specfem3d_globe/DATA/Par_file
+    sed -i s/"NPROC_XI = 4"/"NPROC_XI = $NPROC_XI"/g $install_dir/specfem3d_globe/DATA/Par_file
+    sed -i s/"NPROC_ETA = 4"/"NPROC_ETA = $NPROC_XI"/g $install_dir/specfem3d_globe/DATA/Par_file
   elif [ $answer = "3" ]; then
     echo "===> Copy test case B"
     export install_dir=$install_dir/TestCaseB
@@ -77,6 +86,8 @@ Install(){
     cp test_cases/SPECFEM3D_TestCaseB/* $install_dir/specfem3d_globe/DATA/.
   fi
   cd $install_dir/specfem3d_globe
+  #Clean Configuration
+  rm Makefile bin/*
   if [ $machine = "daint-gpu" ] || [ $machine = "davide" ]; then
      echo "Configure for CPU+GPU system"
      ### replace `use mpi` if needed ###
@@ -90,9 +101,20 @@ Install(){
        export CUDA_INC="$CUDATOOLKIT_HOME/include"
      fi
      ./configure --build=ppc64 --with-cuda=cuda5
+  elif [ $machine = "vega-gpu" ] || [ $machine = "juwels-booster" ]; then
+     echo "Configure for Vega GPU partition or Juwels-booster partition"
+     sed -i s/"GPU_MODE = .false."/"GPU_MODE = .true."/g $install_dir/specfem3d_globe/DATA/Par_file
+     sed -i s/"GPU_DEVICE = Tesla"/"GPU_DEVICE = *"/g $install_dir/specfem3d_globe/DATA/Par_file
+     ./configure --build=ppc64 --with-cuda=cuda8
+  elif [ $machine = "marconi100" ]; then
+     echo "Configure for Marconi100 GPU partition"
+     sed -i s/"GPU_MODE = .false."/"GPU_MODE = .true."/g $install_dir/specfem3d_globe/DATA/Par_file
+     sed -i s/"GPU_DEVICE = Tesla"/"GPU_DEVICE = *"/g $install_dir/specfem3d_globe/DATA/Par_file
+     ./configure --build=ppc64 --with-cuda=cuda8
+     sed -i s/"O4"/"O3"/g Makefile #-O4 is not supported with option -qoffload
   else
      echo "Configure for CPU only system"
-     ./configure --enable-openmp
+     ./configure --enable-openmp
   fi
   echo $machine
   #if [ $machine = "occigen" ] || [ $machine = "marenostrum" ] || [ $machine = "marconi-knl" ];then
@@ -121,6 +143,8 @@ Clean(){
     rm -rf $install_dir/TestCaseA
   elif [ $answer = "3" ]; then
     rm -rf $install_dir/TestCaseB
+  elif [ $answer = "1" ]; then
+    rm -rf $install_dir/specfem3d_globe
   else
     echo "Nothing has been deleted"
   fi
@@ -132,15 +156,16 @@ Clean(){
 
 Deploy(){
   echo "install_dir ="$install_dir
-  if [ $machine = "occigen" ] || [ $machine = "marenostrum" ] || [ $machine = "marconi-knl" ] || [ $machine = "daint-cpu-only" ] || [ $machine = "daint-gpu" ] || [ $machine = "davide" ] || [ $machine = "juwels" ] || [ $machine = "irene-skl" ] || [ $machine = "irene-knl" ] || [ $machine = "dibona" ] || [ $machine = "frioul" ] || [ $machine = "deepsdv" ] || [ $machine = "hazelhen" ];then
+  if [ $machine = "occigen" ] || [ $machine = "marenostrum" ] || [ $machine = "marconi-knl" ] || [ $machine = "daint-cpu-only" ] || [ $machine = "daint-gpu" ] || [ $machine = "davide" ] || [ $machine = "juwels" ] || [ $machine = "juwels-booster" ] || [ $machine = "irene-skl" ] || [ $machine = "irene-knl" ] || [ $machine = "dibona" ] || [ $machine = "frioul" ] || [ $machine = "deepsdv" ] || [ $machine = "hazelhen" ] || [ $machine = "vega-cpu" ] || [ $machine = "vega-gpu" ] || [ $machine = "marconi100" ] || [ $machine = "supermuc-ng" ];then
     echo "==> Install on $machine :"
     mkdir -p $install_dir
-# Clean
+   Clean
     export ueabs_dir=`pwd`
     Untar
     Install
   else
-    echo "Wrong machine !"
+    echo "machine : $machine "
+    echo "Wrong machine !"
exit fi } diff --git a/specfem3d/env/env/env_supermuc-ng b/specfem3d/env/env/env_supermuc-ng new file mode 100644 index 0000000000000000000000000000000000000000..9692290879539fe339b868e39b591317cdbe9bc9 --- /dev/null +++ b/specfem3d/env/env/env_supermuc-ng @@ -0,0 +1,14 @@ +#!/bin/bash +module purge +module load admin/1.0 tempdir lrz/1.0 intel-oneapi/2021.2 +module li + +export machine=supermuc-ng +export software=specfem3d_globe +export version=31octobre +export install_dir=$WORK_pn73ve/benchmarks/$machine/$software/$version/ +export CC="mpiicc" +export FC="mpiifort" +export MPIFC=$FC +export FCFLAGS=" -O3 -qopenmp -xCORE-AVX512 -mtune=skylake -ipo -no-prec-div -no-prec-sqrt -fma -qopt-zmm-usage=high -DUSE_FP32 -DOPT_STREAMS -fp-model fast=2 -traceback -mcmodel=large" +export CFLAGS=" -O3 -qopenmp -xCORE-AVX512 -mtune=skylake -ipo -no-prec-div -no-prec-sqrt -fma -qopt-zmm-usage=high" diff --git a/specfem3d/env/env_juwels b/specfem3d/env/env_juwels index 7cf674b68f9c704c6d832f899319fa0d832c462c..637e9690d46d9f02bb49433c1e9c7c88401da05c 100644 --- a/specfem3d/env/env_juwels +++ b/specfem3d/env/env_juwels @@ -1,20 +1,19 @@ #!/bin/bash -module purge -module load Intel/2019.0.117-GCC-7.3.0 ParaStationMPI/5.2.1-1 #IntelMPI/2019.0.117 - +module --force purge +module use $OTHERSTAGES +module load Stages/Devel-2019a +module load Intel/2019.5.281-GCC-8.3.0 IntelMPI/2019.7.217 + export machine=juwels export software=specfem3d_globe export version=31octobre -#export install_dir=$HOME/benchmarks/$machine/$software/$version/ -export install_dir=$SCRATCH_cprpb66/benchmarks/$machine/$software/$version/ -export CC="mpicc" -export FC="mpifort" -#export CC="mpiicc" -#export FC="mpiifort" +export install_dir=$SCRATCH_prpb85/benchmarks/$machine/$software/$version/strong-scaling +export NEX_XI=384 #480 #448 #416 #352 #320 #288 #256 #224 #192 #160 #128 #96 #288 +#export install_dir=$SCRATCH_prpb85/benchmarks/$machine/$software/$version/$NEX_XI +export CC="mpiicc" +export FC="mpiifort" export MPIFC=$FC -#export FCFLAGS=" -g -O3 -qopenmp -xhost -DUSE_FP32 -DOPT_STREAMS -fp-model fast=2 -traceback -mcmodel=large" -#export CFLAGS=" -g -O3 -xhost " -export FCFLAGS=" -g -O3 -qopenmp -xCORE-AVX512 -mtune=skylake -ipo -DUSE_FP32 -DOPT_STREAMS -fp-model fast=2 -traceback -mcmodel=large" -export CFLAGS=" -g -O3 -xCORE-AVX512 -mtune=skylake -ipo" +export FCFLAGS=" -O3 -qopenmp -xCORE-AVX512 -mtune=skylake -ipo -no-prec-div -no-prec-sqrt -fma -qopt-zmm-usage=high -DUSE_FP32 -DOPT_STREAMS -fp-model fast=2 -traceback -mcmodel=large" +export CFLAGS=" -O3 -qopenmp -xCORE-AVX512 -mtune=skylake -ipo -no-prec-div -no-prec-sqrt -fma -qopt-zmm-usage=high" diff --git a/specfem3d/env/env_juwels-booster b/specfem3d/env/env_juwels-booster new file mode 100644 index 0000000000000000000000000000000000000000..bc467c0463d3001b9842facf0457813a6cc951df --- /dev/null +++ b/specfem3d/env/env_juwels-booster @@ -0,0 +1,18 @@ +#!/bin/bash + +#module --force purge +#module use $OTHERSTAGES +#module load Stages/Devel-2019a +#module load Intel/2019.5.281-GCC-8.3.0 IntelMPI/2019.7.217 +module load Intel/2021.2.0-GCC-10.3.0 ParaStationMPI/5.4.10-1 CUDA/11.3 +export machine=juwels-booster +export software=specfem3d_globe +export version=31octobre + +export NEX_XI=384 #128 #448 #416 #352 #320 #288 #256 #224 #192 #160 #128 #96 #288 +export install_dir=$SCRATCH_prpb85/benchmarks/$machine/$software/$version/$NEX_XI +export CC="mpicc" +export FC="mpifort" +export MPIFC=$FC +export FCFLAGS=" -O3 -qopenmp -march=core-avx2 -mtune=core-avx2 -ipo -no-prec-div 
-no-prec-sqrt -fma -DUSE_FP32 -DOPT_STREAMS -fp-model fast=2 -mcmodel=large" +export CFLAGS=" -O3 -qopenmp -march=core-avx2 -mtune=core-avx2 -ipo -no-prec-div -no-prec-sqrt -fma " diff --git a/specfem3d/env/env_marconi100 b/specfem3d/env/env_marconi100 new file mode 100644 index 0000000000000000000000000000000000000000..4322d8449063269a070af5923c598579c130b361 --- /dev/null +++ b/specfem3d/env/env_marconi100 @@ -0,0 +1,19 @@ +#!/bin/bash +module purge +module load profile/base +module load xl/16.1.1--binary spectrum_mpi/10.3.1--binary cuda/11.0 + +export machine=marconi100 +export software=specfem3d_globe +export version=31octobre +export install_dir=$CINECA_SCRATCH/benchmarks/$machine/$software/$version + +# Power9 +export CC="mpixlc" +export FC="mpixlf" +export MPIFC=mpixlf +#DEBUG="-qcheck -g -qsigtrap" # -qstackprotect=all -> internal compiler error : https://www.ibm.com/support/pages/node/722473 +#FLAGS_CHECK="-g -qfullpath -O2 -qsave -qstrict -qtune=qp -qarch=qp -qcache=auto -qhalt=w -qfree=f90 -qsuffix=f=f90 -qlanglvl=95pure -Q -Q+rank,swap_all -Wl,-relax" +#FLAGS_CHECK="-g -qfullpath -qsave -qtune=auto -qarch=auto -qcache=auto -qhalt=w -qfree=f90 -qsuffix=f=f90 -qlanglvl=2003pure" +export FCFLAGS="-O3 -DUSE_FP32 -DOPT_STREAMS -qpic $FLAGS_CHECK $DEBUG" #-mcmodel=large -qoffload +export CFLAGS="-O3 -qpic $FLAGS_CHECK $DEBUG" #-qoffload diff --git a/specfem3d/env/env_vega-cpu b/specfem3d/env/env_vega-cpu new file mode 100644 index 0000000000000000000000000000000000000000..9b31b68aeb2c3e539bf914ce62de8d69e7cf953b --- /dev/null +++ b/specfem3d/env/env_vega-cpu @@ -0,0 +1,20 @@ +#!/bin/bash +module purge +module load GCC/9.3.0 openmpi/gnu/4.0.5.2 +export machine=vega-cpu +export software=specfem3d_globe +export version=31octobre + +export NEX_XI=384 #384 #416 #352 #320 #288 #256 #224 #192 #160 #128 #96 #288 +export NPROC_XI=4 +MYSCRATCH=/exa5/scratch/user/eucedricj +export install_dir=$MYSCRATCH//benchmarks/$machine/$software/$version/cpu-znver2/ +echo "install_dir = " $install_dir + +export CC=mpicc #mpicc #mpiicc +export FC=mpif90 #mpif90 #mpiifort +export MPIFC=$FC + +# gnu +export FCFLAGS=" -O3 -fopenmp -march=znver2 -mtune=znver2 -flto -funroll-all-loops -ffast-math -mfma -mavx2 -m3dnow -fomit-frame-pointer -DUSE_FP32 -DOPT_STREAMS -mcmodel=large" +export CFLAGS=" -O3 -fopenmp -march=znver2 -mtune=znver2 -flto -funroll-all-loops -ffast-math -mfma -mavx2 -m3dnow -fomit-frame-pointer" diff --git a/specfem3d/env/env_vega-gpu b/specfem3d/env/env_vega-gpu new file mode 100644 index 0000000000000000000000000000000000000000..166db4b196294d85ad432f160f5e7777dc8496c1 --- /dev/null +++ b/specfem3d/env/env_vega-gpu @@ -0,0 +1,24 @@ +#!/bin/bash +module purge +module load GCC/9.3.0 openmpi/gnu/4.0.5.2 CUDA/11.0.2-GCC-9.3.0 +export machine=vega-gpu +export software=specfem3d_globe +export version=31octobre + + +export NEX_XI=384 #384 #416 #352 #320 #288 #256 #224 #192 #160 #128 #96 #288 +export NPROC_XI=4 +MYSCRATCH=/exa5/scratch/user/eucedricj +export install_dir=$MYSCRATCH//benchmarks/$machine/$software/$version +echo "install_dir = " $install_dir + +export CC=mpicc #mpiicc +export FC=mpif90 #mpiifort +export MPIFC=$FC + +# gnu +export FCFLAGS=" -O3 -flto -march=znver2 -mtune=znver2 -ffast-math -mfma -mavx2 -m3dnow -fomit-frame-pointer -DUSE_FP32 -DOPT_STREAMS -mcmodel=large" #-fopenmp -march=znver1 -funroll-all-loops +export CFLAGS=" -O3 -flto -march=znver2 -mtune=znver2 -ffast-math -mfma -mavx2 -m3dnow -fomit-frame-pointer" +export CUDA_LIB=$CUDA_HOME/lib64 +export 
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/ +export LDFLAGS="$LDFLAGS -lpthread" diff --git a/specfem3d/job_script/job_daint-gpu_test_case_C.slurm b/specfem3d/job_script/job_daint-gpu_test_case_C.slurm deleted file mode 100644 index 7353fefe3e5dce55cabfcd9fb32b72d6a9b43082..0000000000000000000000000000000000000000 --- a/specfem3d/job_script/job_daint-gpu_test_case_C.slurm +++ /dev/null @@ -1,61 +0,0 @@ -#!/bin/bash -l -#SBATCH --job-name=specfem3D_test_case_C -#SBATCH --time=01:00:00 -#SBATCH --nodes=2 -#SBATCH --ntasks-per-node=3 -#SBATCH --cpus-per-task=4 -#SBATCH --partition=normal -#SBATCH --constraint=gpu -#SBATCH --output=specfem3D_test_case_A_daint-gpu-%j.output - -set -e - -source ../env/env_daint-gpu -cd $install_dir/TestCaseC/specfem3d_globe - -export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK -export CRAY_CUDA_MPS=1 -ulimit -s unlimited - -MESHER_EXE=./bin/xmeshfem3D -SOLVER_EXE=./bin/xspecfem3D - -# backup files used for this simulation -cp DATA/Par_file OUTPUT_FILES/ -cp DATA/STATIONS OUTPUT_FILES/ -cp DATA/CMTSOLUTION OUTPUT_FILES/ - -## -## mesh generation -## -sleep 2 - -echo -echo `date` -echo "starting MPI mesher" -echo - -MPI_PROCESS=` echo "$SLURM_NNODES*$SLURM_NTASKS_PER_NODE" | bc -l` -echo "SLURM_NTASKS_PER_NODE = " $SLURM_NTASKS_PER_NODE -echo "SLURM_CPUS_PER_TASKS = " $SLURM_CPUS_PER_TASK -echo "SLURM_NNODES=" $SLURM_NNODES -echo "MPI_PROCESS $MPI_PROCESS" - -time srun -n ${MPI_PROCESS} ${MESHER_EXE} -echo " mesher done: `date`" -echo - -## -## forward simulation -## -sleep 2 - -echo -echo `date` -echo starting run in current directory $PWD -echo -#unset FORT_BUFFERED -time srun -n ${MPI_PROCESS} ${SOLVER_EXE} - -echo "finished successfully" -echo `date` diff --git a/specfem3d/job_script/job_davide_test_case_C.slurm b/specfem3d/job_script/job_davide_test_case_C.slurm deleted file mode 100644 index 625ed7792b1423b5e2902b5a94bfd0712a898115..0000000000000000000000000000000000000000 --- a/specfem3d/job_script/job_davide_test_case_C.slurm +++ /dev/null @@ -1,67 +0,0 @@ -#!/bin/bash -#SBATCH -J Test_case_A -#SBATCH --time=01:30:00 -#SBATCH --nodes=1 -#SBATCH --ntasks-per-node=6 -##SBATCH --ntasks-per-core=1 -#SBATCH --cpus-per-task=2 -#SBATCH --partition=dvd_usr_prod -##SBATCH --qos=noQOS -#SBATCH --mem=86000 -#SBATCH --out=Test_case_C_davide-%j.out -#SBATCH --err=Test_case_C_davide-%j.err -#SBATCH --account=Dec00_5IPwp7 -#SBATCH --gres=gpu:4 # (N=1,4) - -set -e -source ../env/env_davide - -export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK -export OMP_PLACES=threads -export OMP_PROC_BIND=true - -cd $install_dir/TestCaseC/specfem3d_globe - -#ulimit -s unlimited - -MESHER_EXE=./bin/xmeshfem3D -SOLVER_EXE=./bin/xspecfem3D - -# backup files used for this simulation -cp DATA/Par_file OUTPUT_FILES/ -cp DATA/STATIONS OUTPUT_FILES/ -cp DATA/CMTSOLUTION OUTPUT_FILES/ - -## -## mesh generation -## -sleep 2 - -echo -echo `date` -echo "starting MPI mesher" -echo - -MPI_PROCESS=` echo "$SLURM_NNODES*$SLURM_NTASKS_PER_NODE" | bc -l` -echo "SLURM_NTASKS_PER_NODE = " $SLURM_NTASKS_PER_NODE -echo "SLURM_CPUS_PER_TASKS = " $SLURM_CPUS_PER_TASK -echo "SLURM_NNODES=" $SLURM_NNODES -echo "MPI_PROCESS $MPI_PROCESS" - -time srun -n ${MPI_PROCESS} ${MESHER_EXE} -echo " mesher done: `date`" -echo - -## -## forward simulation -## -sleep 2 - -echo -echo `date` -echo starting run in current directory $PWD -echo -time srun -n ${MPI_PROCESS} ${SOLVER_EXE} - -echo "finished successfully" -echo `date` diff --git a/specfem3d/job_script/job_deepsdv_test_case_C.slurm 
b/specfem3d/job_script/job_deepsdv_test_case_C.slurm deleted file mode 100644 index ce7fc74903870ff48760ad7c258881fdd1efa773..0000000000000000000000000000000000000000 --- a/specfem3d/job_script/job_deepsdv_test_case_C.slurm +++ /dev/null @@ -1,60 +0,0 @@ -#!/bin/bash -#SBATCH -J Test_case_C -#SBATCH --time=01:30:00 -#SBATCH --nodes=1 -#SBATCH --ntasks-per-node=6 -#SBATCH --cpus-per-task=4 -#SBATCH --partition=sdv -#SBATCH --out=Test_case_C_deepsdv-%j.out -#SBATCH --err=Test_case_C_deepsdv-%j.err - -set -e -source ../env/env_deepsdv - -export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK - -cd $install_dir/TestCaseC/specfem3d_globe - -#ulimit -s unlimited - -MESHER_EXE=./bin/xmeshfem3D -SOLVER_EXE=./bin/xspecfem3D - -# backup files used for this simulation -cp DATA/Par_file OUTPUT_FILES/ -cp DATA/STATIONS OUTPUT_FILES/ -cp DATA/CMTSOLUTION OUTPUT_FILES/ - -## -## mesh generation -## -sleep 2 - -echo -echo `date` -echo "starting MPI mesher" -echo - -MPI_PROCESS=` echo "$SLURM_NNODES*$SLURM_NTASKS_PER_NODE" | bc -l` -echo "SLURM_NTASKS_PER_NODE = " $SLURM_NTASKS_PER_NODE -echo "SLURM_CPUS_PER_TASKS = " $SLURM_CPUS_PER_TASK -echo "SLURM_NNODES=" $SLURM_NNODES -echo "MPI_PROCESS $MPI_PROCESS" - -time srun -n ${MPI_PROCESS} ${MESHER_EXE} -echo " mesher done: `date`" -echo - -## -## forward simulation -## -sleep 2 - -echo -echo `date` -echo starting run in current directory $PWD -echo -time srun -n ${MPI_PROCESS} ${SOLVER_EXE} - -echo "finished successfully" -echo `date` diff --git a/specfem3d/job_script/job_juwels-booster_small_benchmark_run_to_test_more_complex_Earth.slurm b/specfem3d/job_script/job_juwels-booster_small_benchmark_run_to_test_more_complex_Earth.slurm new file mode 100644 index 0000000000000000000000000000000000000000..915ba1b75512032be480f44270833b023ce3a833 --- /dev/null +++ b/specfem3d/job_script/job_juwels-booster_small_benchmark_run_to_test_more_complex_Earth.slurm @@ -0,0 +1,68 @@ +#!/bin/bash +#SBATCH -J specfem_gpu_small_benchmark_run_to_test_more_complex_Earth +#SBATCH --account=prpb85 +#SBATCH --nodes=1 +#SBATCH --ntasks-per-node=24 +#SBATCH --cpus-per-task=2 +#SBATCH --time=01:59:59 +#SBATCH --output specfem_gpu_small_benchmark_run_to_test_more_complex_Earth-%j.out +#SBATCH --exclusive +#SBATCH -p booster +#SBATCH --gres=gpu:4 + +#set -e +source ../env/env_juwels-booster +grep "^[^#;]" ../env/env_juwels-booster +cat job_juwels-booster_small_benchmark_run_to_test_more_complex_Earth.slurm + +cd $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth +export OMPI_MCA_pml=ucx +export OMPI_MCA_btl="^uct,tcp,openib,vader" +export CUDA_VISIBLE_DEVICES=0,1,2,3 +#export OMP_NUM_THREADS=1 + +sed -i s/"GPU_MODE = .false."/"GPU_MODE = .true."/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/DATA/Par_file +sed -i s/"GPU_DEVICE = Tesla"/"GPU_DEVICE = *"/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/DATA/Par_file +sed -i s/"configure --enable-openmp"/"configure --build=ppc64 --with-cuda=cuda8 "/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/run_this_example.sh +sed -i s/"mpirun -np"/"srun -n"/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/run_mesher_solver.bash +#taskset -a -p $PPID + +time ./run_this_example.sh + +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/specfem3d_globe/make.log + 
+echo +echo "running seismogram comparisons:" +echo + +cd $install_dir/specfem3d_globe/ +# uncompress seismograms +if [ -e EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/II.AAK.MXE.sem.ascii.bz2 ]; then + echo + echo "unzipping references..." + echo + mkdir OUTPUT_FILES_reference_OK/ + bunzip2 EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/*.bz2 + echo + echo +fi + +#wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.10.3-Linux-x86_64.sh +#sh /ceph/hpc/home/eucedricj/Miniconda3-py37_4.10.3-Linux-x86_64.sh +#source miniconda3/bin/activate +#conda create --name python2 python=2.7 +# compares seismograms by plotting correlations +./utils/compare_seismogram_correlations.py EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/ EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/ + +echo +echo "done" +ls -lrth $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/output_*.txt +cat $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/output_mesher.txt +cat $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/output_solver.txt diff --git a/specfem3d/job_script/job_juwels-booster_test_case_A.slurm b/specfem3d/job_script/job_juwels-booster_test_case_A.slurm new file mode 100644 index 0000000000000000000000000000000000000000..a202c0356481a031d5f3eaeb2d4afaf7144bf716 --- /dev/null +++ b/specfem3d/job_script/job_juwels-booster_test_case_A.slurm @@ -0,0 +1,79 @@ +#!/bin/bash -x +#SBATCH -J Test_case_A-gpu +#SBATCH --account=prpb85 +#SBATCH --nodes=24 +#SBATCH --ntasks-per-node=4 +#SBATCH --cpus-per-task=12 +#SBATCH --time=00:29:59 +#SBATCH --partition=booster +#SBATCH --output=specfem_%x_juwels-booster-%j.output +#SBATCH --gres=gpu:4 +##SBATCH --acctg-freq=task=1 +set -e + +source ../env/env_juwels-booster +grep "^[^#;]" ../env/env_juwels-booster +cat job_juwels-booster_test_case_A.slurm +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/TestCaseA/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/TestCaseA/specfem3d_globe/make.log +cd $install_dir/TestCaseA/specfem3d_globe +export I_MPI_PMI_VALUE_LENGTH_MAX=1800 + +#Make sure that OMP_NUM_THREADS / KMP_HW_SUBSET = cpus-per-task +export KMP_HW_SUBSET=1T +export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} +export KMP_AFFINITY=granularity=core,compact +export FORT_BUFFERED=true + +ulimit -s unlimited + +MESHER_EXE=./bin/xmeshfem3D +SOLVER_EXE=./bin/xspecfem3D + +# backup files used for this simulation +cp DATA/Par_file OUTPUT_FILES/ +cp DATA/STATIONS OUTPUT_FILES/ +cp DATA/CMTSOLUTION OUTPUT_FILES/ + +## +## mesh generation +## +sleep 2 + +echo +echo `date` +echo "starting MPI mesher" +echo + +MPI_PROCESS=` echo "$SLURM_NNODES*$SLURM_NTASKS_PER_NODE" | bc -l` +echo "SLURM_NTASKS_PER_NODE = " $SLURM_NTASKS_PER_NODE +echo "SLURM_CPUS_PER_TASKS = " $SLURM_CPUS_PER_TASK +echo "SLURM_NNODES=" $SLURM_NNODES +echo "MPI_PROCESS $MPI_PROCESS" +echo "OMP_NUM_THREADS=$OMP_NUM_THREADS" + +time srun -n ${MPI_PROCESS} ${MESHER_EXE} +echo " mesher done: `date`" +echo + +## +## forward simulation +## +sleep 2 + +echo +echo `date` +echo starting run in current directory $PWD +echo +time srun -n ${MPI_PROCESS} ${SOLVER_EXE} +echo "=====================" +echo `date` +ls -lrth $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_*.txt +cat 
$install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_mesher.txt +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_solver.txt diff --git a/specfem3d/job_script/job_juwels-booster_test_case_A_strong.slurm b/specfem3d/job_script/job_juwels-booster_test_case_A_strong.slurm new file mode 100644 index 0000000000000000000000000000000000000000..300c036f848ceafd6cc2490c3cb8005627141280 --- /dev/null +++ b/specfem3d/job_script/job_juwels-booster_test_case_A_strong.slurm @@ -0,0 +1,79 @@ +#!/bin/bash -x +#SBATCH -J Test_case_A-gpu +#SBATCH --account=prpb85 +#SBATCH --nodes=48 +#SBATCH --ntasks-per-node=2 +#SBATCH --cpus-per-task=32 +#SBATCH --time=00:29:59 +#SBATCH --partition=booster +#SBATCH --output=specfem_%x_juwels-booster-strong-48Nodes-%j.output +#SBATCH --gres=gpu:4 +##SBATCH --acctg-freq=task=1 +set -e + +source ../env/env_juwels-booster +grep "^[^#;]" ../env/env_juwels-booster +cat job_juwels-booster_test_case_A_strong.slurm +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/TestCaseA/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/TestCaseA/specfem3d_globe/make.log +cd $install_dir/TestCaseA/specfem3d_globe +export I_MPI_PMI_VALUE_LENGTH_MAX=1800 + +#Make sure that OMP_NUM_THREADS / KMP_HW_SUBSET = cpus-per-task +export KMP_HW_SUBSET=1T +#export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} +export KMP_AFFINITY=granularity=core,compact +export FORT_BUFFERED=true + +ulimit -s unlimited + +MESHER_EXE=./bin/xmeshfem3D +SOLVER_EXE=./bin/xspecfem3D + +# backup files used for this simulation +cp DATA/Par_file OUTPUT_FILES/ +cp DATA/STATIONS OUTPUT_FILES/ +cp DATA/CMTSOLUTION OUTPUT_FILES/ + +## +## mesh generation +## +sleep 2 + +echo +echo `date` +echo "starting MPI mesher" +echo + +MPI_PROCESS=` echo "$SLURM_NNODES*$SLURM_NTASKS_PER_NODE" | bc -l` +echo "SLURM_NTASKS_PER_NODE = " $SLURM_NTASKS_PER_NODE +echo "SLURM_CPUS_PER_TASKS = " $SLURM_CPUS_PER_TASK +echo "SLURM_NNODES=" $SLURM_NNODES +echo "MPI_PROCESS $MPI_PROCESS" +echo "OMP_NUM_THREADS=$OMP_NUM_THREADS" + +time srun -n ${MPI_PROCESS} ${MESHER_EXE} +echo " mesher done: `date`" +echo + +## +## forward simulation +## +sleep 2 + +echo +echo `date` +echo starting run in current directory $PWD +echo +time srun -n ${MPI_PROCESS} ${SOLVER_EXE} +echo "=====================" +echo `date` +ls -lrth $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_*.txt +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_mesher.txt +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_solver.txt diff --git a/specfem3d/job_script/job_juwels-booster_test_case_B.slurm b/specfem3d/job_script/job_juwels-booster_test_case_B.slurm new file mode 100644 index 0000000000000000000000000000000000000000..c72e7870bfd4361fc113685afdd7733a0a3b2e55 --- /dev/null +++ b/specfem3d/job_script/job_juwels-booster_test_case_B.slurm @@ -0,0 +1,79 @@ +#!/bin/bash -x +#SBATCH -J Test_case_B-gpu +#SBATCH --account=prpb85 +#SBATCH --nodes=24 +#SBATCH --ntasks-per-node=4 +#SBATCH --cpus-per-task=12 +#SBATCH --time=00:29:59 +#SBATCH --partition=booster +#SBATCH --output=specfem_%x_juwels-booster-%j.output +#SBATCH --gres=gpu:4 +##SBATCH --acctg-freq=task=1 +set -e + +source ../env/env_juwels-booster +grep "^[^#;]" ../env/env_juwels-booster +cat job_juwels-booster_test_case_B.slurm +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/TestCaseB/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat 
$install_dir/TestCaseB/specfem3d_globe/make.log +cd $install_dir/TestCaseB/specfem3d_globe +export I_MPI_PMI_VALUE_LENGTH_MAX=1800 + +#Make sure that OMP_NUM_THREADS / KMP_HW_SUBSET = cpus-per-task +export KMP_HW_SUBSET=1T +export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} +export KMP_AFFINITY=granularity=core,compact +export FORT_BUFFERED=true + +ulimit -s unlimited + +MESHER_EXE=./bin/xmeshfem3D +SOLVER_EXE=./bin/xspecfem3D + +# backup files used for this simulation +cp DATA/Par_file OUTPUT_FILES/ +cp DATA/STATIONS OUTPUT_FILES/ +cp DATA/CMTSOLUTION OUTPUT_FILES/ + +## +## mesh generation +## +sleep 2 + +echo +echo `date` +echo "starting MPI mesher" +echo + +MPI_PROCESS=` echo "$SLURM_NNODES*$SLURM_NTASKS_PER_NODE" | bc -l` +echo "SLURM_NTASKS_PER_NODE = " $SLURM_NTASKS_PER_NODE +echo "SLURM_CPUS_PER_TASKS = " $SLURM_CPUS_PER_TASK +echo "SLURM_NNODES=" $SLURM_NNODES +echo "MPI_PROCESS $MPI_PROCESS" +echo "OMP_NUM_THREADS=$OMP_NUM_THREADS" + +time srun -n ${MPI_PROCESS} ${MESHER_EXE} +echo " mesher done: `date`" +echo + +## +## forward simulation +## +sleep 2 + +echo +echo `date` +echo starting run in current directory $PWD +echo +time srun -n ${MPI_PROCESS} ${SOLVER_EXE} +echo "=====================" +echo `date` +ls -lrth $install_dir/TestCaseB/specfem3d_globe/OUTPUT_FILES/output_*.txt +cat $install_dir/TestCaseB/specfem3d_globe/OUTPUT_FILES/output_mesher.txt +cat $install_dir/TestCaseB/specfem3d_globe/OUTPUT_FILES/output_solver.txt diff --git a/specfem3d/job_script/job_juwels_small_benchmark_run_to_test_more_complex_Earth.slurm b/specfem3d/job_script/job_juwels_small_benchmark_run_to_test_more_complex_Earth.slurm new file mode 100644 index 0000000000000000000000000000000000000000..f6699430bb64f77f4a94c3c1f836937f668ed9eb --- /dev/null +++ b/specfem3d/job_script/job_juwels_small_benchmark_run_to_test_more_complex_Earth.slurm @@ -0,0 +1,54 @@ +#!/bin/bash +#SBATCH -J Validation_case_specfem-small_benchmark_run_to_test_more_complex_Earth +#SBATCH --account=prpb85 +#SBATCH --partition=batch +#SBATCH --nodes=1 +#SBATCH --ntasks-per-node=24 +#SBATCH --time=02:59:59 +#SBATCH --output Validation_case_specfem_small_benchmark_run_to_test_more_complex_Earth-%j.out +##SBATCH --acctg-freq=task=1 + +#set -e +source ../env/env_juwels +echo "Environment used:" +echo "=================" +cat ../env/env_juwels + +cd $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth + + +export SLURM_CPU_BIND=NONE +export I_MPI_PIN=1 +#export I_MPI_PIN_PROCESSOR_LIST=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23 + +alias mpirun='srun' +time ./run_this_example.sh + +echo +echo "running seismogram comparisons:" +echo + +cd $install_dir/specfem3d_globe/ +# uncompress seismograms +if [ -e EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/II.AAK.MXE.sem.ascii.bz2 ]; then + echo + echo "unzipping references..." 
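+  # bunzip2 decompresses the reference seismograms in place, so this block
+  # only runs the first time a fresh copy is used (guarded by the -e test on
+  # one of the compressed reference traces).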
+ echo + mkdir OUTPUT_FILES_reference_OK/ + bunzip2 EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/*.bz2 + echo + echo +fi + +module purge +export PATH=$PATH:/p/software/juwels/stages/2018a/software/Python/2.7.14-GCCcore-7.3.0/bin +pip install --user numpy +#module load intel/17.0 python/2.7.13 +# compares seismograms by plotting correlations +./utils/compare_seismogram_correlations.py EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/ EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/ + +echo +echo "done" +echo + + diff --git a/specfem3d/job_script/job_juwels_small_benchmark_run_to_test_more_complex_Earth_strong.slurm b/specfem3d/job_script/job_juwels_small_benchmark_run_to_test_more_complex_Earth_strong.slurm new file mode 100644 index 0000000000000000000000000000000000000000..faa33d5dd89b48083b9341e8710fee1cc3f13786 --- /dev/null +++ b/specfem3d/job_script/job_juwels_small_benchmark_run_to_test_more_complex_Earth_strong.slurm @@ -0,0 +1,38 @@ +#!/bin/bash +#SBATCH -J specfem_small_benchmark_run_to_test_more_complex_Earth +#SBATCH --account=prpb85 +#SBATCH --nodes=4 +#SBATCH --ntasks-per-node=24 +#SBATCH --cpus-per-task=2 +#SBATCH --time=01:29:59 +#SBATCH --output specfem_small_benchmark_run_to_test_more_complex_Earth-4Nodes-%j.out +#SBATCH --exclusive +#SBATCH -p batch + +#set -e +source ../env/env_juwels +grep "^[^#;]" ../env/env_juwels +cat job_juwels_small_benchmark_run_to_test_more_complex_Earth_strong.slurm + +cd $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth +export SLURM_CPU_BIND=NONE +export I_MPI_PIN=1 +#export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK +alias mpirun='srun' + +time ./run_this_example.sh + +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/specfem3d_globe/make.log + +echo "done" +ls -lrth $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/output_*.txt +cat $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/output_mesher.txt +cat $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/output_solver.txt +sleep 2 diff --git a/specfem3d/job_script/job_juwels_test_case_A.slurm b/specfem3d/job_script/job_juwels_test_case_A.slurm index 84f5dc2b7b397810726eb4f58fa364bea14f3a3d..1d38873da3bc59ce724de47abcedb95be91e5ac2 100644 --- a/specfem3d/job_script/job_juwels_test_case_A.slurm +++ b/specfem3d/job_script/job_juwels_test_case_A.slurm @@ -1,26 +1,35 @@ #!/bin/bash -x #SBATCH -J Test_case_A -#SBATCH --account=prpb66 +#SBATCH --account=prpb85 #SBATCH --nodes=24 #SBATCH --ntasks-per-node=4 #SBATCH --cpus-per-task=12 -#SBATCH --time=00:30:00 +#SBATCH --time=01:59:59 #SBATCH --partition=batch -#SBATCH --output=specfem_small_juwels_96MPI_12OMP_srun_AVX512_mtune_skl_ParaStationMPI-%j.output -#SBATCH --acctg-freq=task=1 +#SBATCH --output=specfem_%x_juwels-%j.output +##SBATCH --acctg-freq=task=1 set -e source ../env/env_juwels -cd $install_dir/TestCaseA/specfem3d_globe +grep "^[^#;]" ../env/env_juwels +cat job_juwels_test_case_A.slurm +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/TestCaseA/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/TestCaseA/specfem3d_globe/make.log -export I_MPI_DOMAIN=auto -export I_MPI_PIN_RESPECT_CPUSET=0 
-export I_MPI_DEBUG=4 +cd $install_dir/TestCaseA/specfem3d_globe +export I_MPI_PMI_VALUE_LENGTH_MAX=1800 #Make sure that OMP_NUM_THREADS / KMP_HW_SUBSET = cpus-per-task export KMP_HW_SUBSET=1T -export OMP_NUM_THREADS=12 -#export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} +export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} +export KMP_AFFINITY=granularity=core,compact +export FORT_BUFFERED=true ulimit -s unlimited @@ -49,7 +58,6 @@ echo "SLURM_NNODES=" $SLURM_NNODES echo "MPI_PROCESS $MPI_PROCESS" echo "OMP_NUM_THREADS=$OMP_NUM_THREADS" -#time mpirun -n ${MPI_PROCESS} ${MESHER_EXE} time srun -n ${MPI_PROCESS} ${MESHER_EXE} echo " mesher done: `date`" echo @@ -63,9 +71,9 @@ echo echo `date` echo starting run in current directory $PWD echo -#time mpirun -n ${MPI_PROCESS} ${SOLVER_EXE} time srun -n ${MPI_PROCESS} ${SOLVER_EXE} - -echo "finished successfully" +echo "=====================" echo `date` - +ls -lrth $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_*.txt +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_mesher.txt +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_solver.txt diff --git a/specfem3d/job_script/job_juwels_test_case_A_NEX_XI.slurm b/specfem3d/job_script/job_juwels_test_case_A_NEX_XI.slurm new file mode 100644 index 0000000000000000000000000000000000000000..f35bb364f14182fbe5475b0d164aaaa28ff9747c --- /dev/null +++ b/specfem3d/job_script/job_juwels_test_case_A_NEX_XI.slurm @@ -0,0 +1,81 @@ +#!/bin/bash -x +#SBATCH -J Test_case_A +#SBATCH --account=prpb85 +#SBATCH --nodes=24 +#SBATCH --ntasks-per-node=4 +#SBATCH --cpus-per-task=12 +#SBATCH --time=00:59:59 +#SBATCH --partition=batch +#SBATCH --output=specfem_%x_NEX_XI-%j.output +##SBATCH --acctg-freq=task=1 +set -e + +source ../env/env_juwels +grep "^[^#;]" ../env/env_juwels +cat job_juwels_test_case_A_NEX_XI.slurm +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/TestCaseA/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/TestCaseA/specfem3d_globe/make.log +cd $install_dir/TestCaseA/specfem3d_globe +#export I_MPI_DEBUG=5 +export I_MPI_PMI_VALUE_LENGTH_MAX=1800 + +#Make sure that OMP_NUM_THREADS / KMP_HW_SUBSET = cpus-per-task +export KMP_HW_SUBSET=1T +export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} +export KMP_AFFINITY=granularity=core,compact +export FORT_BUFFERED=true + +ulimit -s unlimited + +MESHER_EXE=./bin/xmeshfem3D +SOLVER_EXE=./bin/xspecfem3D + +# backup files used for this simulation +cp DATA/Par_file OUTPUT_FILES/ +cp DATA/STATIONS OUTPUT_FILES/ +cp DATA/CMTSOLUTION OUTPUT_FILES/ + +## +## mesh generation +## +sleep 2 + +echo +echo `date` +echo "starting MPI mesher" +echo + +MPI_PROCESS=` echo "$SLURM_NNODES*$SLURM_NTASKS_PER_NODE" | bc -l` +echo "SLURM_NTASKS_PER_NODE = " $SLURM_NTASKS_PER_NODE +echo "SLURM_CPUS_PER_TASKS = " $SLURM_CPUS_PER_TASK +echo "SLURM_NNODES=" $SLURM_NNODES +echo "MPI_PROCESS $MPI_PROCESS" +echo "OMP_NUM_THREADS=$OMP_NUM_THREADS" + +time srun -n ${MPI_PROCESS} ${MESHER_EXE} +echo " mesher done: `date`" +echo + +## +## forward simulation +## +sleep 2 + +echo +echo `date` +echo starting run in current directory $PWD +echo +time srun -n ${MPI_PROCESS} ${SOLVER_EXE} +echo "=====================" +echo `date` +ls -lrth $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_*.txt +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_mesher.txt +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_solver.txt +cd - +mv specfem_${SLURM_JOB_NAME}_NEX_XI-$SLURM_JOBID.output 
specfem_${SLURM_JOB_NAME}_NEX_XI-$NEX_XI-$SLURM_JOBID.output diff --git a/specfem3d/job_script/job_juwels_test_case_A_strong.slurm b/specfem3d/job_script/job_juwels_test_case_A_strong.slurm new file mode 100644 index 0000000000000000000000000000000000000000..d89f22570db37adb64da3fcb1371060127348f17 --- /dev/null +++ b/specfem3d/job_script/job_juwels_test_case_A_strong.slurm @@ -0,0 +1,79 @@ +#!/bin/bash -x +#SBATCH -J Test_case_A_strong +#SBATCH --account=prpb85 +#SBATCH --nodes=48 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=24 +#SBATCH --time=00:59:59 +#SBATCH --partition=batch +#SBATCH --output=specfem_%x_juwels-%j.output +##SBATCH --acctg-freq=task=1 +set -e + +source ../env/env_juwels +grep "^[^#;]" ../env/env_juwels +cat job_juwels_test_case_A.slurm +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/TestCaseA/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/TestCaseA/specfem3d_globe/make.log + +cd $install_dir/TestCaseA/specfem3d_globe +export I_MPI_PMI_VALUE_LENGTH_MAX=1800 + +#Make sure that OMP_NUM_THREADS / KMP_HW_SUBSET = cpus-per-task +export KMP_HW_SUBSET=1T +export OMP_NUM_THREADS=24 #${SLURM_CPUS_PER_TASK} +export KMP_AFFINITY=granularity=core,compact +export FORT_BUFFERED=true + +ulimit -s unlimited + +MESHER_EXE=./bin/xmeshfem3D +SOLVER_EXE=./bin/xspecfem3D + +# backup files used for this simulation +cp DATA/Par_file OUTPUT_FILES/ +cp DATA/STATIONS OUTPUT_FILES/ +cp DATA/CMTSOLUTION OUTPUT_FILES/ + +## +## mesh generation +## +sleep 2 + +echo +echo `date` +echo "starting MPI mesher" +echo + +MPI_PROCESS=` echo "$SLURM_NNODES*$SLURM_NTASKS_PER_NODE" | bc -l` +echo "SLURM_NTASKS_PER_NODE = " $SLURM_NTASKS_PER_NODE +echo "SLURM_CPUS_PER_TASKS = " $SLURM_CPUS_PER_TASK +echo "SLURM_NNODES=" $SLURM_NNODES +echo "MPI_PROCESS $MPI_PROCESS" +echo "OMP_NUM_THREADS=$OMP_NUM_THREADS" + +time srun -n ${MPI_PROCESS} ${MESHER_EXE} +echo " mesher done: `date`" +echo + +## +## forward simulation +## +sleep 2 + +echo +echo `date` +echo starting run in current directory $PWD +echo +time srun -n ${MPI_PROCESS} ${SOLVER_EXE} +echo "=====================" +echo `date` +ls -lrth $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_*.txt +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_mesher.txt +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_solver.txt diff --git a/specfem3d/job_script/job_juwels_test_case_B.slurm b/specfem3d/job_script/job_juwels_test_case_B.slurm index a9ffb24150c86100bfd3a46f08f3c9cd9ff7d203..97829d4ae093c0bac4f74023b96826c90f313e20 100644 --- a/specfem3d/job_script/job_juwels_test_case_B.slurm +++ b/specfem3d/job_script/job_juwels_test_case_B.slurm @@ -1,27 +1,26 @@ #!/bin/bash -x #SBATCH -J Test_case_B -#SBATCH --account=prpb66 +#SBATCH --account=prpb85 #SBATCH --nodes=384 #SBATCH --ntasks-per-node=4 -#SBATCH --cpus-per-task=6 +#SBATCH --cpus-per-task=12 #SBATCH --time=00:30:00 #SBATCH --partition=batch -#SBATCH --output=specfem_TestCaseB_juwels_12OMP-HT-ParaStationMPI-%j.output +#SBATCH --output=specfem_%x_juwels-HT-%j.output set -e - +cat job_juwels_test_case_B.slurm source ../env/env_juwels cd $install_dir/TestCaseB/specfem3d_globe - -#export I_MPI_DOMAIN=auto -#export I_MPI_PIN_RESPECT_CPUSET=0 #export I_MPI_DEBUG=4 +export I_MPI_PMI_VALUE_LENGTH_MAX=1800 #Make sure that OMP_NUM_THREADS / KMP_HW_SUBSET = cpus-per-task export KMP_HW_SUBSET=2T -export OMP_NUM_THREADS=12 -#export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} +export OMP_NUM_THREADS=24 +export 
KMP_AFFINITY=granularity=thread,compact +export FORT_BUFFERED=true -#ulimit -s unlimited +ulimit -s unlimited MESHER_EXE=./bin/xmeshfem3D SOLVER_EXE=./bin/xspecfem3D @@ -48,7 +47,6 @@ echo "SLURM_NNODES=" $SLURM_NNODES echo "MPI_PROCESS $MPI_PROCESS" echo "OMP_NUM_THREADS=$OMP_NUM_THREADS" -#time mpirun -n ${MPI_PROCESS} ${MESHER_EXE} time srun -n ${MPI_PROCESS} ${MESHER_EXE} echo " mesher done: `date`" echo @@ -62,10 +60,7 @@ echo echo `date` echo starting run in current directory $PWD echo -#unset FORT_BUFFERED -#time mpirun -n ${MPI_PROCESS} ${SOLVER_EXE} time srun -n ${MPI_PROCESS} ${SOLVER_EXE} echo "finished successfully" echo `date` - diff --git a/specfem3d/job_script/job_marconi-knl_test_case_B.slurm b/specfem3d/job_script/job_marconi-knl_test_case_B.slurm index 9cf2a52d4402966bf9644b6525693255d5854ea5..0e008e743ba5f73b522067302f1b3c8e60764ff0 100644 --- a/specfem3d/job_script/job_marconi-knl_test_case_B.slurm +++ b/specfem3d/job_script/job_marconi-knl_test_case_B.slurm @@ -7,9 +7,7 @@ #SBATCH --constraint=knl,cache,quad #flat/cache #SBATCH --nodes=384 #SBATCH --ntasks-per-node=4 -##SBATCH --ntasks=1584 #SBATCH --cpus-per-task=17 -##SBATCH --mem=86000 #SBATCH --output=specfem3D_test_case_B_marconi-%j.output source ../env/env_marconi-knl cd $install_dir/TestCaseB/specfem3d_globe diff --git a/specfem3d/job_script/job_marconi100_small_benchmark_run_to_test_more_complex_Earth.slurm b/specfem3d/job_script/job_marconi100_small_benchmark_run_to_test_more_complex_Earth.slurm new file mode 100644 index 0000000000000000000000000000000000000000..87f600f121cd49d62ddb817c1f6dc0b2e6b884b3 --- /dev/null +++ b/specfem3d/job_script/job_marconi100_small_benchmark_run_to_test_more_complex_Earth.slurm @@ -0,0 +1,75 @@ +#!/bin/bash +#SBATCH -J Validation_case_specfem-Marconi100_small_benchmark_run_to_test_more_complex_Earth +#SBATCH -A Ppp4x_5850 +#SBATCH -p m100_usr_prod +#SBATCH --nodes=1 +#SBATCH --ntasks-per-node=24 # 24 tasks out of 128 +#SBATCH --cpus-per-task=4 +#SBATCH --time=01:59:59 +#SBATCH --output Validation_case_specfem-Marconi100_small_benchmark_run_to_test_more_complex_Earth-xl-spectrumpi-mpirun-GPU_DEVICE-star-%j.out +#SBATCH --gres=gpu:4 # 1 gpus per node out of 4 +##SBATCH --hint=nomultithread +#SBATCH --exclusive +#SBATCH --mem=246000 # memory per node out of 246000MB + +#set -e +source ../env/env_marconi100 +echo "Environment used:" +echo "=================" +grep -E -v '^(#|$)' ../env/env_marconi100 +cat job_marconi100_small_benchmark_run_to_test_more_complex_Earth.slurm +cd $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth +export CUDA_VISIBLE_DEVICES=0,1,2,3 + +# Uncomment the 3-5 following lines if it's first time launched after the compilation: +sed -i s/"GPU_MODE = .false."/"GPU_MODE = .true."/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/DATA/Par_file +sed -i s/"GPU_DEVICE = Tesla"/"GPU_DEVICE = *"/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/DATA/Par_file +sed -i s/"configure --enable-openmp"/"configure --build=ppc64 --with-cuda=cuda8"/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/run_this_example.sh +#sed -i s/"mpirun -np"/"srun -n"/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/run_mesher_solver.bash +sed -i s/"mpirun -np"/"mpirun -gpu -np"/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/run_mesher_solver.bash +sed -i '38d' 
$install_dir/specfem3d_globe/Makefile +grep configure $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/run_this_example.sh + +#export OMP_NUM_THREADS=1 # $SLURM_CPUS_PER_TASK +ulimit -s unlimited +sed -i '40 i sed -i s/"O4"/"O3"/g Makefile' $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/run_this_example.sh +time ./run_this_example.sh +grep GPU $install_dir/specfem3d_globe/DATA/Par_file + +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/specfem3d_globe/make.log + +echo "running seismogram comparisons:" +echo + +cd $install_dir/specfem3d_globe/ +# uncompress seismograms +if [ -e EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/II.AAK.MXE.sem.ascii.bz2 ]; then + echo + echo "unzipping references..." + echo + mkdir OUTPUT_FILES_reference_OK/ + bunzip2 EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/*.bz2 + echo + echo +fi + +# compares seismograms by plotting correlations +# Python2 (2.7 -virtual env - numpy) +source /m100/home/userexternal/cjourdai/numpy-test/bin/activate +which python +./utils/compare_seismogram_correlations.py EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/ EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/ + +echo +echo "done" +echo +echo "========" +cat $install_dir//specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/output_solver.txt +echo "========" +cat $install_dir//specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/output_mesher.txt diff --git a/specfem3d/job_script/job_marconi100_test_case_A.slurmc b/specfem3d/job_script/job_marconi100_test_case_A.slurmc new file mode 100644 index 0000000000000000000000000000000000000000..4390ceb564efc8e38df5de8cb74e470d341582f7 --- /dev/null +++ b/specfem3d/job_script/job_marconi100_test_case_A.slurmc @@ -0,0 +1,81 @@ +#!/bin/bash +#SBATCH -J Test_case_A +#SBATCH -A Ppp4x_5850 +#SBATCH -p m100_usr_prod +#SBATCH --time 01:59:00 +#SBATCH --nodes=24 +#SBATCH --ntasks-per-node=4 +#SBATCH --cpus-per-task=8 +#SBATCH --output=specfem3D_%x_marconi100-xl-spectrumpi-GPU-%j.output +#SBATCH --gres=gpu:4 +#SBATCH --gpus-per-node=4 +#SBATCH --hint=nomultithread +#SBATCH --exclusive +#SBATCH --qos=m100_qos_bprod + +source ../env/env_marconi100 +echo "Environment used:" +echo "=================" +grep -E -v '^(#|$)' ../env/env_marconi100 +cat job_marconi100_test_case_A.slurm +cd $install_dir/TestCaseA/specfem3d_globe +grep GPU DATA/Par_file +export CUDA_VISIBLE_DEVICES=0,1,2,3 + +ulimit -s unlimited +MESHER_EXE=./bin/xmeshfem3D +SOLVER_EXE=./bin/xspecfem3D + +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/TestCaseA/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/TestCaseA/specfem3d_globe/make.log + +# backup files used for this simulation +cp DATA/Par_file OUTPUT_FILES/ +cp DATA/STATIONS OUTPUT_FILES/ +cp DATA/CMTSOLUTION OUTPUT_FILES/ + +## +## mesh generation +## +sleep 2 + +echo +echo `date` +echo "starting MPI mesher" +echo + +MPI_PROCESS=$SLURM_NTASKS +echo "SLURM_NTASKS= " $SLURM_NTASKS +echo "SLURM_NTASKS_PER_NODE = " $SLURM_NTASKS_PER_NODE +echo "SLURM_CPUS_PER_TASKS = " $SLURM_CPUS_PER_TASK +echo "SLURM_NNODES=" $SLURM_NNODES +echo "MPI_PROCESS $MPI_PROCESS" + +time mpirun -gpu -np 
${MPI_PROCESS} ${MESHER_EXE} +# set the value of --ntasks-per-node to the number of MPI processes you want to run per node, and --cpus-per-task = OMP_NUM_THREADS (if you want to exploit the SMT in terms of number of OMP threads) or to 128 / (ntasks-per-node) (if you want to exploit the SMT in terms of number of MPI processes). +echo " mesher done: `date`" +echo + +## +## forward simulation +## +sleep 2 + +echo +echo `date` +echo starting run in current directory $PWD +echo +time mpirun -gpu -np ${MPI_PROCESS} ${SOLVER_EXE} + +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_mesher.txt +echo "========" +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_solver.txt + +echo "finished successfully" +echo `date` diff --git a/specfem3d/job_script/job_marconi100_test_case_B.slurm b/specfem3d/job_script/job_marconi100_test_case_B.slurm new file mode 100644 index 0000000000000000000000000000000000000000..f27d168c8965580cda6898bca98babf4d6bea1a1 --- /dev/null +++ b/specfem3d/job_script/job_marconi100_test_case_B.slurm @@ -0,0 +1,81 @@ +#!/bin/bash +#SBATCH -J Test_case_B +#SBATCH -A Ppp4x_5850 +#SBATCH -p m100_usr_prod +#SBATCH --time 01:59:00 +#SBATCH --nodes=384 +#SBATCH --ntasks-per-node=4 +#SBATCH --cpus-per-task=8 +#SBATCH --output=specfem3D_%x_marconi100-xl-spectrumpi-GPU-%j.output +#SBATCH --gres=gpu:4 +#SBATCH --gpus-per-node=4 +#SBATCH --hint=nomultithread +#SBATCH --exclusive + +source ../env/env_marconi100 +echo "Environment used:" +echo "=================" +grep -E -v '^(#|$)' ../env/env_marconi100 +cat job_marconi100_test_case_B.slurm +cd $install_dir/TestCaseB/specfem3d_globe +grep GPU DATA/Par_file +export CUDA_VISIBLE_DEVICES=0,1,2,3 + +ulimit -s unlimited +MESHER_EXE=./bin/xmeshfem3D +SOLVER_EXE=./bin/xspecfem3D + +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/TestCaseB/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/TestCaseB/specfem3d_globe/make.log + +# backup files used for this simulation +cp DATA/Par_file OUTPUT_FILES/ +cp DATA/STATIONS OUTPUT_FILES/ +cp DATA/CMTSOLUTION OUTPUT_FILES/ + +## +## mesh generation +## +sleep 2 + +echo +echo `date` +echo "starting MPI mesher" +echo + +MPI_PROCESS=` echo "$SLURM_NNODES*$SLURM_NTASKS_PER_NODE" | bc -l` +echo "SLURM_NTASKS= " $SLURM_NTASKS +echo "SLURM_NTASKS_PER_NODE = " $SLURM_NTASKS_PER_NODE +echo "SLURM_CPUS_PER_TASKS = " $SLURM_CPUS_PER_TASK +echo "SLURM_NNODES=" $SLURM_NNODES +echo "MPI_PROCESS $MPI_PROCESS" + +time mpirun -gpu -np ${MPI_PROCESS} ${MESHER_EXE} +# set the value of --ntasks-per-node to the number of MPI processes you want to run per node, and --cpus-per-task = OMP_NUM_THREADS (if you want to exploit the SMT in terms of number of OMP threads) or to 128 / (ntasks-per-node) (if you want to exploit the SMT in terms of number of MPI processes). 
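+# Worked example for the comment above: with --ntasks-per-node=4 and
+# --cpus-per-task=8, each node runs 4 MPI ranks on 4*8 = 32 of its 128
+# hardware threads, one GPU per rank; across --nodes=384 that is 1536 ranks.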
+echo " mesher done: `date`" +echo + +## +## forward simulation +## +sleep 2 + +echo +echo `date` +echo starting run in current directory $PWD +echo +time mpirun -gpu -np ${MPI_PROCESS} ${SOLVER_EXE} + +ls -lrth $install_dir/TestCaseB/specfem3d_globe/OUTPUT_FILES/output_*.txt +cat $install_dir/TestCaseB/specfem3d_globe/OUTPUT_FILES/output_solver.txt +echo "========" +cat $install_dir/TestCaseB/specfem3d_globe/OUTPUT_FILES/output_mesher.txt + +echo "finished successfully" +echo `date` diff --git a/specfem3d/job_script/job_strong_scaling_vega-gpu_small_benchmark_run_to_test_more_complex_Earth.slurm b/specfem3d/job_script/job_strong_scaling_vega-gpu_small_benchmark_run_to_test_more_complex_Earth.slurm new file mode 100644 index 0000000000000000000000000000000000000000..af3009e63b8c4145f4cfa80cb4b7a6cc3a1ce620 --- /dev/null +++ b/specfem3d/job_script/job_strong_scaling_vega-gpu_small_benchmark_run_to_test_more_complex_Earth.slurm @@ -0,0 +1,49 @@ +#!/bin/bash +#SBATCH -J specfem_strong_scaling_gpu_small_benchmark_run_to_test_more_complex_Earth +#SBATCH --nodes=24 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=128 +#SBATCH --time=00:59:59 +#SBATCH --output specfem_gpu_small_benchmark_run_to_test_more_complex_Earth-24Nodes-%j.out +#SBATCH --exclusive +#SBATCH -p gpu +#SBATCH --gres=gpu:4 +#set -e +source ../env/env_vega-gpu +grep "^[^#;]" ../env/env_vega-gpu + +cat job_strong_scaling_vega-gpu_small_benchmark_run_to_test_more_complex_Earth.slurm + +cd $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth +export OMPI_MCA_pml=ucx +export OMPI_MCA_btl="^uct,tcp,openib,vader" +export CUDA_VISIBLE_DEVICES=0,1,2,3 +#export OMP_NUM_THREADS=1 + +sed -i s/"GPU_MODE = .false."/"GPU_MODE = .true."/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/DATA/Par_file +sed -i s/"GPU_DEVICE = Tesla"/"GPU_DEVICE = *"/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/DATA/Par_file +sed -i s/"configure --enable-openmp"/"configure --build=ppc64 --with-cuda=cuda8 "/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/run_this_example.sh +#sed -i s/"mpirun -np"/"srun -n"/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/run_mesher_solver.bash +#taskset -a -p $PPID + +time ./run_this_example.sh + +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/specfem3d_globe/make.log + +echo +echo "running seismogram comparisons:" +echo + +cd $install_dir/specfem3d_globe/ +echo "=================================" +echo "done" +ls -lrth $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/output_*.txt +cat $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/output_mesher.txt +cat $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/output_solver.txt diff --git a/specfem3d/job_script/job_supermuc-ng_small_benchmark_run_to_test_more_complex_Earth.slurm b/specfem3d/job_script/job_supermuc-ng_small_benchmark_run_to_test_more_complex_Earth.slurm new file mode 100644 index 0000000000000000000000000000000000000000..2ed94d40fbf82afc08b4131000146a2b34ab04b5 --- /dev/null +++ b/specfem3d/job_script/job_supermuc-ng_small_benchmark_run_to_test_more_complex_Earth.slurm @@ -0,0 +1,51 @@ +#!/bin/bash +#SBATCH -J 
Validation_case_specfem-small_benchmark_run_to_test_more_complex_Earth +#SBATCH --nodes=1 +#SBATCH --ntasks-per-node=24 +#SBATCH --time=01:59:59 +#SBATCH --no-requeue +#SBATCH --account=pn68go +#SBATCH --partition=micro # insert test, micro, general, large or fat +#SBATCH --output Validation_case_specfem_small_benchmark_run_to_test_more_complex_Earth-%j.out + +source ../env/env_supermuc-ng +echo "Environment used:" +echo "=================" +cat ../env/env_supermuc-ng +cat job_supermuc-ng_small_benchmark_run_to_test_more_complex_Earth.slurm + +cd $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth + +export SLURM_CPU_BIND=NONE +export I_MPI_PIN=1 +export LIBRARY_PATH=$LD_LIBRARY_PATH +echo "LD_LIBRARY_PATH = $LD_LIBRARY_PATH" +export CPATH=$CPATH:/usr/local/include/:/usr/include +export FPATH=$FPATH:/usr/local/include/:/usr/include + +echo "===============================================================" + +time ./run_mesher_solver.bash +echo +echo "running seismogram comparisons:" +echo + +cd $install_dir/specfem3d_globe/ +# uncompress seismograms +if [ -e EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/II.AAK.MXE.sem.ascii.bz2 ]; then + echo + echo "unzipping references..." + echo + mkdir OUTPUT_FILES_reference_OK/ + bunzip2 EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/*.bz2 + echo + echo +fi + +module load python/2.7_intel +# compares seismograms by plotting correlations +./utils/compare_seismogram_correlations.py EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/ EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/ + +echo +echo "done" +echo diff --git a/specfem3d/job_script/job_supermuc-ng_test_case_A.slurm b/specfem3d/job_script/job_supermuc-ng_test_case_A.slurm new file mode 100644 index 0000000000000000000000000000000000000000..85f5a5141a6ac0afa3c313b8719e129361f97884 --- /dev/null +++ b/specfem3d/job_script/job_supermuc-ng_test_case_A.slurm @@ -0,0 +1,89 @@ +#!/bin/bash +#SBATCH -J Test_case_A +#SBATCH --nodes=12 +#SBATCH --ntasks-per-node=8 +#SBATCH --cpus-per-task=6 +#SBATCH --time=01:00:00 +#SBATCH --no-requeue +#SBATCH --account=pn73ve +#SBATCH --partition=micro #general # insert test, micro, general, large or fat +#SBATCH -o ./%x-12Nodes-6OMP-%j.out +#set -e +source ../env/env_supermuc-ng +cat ../env/env_supermuc-ng +grep "^[^#;]" ../env/env_supermuc-ng +cat job_supermuc-ng_test_case_A.slurm +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/TestCaseA/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/TestCaseA/specfem3d_globe/make.log +echo "========" +echo "Par_file" +echo "========" +cat $install_dir/TestCaseA/specfem3d_globe/DATA/Par_file + +module load slurm_setup + +cd $install_dir/TestCaseA/specfem3d_globe + +#Make sure that OMP_NUM_THREADS / KMP_HW_SUBSET = cpus-per-task +export KMP_HW_SUBSET=1T +export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} +#export KMP_AFFINITY=granularity=core,compact +export FORT_BUFFERED=true +export FORT_BLOCKSIZE=16777216 + +ulimit -s unlimited + +export LIBRARY_PATH=$LD_LIBRARY_PATH +echo "LD_LIBRARY_PATH = $LD_LIBRARY_PATH" + +MESHER_EXE=./bin/xmeshfem3D +SOLVER_EXE=./bin/xspecfem3D + +# backup files used for this simulation +cp DATA/Par_file OUTPUT_FILES/ +cp DATA/STATIONS OUTPUT_FILES/ +cp DATA/CMTSOLUTION OUTPUT_FILES/ +## +## mesh generation +## +sleep 2 + +echo +echo `date` +echo "starting MPI mesher" +echo + 
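+# MPI_PROCESS below is nodes x tasks-per-node: with the header settings this
+# gives 12*8 = 96 MPI ranks, each running OMP_NUM_THREADS=6 OpenMP threads,
+# i.e. 96*6 = 576 cores in total (48 cores on each of the 12 nodes).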
+MPI_PROCESS=` echo "$SLURM_NNODES*$SLURM_NTASKS_PER_NODE" | bc -l` +echo "SLURM_NTASKS_PER_NODE = " $SLURM_NTASKS_PER_NODE +echo "SLURM_CPUS_PER_TASKS = " $SLURM_CPUS_PER_TASK +echo "SLURM_NNODES=" $SLURM_NNODES +echo "MPI_PROCESS $MPI_PROCESS" +echo "OMP_NUM_THREADS=$OMP_NUM_THREADS" + +time mpiexec -np ${MPI_PROCESS} ${MESHER_EXE} +echo " mesher done: `date`" +echo + +## +## forward simulation +## +sleep 2 + +echo +echo `date` +echo starting run in current directory $PWD +echo +time mpiexec -np ${MPI_PROCESS} ${SOLVER_EXE} + +echo "finished successfully" +echo "=====================" +echo `date` +ls -lrth $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_*.txt +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_mesher.txt +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_solver.txt diff --git a/specfem3d/job_script/job_juwels_test_case_C.slurm b/specfem3d/job_script/job_supermuc-ng_test_case_B.slurm similarity index 52% rename from specfem3d/job_script/job_juwels_test_case_C.slurm rename to specfem3d/job_script/job_supermuc-ng_test_case_B.slurm index c4e545d8689862ceb3e6b5bdeaf4f9219027f51c..6a36b994a6f4378b1f8f5286e5771dcac4bb4f43 100644 --- a/specfem3d/job_script/job_juwels_test_case_C.slurm +++ b/specfem3d/job_script/job_supermuc-ng_test_case_B.slurm @@ -1,29 +1,33 @@ -#!/bin/bash -x -#SBATCH -J Test_case_C -#SBATCH --account=prpb66 -#SBATCH --nodes=1 -#SBATCH --ntasks-per-node=6 -#SBATCH --cpus-per-task=8 -#SBATCH --time=00:30:00 -#SBATCH --partition=batch -#SBATCH --output=specfem_juwels_TestCaseC-%j.output -#SBATCH --acctg-freq=task=1 -set -e +#!/bin/bash +#SBATCH -J Test_case_B +#SBATCH --nodes=384 +#SBATCH --ntasks-per-node=4 +#SBATCH --cpus-per-task=12 +#SBATCH --time=00:29:59 +#SBATCH --no-requeue +#SBATCH --account=pn68go +#SBATCH --partition=general # insert test, micro, general, large or fat +#SBATCH -o ./%x-12OMP.%j.out +#set -e +source ../env/env_supermuc-ng +cat ../env/env_supermuc-ng +cat job_supermuc-ng_test_case_B.slurm +module load slurm_setup -source ../env/env_juwels -cd $install_dir/TestCaseC/specfem3d_globe - -export I_MPI_DOMAIN=auto -export I_MPI_PIN_RESPECT_CPUSET=0 -export I_MPI_DEBUG=4 +cd $install_dir/TestCaseB/specfem3d_globe #Make sure that OMP_NUM_THREADS / KMP_HW_SUBSET = cpus-per-task export KMP_HW_SUBSET=1T -export OMP_NUM_THREADS=8 -#export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} +export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} +#export KMP_AFFINITY=granularity=core,compact +export FORT_BUFFERED=true +export FORT_BLOCKSIZE=16777216 ulimit -s unlimited +export LIBRARY_PATH=$LD_LIBRARY_PATH +echo "LD_LIBRARY_PATH = $LD_LIBRARY_PATH" + MESHER_EXE=./bin/xmeshfem3D SOLVER_EXE=./bin/xspecfem3D @@ -42,15 +46,14 @@ echo `date` echo "starting MPI mesher" echo -MPI_PROCESS=` echo "$SLURM_NNODES*$SLURM_NTASKS_PER_NODE" | bc -l` +MPI_PROCESS=` echo "$SLURM_NNODES*$SLURM_NTASKS_PER_NODE" | bc -l` echo "SLURM_NTASKS_PER_NODE = " $SLURM_NTASKS_PER_NODE echo "SLURM_CPUS_PER_TASKS = " $SLURM_CPUS_PER_TASK echo "SLURM_NNODES=" $SLURM_NNODES echo "MPI_PROCESS $MPI_PROCESS" echo "OMP_NUM_THREADS=$OMP_NUM_THREADS" -#time mpirun -n ${MPI_PROCESS} ${MESHER_EXE} -time srun -n ${MPI_PROCESS} ${MESHER_EXE} +time mpiexec -np ${MPI_PROCESS} ${MESHER_EXE} echo " mesher done: `date`" echo @@ -63,8 +66,7 @@ echo echo `date` echo starting run in current directory $PWD echo -#time mpirun -n ${MPI_PROCESS} ${SOLVER_EXE} -time srun -n ${MPI_PROCESS} ${SOLVER_EXE} +time mpiexec -np ${MPI_PROCESS} ${SOLVER_EXE} echo "finished successfully" echo `date`
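[Editorial note] All of the job scripts above compute their rank count the same way, by piping "$SLURM_NNODES*$SLURM_NTASKS_PER_NODE" through bc. Since both Slurm variables are integers, a minimal equivalent sketch in plain shell arithmetic (no bc dependency) is:

    # same value as the bc pipeline used throughout these scripts
    MPI_PROCESS=$(( SLURM_NNODES * SLURM_NTASKS_PER_NODE ))
    echo "MPI_PROCESS $MPI_PROCESS"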
diff --git a/specfem3d/job_script/job_vega-cpu_small_benchmark_run_to_test_more_complex_Earth.slurm b/specfem3d/job_script/job_vega-cpu_small_benchmark_run_to_test_more_complex_Earth.slurm new file mode 100644 index 0000000000000000000000000000000000000000..4062fcdde6a14c8e7864aa8d31a001f2e0fcbe01 --- /dev/null +++ b/specfem3d/job_script/job_vega-cpu_small_benchmark_run_to_test_more_complex_Earth.slurm @@ -0,0 +1,62 @@ +#!/bin/bash +#SBATCH -J specfem_cpu_small_benchmark_run_to_test_more_complex_Earth +#SBATCH --nodes=1 +#SBATCH --ntasks-per-node=24 +#SBATCH --cpus-per-task=4 +#SBATCH --time=11:59:59 +#SBATCH --output specfem_cpu_small_benchmark_run_to_test_more_complex_Earth-%j.out +#SBATCH --exclusive +#SBATCH -p cpu +#set -e +source ../env/env_vega-cpu +grep "^[^#;]" ../env/env_vega-cpu +cat job_vega-cpu_small_benchmark_run_to_test_more_complex_Earth.slurm +rm $install_dir/specfem3d_globe/Makefile $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/bin/* +cd $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth + +#export OMPI_MCA_pml=ucx +#export OMPI_MCA_btl="^uct,tcp,openib,vader" +export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK +#sed -i s/"mpirun -np"/"srun -n"/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/run_mesher_solver.bash +taskset -a -p $PPID + +time ./run_this_example.sh + +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/specfem3d_globe/make.log + +echo +echo "running seismogram comparisons:" +echo + +cd $install_dir/specfem3d_globe/ +# uncompress seismograms +if [ -e EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/II.AAK.MXE.sem.ascii.bz2 ]; then + echo + echo "unzipping references..."
+ echo + mkdir OUTPUT_FILES_reference_OK/ + bunzip2 EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/*.bz2 + echo + echo +fi + +#wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.10.3-Linux-x86_64.sh +#sh /ceph/hpc/home/eucedricj/Miniconda3-py37_4.10.3-Linux-x86_64.sh +#source miniconda3/bin/activate +#conda create --name python2 python=2.7 +module purge +conda activate python2 +# compares seismograms by plotting correlations +./utils/compare_seismogram_correlations.py EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/ EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/ + +echo +echo "done" + +cp $install_dir//specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/output_solver.txt output_solver_$SLURM_JOBID.txt diff --git a/specfem3d/job_script/job_vega-cpu_test_case_A.slurm b/specfem3d/job_script/job_vega-cpu_test_case_A.slurm new file mode 100644 index 0000000000000000000000000000000000000000..6a829e2b012c19aef4bc2f0b0478cdd97a58d219 --- /dev/null +++ b/specfem3d/job_script/job_vega-cpu_test_case_A.slurm @@ -0,0 +1,90 @@ +#!/bin/bash +#SBATCH -J Test_case_A-cpu +#SBATCH --nodes=24 +#SBATCH --ntasks-per-node=4 +#SBATCH --cpus-per-task=8 +#SBATCH --time=00:30:00 +#SBATCH --output specfem-cpu_TestCaseA-gcc-9-8-openMP-nomultithread-Ofast-znver2-OMP_PLACESCores-mpirun-withMCA-%j.output +#SBATCH -p cpu +#SBATCH --hint=nomultithread +#SBATCH --distribution=block:block +#set -e +source ../env/env_vega-cpu +grep "^[^#;]" ../env/env_vega-cpu +cat job_vega-cpu_test_case_A.slurm +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/TestCaseA/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/TestCaseA/specfem3d_globe/make.log +cd $install_dir/TestCaseA/specfem3d_globe + +#export SLURM_CPU_BIND=NONE +export OMPI_MCA_pml=ucx +export OMPI_MCA_btl="^uct,tcp,openib,vader" #self,vader,openib" # with ^ucx and ^tcp -> error occurred in MPI_Bcast + +#Make sure that OMP_NUM_THREADS / KMP_HW_SUBSET = cpus-per-task +#export KMP_HW_SUBSET=2T +export OMP_PLACES=cores #sockets +#export OMP_SCHEDULE=DYNAMIC +export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK + +ulimit -s unlimited + +MESHER_EXE=./bin/xmeshfem3D +SOLVER_EXE=./bin/xspecfem3D + +echo $LD_LIBRARY_PATH +ldd /exa5/scratch/user/eucedricj//benchmarks/vega-cpu/specfem3d_globe/31octobre/cpu-znver2/TestCaseA/specfem3d_globe/bin/xspecfem3D + +# backup files used for this simulation +cp DATA/Par_file OUTPUT_FILES/ +cp DATA/STATIONS OUTPUT_FILES/ +cp DATA/CMTSOLUTION OUTPUT_FILES/ + +## +## mesh generation +## +sleep 2 + +echo +echo `date` +echo "starting MPI mesher" +echo + + +MPI_PROCESS=` echo "$SLURM_NNODES*$SLURM_NTASKS_PER_NODE" | bc -l` +echo "SLURM_NTASKS_PER_NODE = " $SLURM_NTASKS_PER_NODE +echo "SLURM_CPUS_PER_TASKS = " $SLURM_CPUS_PER_TASK +echo "SLURM_NNODES=" $SLURM_NNODES +echo "MPI_PROCESS $MPI_PROCESS" + +#time mpirun --display-devel-map -n ${MPI_PROCESS} ${MESHER_EXE} +time mpirun -n ${MPI_PROCESS} ${MESHER_EXE} +#time srun --mpi=pmix_v3 --cpu-bind=core -n ${MPI_PROCESS} ${MESHER_EXE} # +echo " mesher done: `date`" +echo + +## +## forward simulation +## +sleep 2 + +echo +echo `date` +echo starting run in current directory $PWD +echo +#unset FORT_BUFFERED +#time mpirun --display-devel-map -n ${MPI_PROCESS} ${SOLVER_EXE} +time mpirun -n ${MPI_PROCESS} ${SOLVER_EXE} +#time srun --mpi=pmix_v3 --cpu-bind=core -n ${MPI_PROCESS} ${SOLVER_EXE} + +echo "finished successfully" 
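+# NOTE (editorial): plain mpirun with OMPI_MCA_pml=ucx is the launch path kept
+# here; as the inline comment above records, excluding ucx and tcp from the
+# btl list caused an MPI_Bcast error, and the commented-out "srun
+# --mpi=pmix_v3 --cpu-bind=core" lines are the tested alternative.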
+echo `date` +ls -lrth $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_mesher.txt +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_mesher.txt +ls -lrth $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_solver.txt +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_solver.txt diff --git a/specfem3d/job_script/job_vega-gpu_small_benchmark_run_to_test_more_complex_Earth.slurm b/specfem3d/job_script/job_vega-gpu_small_benchmark_run_to_test_more_complex_Earth.slurm new file mode 100644 index 0000000000000000000000000000000000000000..e40960743e68c47c3f72770f36d6dc865c38bf4b --- /dev/null +++ b/specfem3d/job_script/job_vega-gpu_small_benchmark_run_to_test_more_complex_Earth.slurm @@ -0,0 +1,74 @@ +#!/bin/bash +#SBATCH -J specfem_gpu_small_benchmark_run_to_test_more_complex_Earth +#SBATCH --nodes=1 +#SBATCH --ntasks-per-node=24 +#SBATCH --cpus-per-task=2 +#SBATCH --time=01:59:59 +#SBATCH --output specfem_gpu_small_benchmark_run_to_test_more_complex_Earth-%j.out +#SBATCH --exclusive +#SBATCH -p gpu +#SBATCH --gres=gpu:4 +#set -e +source ../env/env_vega-gpu +grep "^[^#;]" ../env/env_vega-gpu +cat job_vega-gpu_small_benchmark_run_to_test_more_complex_Earth.slurm + +cd $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth +export OMPI_MCA_pml=ucx +export OMPI_MCA_btl="^uct,tcp,openib,vader" +export CUDA_VISIBLE_DEVICES=0,1,2,3 +#export OMP_NUM_THREADS=1 + +sed -i s/"GPU_MODE = .false."/"GPU_MODE = .true."/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/DATA/Par_file +sed -i s/"GPU_DEVICE = Tesla"/"GPU_DEVICE = *"/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/DATA/Par_file +sed -i s/"configure --enable-openmp"/"configure --build=ppc64 --with-cuda=cuda8 "/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/run_this_example.sh +#sed -i s/"mpirun -np"/"srun -n"/g $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/run_mesher_solver.bash +#taskset -a -p $PPID + +time ./run_this_example.sh + +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/specfem3d_globe/make.log + +echo +echo "running seismogram comparisons:" +echo + +cd $install_dir/specfem3d_globe/ +# uncompress seismograms +if [ -e EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/II.AAK.MXE.sem.ascii.bz2 ]; then + echo + echo "unzipping references..." 
+ echo + mkdir OUTPUT_FILES_reference_OK/ + bunzip2 EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/*.bz2 + echo + echo +fi + +#wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.10.3-Linux-x86_64.sh +#sh /ceph/hpc/home/eucedricj/Miniconda3-py37_4.10.3-Linux-x86_64.sh +#source miniconda3/bin/activate +#conda create --name python2 python=2.7 +module purge +echo "which conda" +which conda +echo "conda init bash" +conda init bash +/ceph/hpc/home/eucedricj/miniconda3/bin/activate +echo "conda activate python2" +conda activate python2 +# compares seismograms by plotting correlations +./utils/compare_seismogram_correlations.py EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/ EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES_reference_OK/ + +echo +echo "done" +ls -lrth $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/output_*.txt +cat $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/output_mesher.txt +cat $install_dir/specfem3d_globe/EXAMPLES/small_benchmark_run_to_test_more_complex_Earth/OUTPUT_FILES/output_solver.txt diff --git a/specfem3d/job_script/job_vega-gpu_test_case_A.slurm b/specfem3d/job_script/job_vega-gpu_test_case_A.slurm new file mode 100644 index 0000000000000000000000000000000000000000..80be291c64597718215eef523c9cb0b6f10aecad --- /dev/null +++ b/specfem3d/job_script/job_vega-gpu_test_case_A.slurm @@ -0,0 +1,80 @@ +#!/bin/bash +#SBATCH -J Test_case_A-gpu +#SBATCH --nodes=24 +#SBATCH --ntasks-per-node=4 +#SBATCH --cpus-per-task=8 +#SBATCH --time=00:30:00 +#SBATCH --output specfem-gpu_TestCaseA-gcc-9-cuda-11-GPU-NoopenMP-NoMultithread-8cpus-distribBlock-znver2-%j.output +#SBATCH -p gpu +#SBATCH --gres=gpu:4 +#SBATCH --hint=nomultithread +#SBATCH --distribution=block:block +#set -e +source ../env/env_vega-gpu +grep "^[^#;]" ../env/env_vega-gpu +cat job_vega-gpu_test_case_A.slurm +echo "==========" +echo "config.log" +echo "==========" +cat $install_dir/TestCaseA/specfem3d_globe/config.log +echo "========" +echo "make.log" +echo "========" +cat $install_dir/TestCaseA/specfem3d_globe/make.log +cd $install_dir/TestCaseA/specfem3d_globe +grep GPU DATA/Par_file + +export OMPI_MCA_pml=ucx +export OMPI_MCA_btl="^uct,tcp,openib,vader" #self,vader,openib" # with ^ucx and ^tcp -> error occurred in MPI_Bcast +export CUDA_VISIBLE_DEVICES=0,1,2,3 +#export OMP_NUM_THREADS=2 + +ulimit -s unlimited + +MESHER_EXE=./bin/xmeshfem3D +SOLVER_EXE=./bin/xspecfem3D + +# backup files used for this simulation +cp DATA/Par_file OUTPUT_FILES/ +cp DATA/STATIONS OUTPUT_FILES/ +cp DATA/CMTSOLUTION OUTPUT_FILES/ + +## +## mesh generation +## +sleep 2 + +echo +echo `date` +echo "starting MPI mesher" +echo + + +MPI_PROCESS=` echo "$SLURM_NNODES*$SLURM_NTASKS_PER_NODE" | bc -l` +echo "SLURM_NTASKS_PER_NODE = " $SLURM_NTASKS_PER_NODE +echo "SLURM_CPUS_PER_TASKS = " $SLURM_CPUS_PER_TASK +echo "SLURM_NNODES=" $SLURM_NNODES +echo "MPI_PROCESS $MPI_PROCESS" + +time mpirun -n ${MPI_PROCESS} ${MESHER_EXE} +echo " mesher done: `date`" +echo + +## +## forward simulation +## +sleep 2 + +echo +echo `date` +echo starting run in current directory $PWD +echo +#unset FORT_BUFFERED +time mpirun -n ${MPI_PROCESS} ${SOLVER_EXE} + +echo "finished successfully" +echo "=====================" +echo `date` +ls -lrth $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_*.txt +cat $install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_mesher.txt +cat 
$install_dir/TestCaseA/specfem3d_globe/OUTPUT_FILES/output_solver.txt diff --git a/specfem3d/test_cases/SPECFEM3D_TestCaseC/CMTSOLUTION b/specfem3d/test_cases/SPECFEM3D_TestCaseC/CMTSOLUTION deleted file mode 100644 index ac4a849cc037978d4372d14815b90fd7aa569de8..0000000000000000000000000000000000000000 --- a/specfem3d/test_cases/SPECFEM3D_TestCaseC/CMTSOLUTION +++ /dev/null @@ -1,13 +0,0 @@ -PDE 1994 6 9 0 33 16.40 -13.8300 -67.5600 637.0 6.9 6.8 NORTHERN BOLIVIA -event name: 060994A -time shift: 29.0000 -half duration: 20.0000 -latitude: -13.8200 -longitude: -67.2500 -depth: 647.1000 -Mrr: -7.590000e+27 -Mtt: 7.750000e+27 -Mpp: -1.600000e+26 -Mrt: -2.503000e+28 -Mrp: 4.200000e+26 -Mtp: -2.480000e+27 diff --git a/specfem3d/test_cases/SPECFEM3D_TestCaseC/Par_file b/specfem3d/test_cases/SPECFEM3D_TestCaseC/Par_file deleted file mode 100644 index fa044a9370ae27606d1e5f4973cd9d2aa4aeb14f..0000000000000000000000000000000000000000 --- a/specfem3d/test_cases/SPECFEM3D_TestCaseC/Par_file +++ /dev/null @@ -1,336 +0,0 @@ -#----------------------------------------------------------- -# -# Simulation input parameters -# -#----------------------------------------------------------- - -# forward or adjoint simulation -SIMULATION_TYPE = 1 # set to 1 for forward simulations, 2 for adjoint simulations for sources, and 3 for kernel simulations -NOISE_TOMOGRAPHY = 0 # flag of noise tomography, three steps (1,2,3). If earthquake simulation, set it to 0. -SAVE_FORWARD = .false. # save last frame of forward simulation or not - -# number of chunks (1,2,3 or 6) -NCHUNKS = 6 - -# angular width of the first chunk (not used if full sphere with six chunks) -ANGULAR_WIDTH_XI_IN_DEGREES = 90.d0 # angular size of a chunk -ANGULAR_WIDTH_ETA_IN_DEGREES = 90.d0 -CENTER_LATITUDE_IN_DEGREES = 90.d0 -CENTER_LONGITUDE_IN_DEGREES = 0.d0 -GAMMA_ROTATION_AZIMUTH = 0.d0 - -# number of elements at the surface along the two sides of the first chunk -# (must be multiple of 16 and 8 * multiple of NPROC below) -NEX_XI = 64 -NEX_ETA = 64 - -# number of MPI processors along the two sides of the first chunk -NPROC_XI = 1 -NPROC_ETA = 1 - -#----------------------------------------------------------- -# -# Model -# -#----------------------------------------------------------- - -# 1D models with real structure: -# 1D_isotropic_prem, 1D_transversely_isotropic_prem, 1D_iasp91, 1D_1066a, 1D_ak135f_no_mud, 1D_ref, 1D_ref_iso, 1D_jp3d,1D_sea99 -# -# 1D models with only one fictitious averaged crustal layer: -# 1D_isotropic_prem_onecrust, 1D_transversely_isotropic_prem_onecrust, 1D_iasp91_onecrust, 1D_1066a_onecrust, 1D_ak135f_no_mud_onecrust -# -# fully 3D models: -# transversely_isotropic_prem_plus_3D_crust_2.0, 3D_anisotropic, 3D_attenuation, -# s20rts, s40rts, s362ani, s362iso, s362wmani, s362ani_prem, s362ani_3DQ, s362iso_3DQ, -# s29ea, s29ea,sea99_jp3d1994,sea99,jp3d1994,heterogen,full_sh -# -# 3D models with 1D crust: append "_1Dcrust" the the 3D model name -# to take the 1D crustal model from the -# associated reference model rather than the default 3D crustal model -# e.g. s20rts_1Dcrust, s362ani_1Dcrust, etc. -MODEL = s362ani - -# parameters describing the Earth model -OCEANS = .true. -ELLIPTICITY = .true. -TOPOGRAPHY = .true. -GRAVITY = .true. -ROTATION = .true. -ATTENUATION = .true. - -# absorbing boundary conditions for a regional simulation -ABSORBING_CONDITIONS = .false. 
- -# record length in minutes -RECORD_LENGTH_IN_MINUTES = 1d0 - -# to undo attenuation for sensitivity kernel calculations or forward runs with SAVE_FORWARD -# use one (and only one) of the two flags below. UNDO_ATTENUATION is much better (it is exact) -# but requires a significant amount of disk space for temporary storage. -PARTIAL_PHYS_DISPERSION_ONLY = .false. -UNDO_ATTENUATION = .false. -# How much memory (in GB) is installed on your machine per CPU core (only used for UNDO_ATTENUATION, can be ignored otherwise) -# (or per GPU card or per INTEL MIC Phi board) -# Beware, this value MUST be given per core, i.e. per MPI thread, i.e. per MPI rank, NOT per node. -# This value is for instance: -# - 4 GB on Tiger at Princeton -# - 4 GB on TGCC Curie in Paris -# - 4 GB on Titan at ORNL when using CPUs only (no GPUs); start your run with "aprun -n$NPROC -N8 -S4 -j1" -# - 2 GB on the machine used by Christina Morency -# - 2 GB on the TACC machine used by Min Chen -# - 1.5 GB on the GPU cluster in Marseille -# When running on GPU machines, it is simpler to set PERCENT_OF_MEM_TO_USE_PER_CORE = 100.d0 -# and then set MEMORY_INSTALLED_PER_CORE_IN_GB to the amount of memory that you estimate is free (rather than installed) -# on the host of the GPU card while running your GPU job. -# For GPU runs on Titan at ORNL, use PERCENT_OF_MEM_TO_USE_PER_CORE = 100.d0 and MEMORY_INSTALLED_PER_CORE_IN_GB = 25.d0 -# and run your job with "aprun -n$NPROC -N1 -S1 -j1" -# (each host has 32 GB on Titan, each GPU has 6 GB, thus even if all the GPU arrays are duplicated on the host -# this leaves 32 - 6 = 26 GB free on the host; leaving 1 GB for the Linux system, we can safely use 100% of 25 GB) -MEMORY_INSTALLED_PER_CORE_IN_GB = 16.0d0 -# What percentage of this total do you allow us to use for arrays to undo attenuation, keeping in mind that you -# need to leave some memory available for the GNU/Linux system to run -# (a typical value is 85%; any value below is fine but the code will then save a lot of data to disk; -# values above, say 90% or 92%, can be OK on some systems but can make the adjoint code run out of memory -# on other systems, depending on how much memory per node the GNU/Linux system needs for itself; thus you can try -# a higher value and if the adjoint crashes then try again with a lower value) -PERCENT_OF_MEM_TO_USE_PER_CORE = 85.d0 - -# three mass matrices instead of one are needed to handle rotation very accurately; -# otherwise rotation is handled slightly less accurately (but still reasonably well); -# set to .true. if you are interested in precise effects related to rotation; -# set to .false. if you are solving very large inverse problems at high frequency and also undoing attenuation exactly -# using the UNDO_ATTENUATION flag above, in which case saving as much memory as possible can be a good idea. -# You can also safely set it to .false. if you are not in a period range in which rotation matters, e.g. if you are targetting very short-period body waves. -# if in doubt, set to .true. -# Set it to .true. if you have ABSORBING_CONDITIONS above, because in that case the code will use the three mass matrices anyway -# and thus there is no additional cost. -# this flag is of course unused if ROTATION above is set to .false. -EXACT_MASS_MATRIX_FOR_ROTATION = .false. - -#----------------------------------------------------------- - -# this for LDDRK high-order time scheme instead of Newmark -USE_LDDRK = .false. 
- -# the maximum CFL of LDDRK is significantly higher than that of the Newmark scheme, -# in a ratio that is theoretically 1.327 / 0.697 = 1.15 / 0.604 = 1.903 for a solid with Poisson's ratio = 0.25 -# and for a fluid (see the manual of the 2D code, SPECFEM2D, Tables 4.1 and 4.2, and that ratio does not -# depend on whether we are in 2D or in 3D). However in practice a ratio of about 1.5 to 1.7 is often safer -# (for instance for models with a large range of Poisson's ratio values). -# Since the code computes the time step using the Newmark scheme, for LDDRK we will simply -# multiply that time step by this ratio when LDDRK is on and when flag INCREASE_CFL_FOR_LDDRK is true. -INCREASE_CFL_FOR_LDDRK = .true. -RATIO_BY_WHICH_TO_INCREASE_IT = 1.5d0 - -#----------------------------------------------------------- -# -# Visualization -# -#----------------------------------------------------------- - -# save AVS or OpenDX movies -#MOVIE_COARSE saves movie only at corners of elements (SURFACE OR VOLUME) -#MOVIE_COARSE does not work with create_movie_AVS_DX -MOVIE_SURFACE = .false. -MOVIE_VOLUME = .false. -MOVIE_COARSE = .false. -NTSTEP_BETWEEN_FRAMES = 100 -HDUR_MOVIE = 0.d0 - -# save movie in volume. Will save element if center of element is in prescribed volume -# top/bottom: depth in KM, use MOVIE_TOP = -100 to make sure the surface is stored. -# west/east: longitude, degrees East [-180/180] top/bottom: latitute, degrees North [-90/90] -# start/stop: frames will be stored at MOVIE_START + i*NSTEP_BETWEEN_FRAMES, where i=(0,1,2..) and iNSTEP_BETWEEN_FRAMES <= MOVIE_STOP -# movie_volume_type: 1=strain, 2=time integral of strain, 3=\mu*time integral of strain -# type 4 saves the trace and deviatoric stress in the whole volume, 5=displacement, 6=velocity -MOVIE_VOLUME_TYPE = 2 -MOVIE_TOP_KM = -100.0 -MOVIE_BOTTOM_KM = 1000.0 -MOVIE_WEST_DEG = -90.0 -MOVIE_EAST_DEG = 90.0 -MOVIE_NORTH_DEG = 90.0 -MOVIE_SOUTH_DEG = -90.0 -MOVIE_START = 0 -MOVIE_STOP = 40000 - -# save mesh files to check the mesh -SAVE_MESH_FILES = .false. - -# restart files (number of runs can be 1 or higher, choose 1 for no restart files) -NUMBER_OF_RUNS = 1 -NUMBER_OF_THIS_RUN = 1 - -# path to store the local database files on each node -LOCAL_PATH = ./DATABASES_MPI -# temporary wavefield/kernel/movie files -LOCAL_TMP_PATH = ./DATABASES_MPI - -# interval at which we output time step info and max of norm of displacement -NTSTEP_BETWEEN_OUTPUT_INFO = 50 - -#----------------------------------------------------------- -# -# Sources & seismograms -# -#----------------------------------------------------------- - -# interval in time steps for temporary writing of seismograms -NTSTEP_BETWEEN_OUTPUT_SEISMOS = 2000 -NTSTEP_BETWEEN_READ_ADJSRC = 1000 - -# use a (tilted) FORCESOLUTION force point source (or several) instead of a CMTSOLUTION moment-tensor source. -# # This can be useful e.g. for asteroid simulations -# # in which the source is a vertical force, normal force, tilted force, impact etc. -# # If this flag is turned on, the FORCESOLUTION file must be edited by giving: -# # - the corresponding time-shift parameter, -# # - the half duration parameter of the source, -# # - the coordinates of the source, -# # - the source time function of the source, -# # - the magnitude of the force source, -# # - the components of a (non necessarily unitary) direction vector for the force source in the E/N/Z_UP basis. 
-# # The direction vector is made unitary internally in the code and thus only its direction matters here; -# # its norm is ignored and the norm of the force used is the factor force source times the source time function. -USE_FORCE_POINT_SOURCE = .false. - -# option to save strain seismograms -# this option is useful for strain Green's tensor -# this feature is currently under development -SAVE_SEISMOGRAMS_STRAIN = .false. - -# save seismograms also when running the adjoint runs for an inverse problem -# (usually they are unused and not very meaningful, leave this off in almost all cases) -SAVE_SEISMOGRAMS_IN_ADJOINT_RUN = .false. - -# output format for the seismograms (one can use either or all of the three formats) -OUTPUT_SEISMOS_ASCII_TEXT = .true. -OUTPUT_SEISMOS_SAC_ALPHANUM = .false. -OUTPUT_SEISMOS_SAC_BINARY = .false. -OUTPUT_SEISMOS_ASDF = .false. - -# rotate seismograms to Radial-Transverse-Z or use default North-East-Z reference frame -ROTATE_SEISMOGRAMS_RT = .false. - -# decide if master process writes all the seismograms or if all processes do it in parallel -WRITE_SEISMOGRAMS_BY_MASTER = .true. - -# save all seismograms in one large combined file instead of one file per seismogram -# to avoid overloading shared non-local file systems such as LUSTRE or GPFS for instance -SAVE_ALL_SEISMOS_IN_ONE_FILE = .false. -USE_BINARY_FOR_LARGE_FILE = .false. - -# flag to impose receivers at the surface or allow them to be buried -RECEIVERS_CAN_BE_BURIED = .true. - -# print source time function -PRINT_SOURCE_TIME_FUNCTION = .false. - -#----------------------------------------------------------- -# -# Adjoint kernel outputs -# -#----------------------------------------------------------- - -# use ASDF format for reading the adjoint sources -READ_ADJSRC_ASDF = .false. - -# this parameter must be set to .true. to compute anisotropic kernels -# in crust and mantle (related to the 21 Cij in geographical coordinates) -# default is .false. to compute isotropic kernels (related to alpha and beta) -ANISOTROPIC_KL = .false. - -# output only transverse isotropic kernels (alpha_v,alpha_h,beta_v,beta_h,eta,rho) -# rather than fully anisotropic kernels when ANISOTROPIC_KL above is set to .true. -# means to save radial anisotropic kernels, i.e., sensitivity kernels for beta_v, beta_h, etc. -SAVE_TRANSVERSE_KL_ONLY = .false. - -# output approximate Hessian in crust mantle region. -# means to save the preconditioning for gradients, they are cross correlations between forward and adjoint accelerations. -APPROXIMATE_HESS_KL = .false. - -# forces transverse isotropy for all mantle elements -# (default is to use transverse isotropy only between MOHO and 220) -# means we allow radial anisotropy between the bottom of the crust to the bottom of the transition zone, i.e., 660~km depth. -USE_FULL_TISO_MANTLE = .false. - -# output kernel mask to zero out source region -# to remove large values near the sources in the sensitivity kernels -SAVE_SOURCE_MASK = .false. - -# output kernels on a regular grid instead of on the GLL mesh points (a bit expensive) -SAVE_REGULAR_KL = .false. 
- -#----------------------------------------------------------- - -# Dimitri Komatitsch, July 2014, CNRS Marseille, France: -# added the ability to run several calculations (several earthquakes) -# in an embarrassingly-parallel fashion from within the same run; -# this can be useful when using a very large supercomputer to compute -# many earthquakes in a catalog, in which case it can be better from -# a batch job submission point of view to start fewer and much larger jobs, -# each of them computing several earthquakes in parallel. -# To turn that option on, set parameter NUMBER_OF_SIMULTANEOUS_RUNS to a value greater than 1. -# To implement that, we create NUMBER_OF_SIMULTANEOUS_RUNS MPI sub-communicators, -# each of them being labeled "my_local_mpi_comm_world", and we use them -# in all the routines in "src/shared/parallel.f90", except in MPI_ABORT() because in that case -# we need to kill the entire run. -# When that option is on, of course the number of processor cores used to start -# the code in the batch system must be a multiple of NUMBER_OF_SIMULTANEOUS_RUNS, -# all the individual runs must use the same number of processor cores, -# which as usual is NPROC in the Par_file, -# and thus the total number of processor cores to request from the batch system -# should be NUMBER_OF_SIMULTANEOUS_RUNS * NPROC. -# All the runs to perform must be placed in directories called run0001, run0002, run0003 and so on -# (with exactly four digits). -# -# Imagine you have 10 independent calculations to do, each of them on 100 cores; you have three options: -# -# 1/ submit 10 jobs to the batch system -# -# 2/ submit a single job on 1000 cores to the batch, and in that script create a sub-array of jobs to start 10 jobs, -# each running on 100 cores (see e.g. http://www.schedmd.com/slurmdocs/job_array.html ) -# -# 3/ submit a single job on 1000 cores to the batch, start SPECFEM3D on 1000 cores, create 10 sub-communicators, -# cd into one of 10 subdirectories (called e.g. run0001, run0002,... run0010) depending on the sub-communicator -# your MPI rank belongs to, and run normally on 100 cores using that sub-communicator. -# -# The option below implements 3/. -# -NUMBER_OF_SIMULTANEOUS_RUNS = 1 - -# if we perform simultaneous runs in parallel, if only the source and receivers vary between these runs -# but not the mesh nor the model (velocity and density) then we can also read the mesh and model files -# from a single run in the beginning and broadcast them to all the others; for a large number of simultaneous -# runs for instance when solving inverse problems iteratively this can DRASTICALLY reduce I/Os to disk in the solver -# (by a factor equal to NUMBER_OF_SIMULTANEOUS_RUNS), and reducing I/Os is crucial in the case of huge runs. -# Thus, always set this option to .true. if the mesh and the model are the same for all simultaneous runs. -# In that case there is no need to duplicate the mesh and model file database (the content of the DATABASES_MPI -# directories) in each of the run0001, run0002,... directories, it is sufficient to have one in run0001 -# and the code will broadcast it to the others) -BROADCAST_SAME_MESH_AND_MODEL = .false. - -# if one or a few of these simultaneous runs fail, kill all the runs or let the others finish using a fail-safe mechanism -# (in most cases, should be set to false) -USE_FAILSAFE_MECHANISM = .false. - -#----------------------------------------------------------- - -# set to true to use GPUs -GPU_MODE = .false. -# Only used if GPU_MODE = .true. 
: -GPU_RUNTIME = 1 -# 2 (OpenCL), 1 (Cuda) ou 0 (Compile-time -- does not work if configured with --with-cuda *AND* --with-opencl) -GPU_PLATFORM = NVIDIA -GPU_DEVICE = Tesla - -# set to true to use the ADIOS library for I/Os -ADIOS_ENABLED = .false. -ADIOS_FOR_FORWARD_ARRAYS = .true. -ADIOS_FOR_MPI_ARRAYS = .true. -ADIOS_FOR_ARRAYS_SOLVER = .true. -ADIOS_FOR_SOLVER_MESHFILES = .true. -ADIOS_FOR_AVS_DX = .true. -ADIOS_FOR_KERNELS = .true. -ADIOS_FOR_MODELS = .true. -ADIOS_FOR_UNDO_ATTENUATION = .true. - diff --git a/specfem3d/test_cases/SPECFEM3D_TestCaseC/STATIONS b/specfem3d/test_cases/SPECFEM3D_TestCaseC/STATIONS deleted file mode 100644 index afa5558b024506bd545059dde998dbb2ebc17fdf..0000000000000000000000000000000000000000 --- a/specfem3d/test_cases/SPECFEM3D_TestCaseC/STATIONS +++ /dev/null @@ -1,129 +0,0 @@ -AAK II 42.6390 74.4940 1645.0 30.0 -ABKT II 37.9304 58.1189 678.0 7.0 -ABPO II -19.0180 47.2290 1528.0 5.3 -ALE II 82.5033 -62.3500 60.0 0.0 -ARU II 56.4302 58.5625 250.0 0.0 -ASCN II -7.9327 -14.3601 173.0 100.0 -BFO II 48.3319 8.3311 589.0 0.0 -BORG II 64.7474 -21.3268 110.0 95.0 -BRVK II 53.0581 70.2828 330.0 15.0 -CMLA II 37.7637 -25.5243 429.0 7.0 -COCO II -12.1901 96.8349 1.0 70.0 -DGAR II -7.4121 72.4525 1.0 2.0 -EFI II -51.6753 -58.0637 110.0 80.0 -ERM II 42.0150 143.1572 40.0 0.0 -ESK II 55.3167 -3.2050 242.0 0.0 -FFC II 54.7250 -101.9783 338.0 0.0 -GAR II 39.0000 70.3167 1300.0 0.0 -HOPE II -54.2836 -36.4879 20.0 0.0 -JTS II 10.2908 -84.9525 340.0 0.0 -KAPI II -5.0142 119.7517 300.0 100.0 -KDAK II 57.7828 -152.5835 152.0 5.5 -KIV II 43.9562 42.6888 1210.0 0.0 -KURK II 50.7154 78.6202 184.0 25.0 -KWAJ II 8.8019 167.6130 0.0 0.0 -LVZ II 67.8979 34.6514 630.0 200.0 -MBAR II -0.6019 30.7382 1390.0 100.0 -MSEY II -4.6737 55.4792 475.0 91.0 -MSVF II -17.7448 178.0528 801.1 100.0 -NIL II 33.6506 73.2686 629.0 68.0 -NNA II -11.9875 -76.8422 575.0 40.0 -NRIL II 69.5049 88.4414 92.0 506.0 -NVS II 54.8404 83.2346 150.0 0.0 -OBN II 55.1146 36.5674 160.0 30.0 -PALK II 7.2728 80.7022 460.0 90.0 -RAYN II 23.5225 45.5032 631.0 2.0 -RPN II -27.1267 -109.3344 110.0 0.0 -SACV II 14.9702 -23.6085 387.0 97.0 -SHEL II -15.9588 -5.7457 537.0 60.0 -SUR II -32.3797 20.8117 1770.0 0.0 -TAU II -42.9099 147.3204 132.0 0.0 -TLY II 51.6807 103.6438 579.0 20.0 -UOSS II 24.9453 56.2042 284.4 0.0 -WRAB II -19.9336 134.3600 366.0 100.0 -XPFO II 33.6107 -116.4555 1280.0 0.0 -AAE IU 9.0292 38.7656 2442.0 0.0 -ADK IU 51.8823 -176.6842 130.0 0.0 -AFI IU -13.9093 -171.7773 705.0 1.0 -ANMO IU 34.9459 -106.4572 1720.0 100.0 -ANTO IU 39.8680 32.7934 895.0 195.0 -BBSR IU 32.3713 -64.6963 -1.3 31.4 -BILL IU 68.0653 166.4531 320.0 0.0 -BOCO IU 4.5869 -74.0432 3137.0 23.0 -CASY IU -66.2792 110.5354 5.0 5.0 -CCM IU 38.0557 -91.2446 171.0 51.0 -CHTO IU 18.8141 98.9443 420.0 0.0 -COLA IU 64.8736 -147.8616 200.0 0.0 -COR IU 44.5855 -123.3046 110.0 0.0 -CTAO IU -20.0882 146.2545 320.0 37.0 -DAV IU 7.0697 125.5791 149.0 1.0 -DWPF IU 28.1103 -81.4327 -132.0 162.0 -FUNA IU -8.5259 179.1966 19.0 1.0 -FURI IU 8.8952 38.6798 2565.0 5.0 -GNI IU 40.1480 44.7410 1509.0 100.0 -GRFO IU 49.6909 11.2203 384.0 116.0 -GUMO IU 13.5893 144.8684 61.0 109.0 -HKT IU 29.9618 -95.8384 -413.0 450.0 -HNR IU -9.4387 159.9475 0.0 100.0 -HRV IU 42.5064 -71.5583 200.0 0.0 -INCN IU 37.4776 126.6239 79.0 1.0 -JOHN IU 16.7329 -169.5292 -37.0 39.0 -KBS IU 78.9154 11.9385 90.0 3.0 -KEV IU 69.7565 27.0035 85.0 15.0 -KIEV IU 50.7012 29.2242 140.0 40.0 -KIP IU 21.4233 -158.0150 37.0 33.0 -KMBO IU -1.1271 37.2525 1930.0 20.0 -KNTN IU -2.7744 -171.7186 18.0 
2.0 -KONO IU 59.6491 9.5982 216.0 340.0 -KOWA IU 14.4967 -4.0140 316.0 5.0 -LCO IU -29.0110 -70.7004 2300.0 0.0 -LSZ IU -15.2779 28.1882 1200.0 0.0 -LVC IU -22.6127 -68.9111 2930.0 30.0 -MA2 IU 59.5756 150.7700 337.0 2.0 -MAJO IU 36.5457 138.2041 405.0 0.0 -MAKZ IU 46.8080 81.9770 590.0 10.0 -MBWA IU -21.1590 119.7313 181.0 9.0 -MIDW IU 28.2155 -177.3697 17.8 1.0 -MSKU IU -1.6557 13.6116 287.0 25.0 -NAI IU -1.2739 36.8037 1692.0 0.0 -NWAO IU -32.9277 117.2390 370.9 9.1 -OTAV IU 0.2376 -78.4508 3495.0 15.0 -PAB IU 39.5446 -4.3499 950.0 0.0 -PAYG IU -0.6742 -90.2861 170.0 100.0 -PET IU 53.0233 158.6499 105.0 5.0 -PMG IU -9.4047 147.1597 90.0 0.0 -PMSA IU -64.7744 -64.0489 40.0 0.0 -POHA IU 19.7573 -155.5326 1910.0 80.0 -PTCN IU -25.0713 -130.0953 218.0 2.0 -PTGA IU -0.7308 -59.9666 141.0 96.0 -QSPA IU -89.9289 144.4382 2847.0 3.0 -RAIO IU 46.0403 -122.8851 11.0 0.0 -RAO IU -29.2450 -177.9290 59.5 0.5 -RAR IU -21.2125 -159.7733 -72.0 100.0 -RCBR IU -5.8274 -35.9014 291.0 109.0 -RSSD IU 44.1212 -104.0359 2022.7 67.3 -SAML IU -8.9489 -63.1831 120.0 0.0 -SBA IU -77.8492 166.7572 48.0 2.0 -SDV IU 8.8839 -70.6340 1588.0 32.0 -SFJD IU 66.9961 -50.6208 329.0 1.0 -SJG IU 18.1091 -66.1500 420.0 0.0 -SLBS IU 23.6858 -109.9443 825.0 0.0 -SNZO IU -41.3087 174.7043 20.0 100.0 -SSPA IU 40.6358 -77.8876 170.0 100.0 -TARA IU 1.3549 172.9229 19.0 1.0 -TATO IU 24.9735 121.4971 77.1 82.9 -TBT IU 28.6794 -17.9145 180.0 40.0 -TEIG IU 20.2263 -88.2763 15.0 25.0 -TIXI IU 71.6341 128.8667 40.0 0.0 -TOL IU 39.8814 -4.0485 480.0 0.0 -TRIS IU -37.0681 -12.3152 58.0 2.0 -TRQA IU -38.0568 -61.9787 439.0 101.0 -TSUM IU -19.2022 17.5838 1260.0 0.0 -TUC IU 32.3098 -110.7847 909.0 1.0 -ULN IU 47.8651 107.0532 1610.0 0.0 -WAKE IU 19.2834 166.6520 19.0 1.0 -WCI IU 38.2289 -86.2939 78.0 132.0 -WVT IU 36.1297 -87.8300 170.0 0.0 -XMAS IU 2.0448 -157.4457 19.0 1.0 -YAK IU 62.0310 129.6805 110.0 14.0 -YSS IU 46.9587 142.7604 148.0 2.0 diff --git a/tensorflow/README.md b/tensorflow/README.md index 6f8d7f665f3f962b47816fbdba7fb8d5c9c2dac9..f2dbd5923cef884a97c7bbbb2f7f4641422233fb 100644 --- a/tensorflow/README.md +++ b/tensorflow/README.md @@ -2,7 +2,7 @@ TensorFlow === TensorFlow (https://www.tensorflow.org) is a popular open-source library for symbolic math and linear algebra, with particular optimization for neural-networks-based machine learning workflow. Maintained by Google, it is widely used for research and production in both the academia and the industry. -TensorFlow supports a wide variety of hardware platforms (CPUs, GPUs, TPUs), and can be scaled up to utilize multiple compute devices on a single or multiple compute nodes. The main objective of this benchmark is to profile the scaling behavior of TensorFlow on different hardware, and thereby provide a reference baseline of its performance for different sizes of applications. +TensorFlow supports a wide variety of hardware platforms (CPUs, GPUs, TPUs) and can be scaled up to utilize multiple computing devices on a single or multiple compute nodes. The main objective of this benchmark is to profile the scaling behavior of TensorFlow on different hardware, and thereby provide a reference baseline of its performance for different sizes of applications. 
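+
+As a minimal sketch of how such multi-worker scaling is typically set up (an illustration only, not the benchmark code itself; the model, learning rate, and class count below are placeholders), a Keras training script can be made data-parallel with Horovod, the library the DeepGalaxy benchmark below also uses:
+
+```
+# Minimal sketch of data-parallel Keras training with Horovod
+# (one worker per MPI rank; model and hyperparameters are placeholders).
+import tensorflow as tf
+import horovod.tensorflow.keras as hvd
+
+hvd.init()
+
+# Pin each worker to its own GPU, if GPUs are present.
+gpus = tf.config.list_physical_devices('GPU')
+if gpus:
+    tf.config.set_visible_devices(gpus[hvd.local_rank()], 'GPU')
+
+model = tf.keras.applications.EfficientNetB4(weights=None, classes=213)
+# Scale the learning rate with the worker count and wrap the optimizer so
+# that gradients are averaged across all workers with allreduce.
+opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(1e-3 * hvd.size()))
+model.compile(optimizer=opt, loss='sparse_categorical_crossentropy')
+
+# Rank 0 broadcasts its initial weights so that all workers start identically.
+callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
+# model.fit(x_train, y_train, batch_size=4, epochs=20, callbacks=callbacks)
+```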
DeepGalaxy === @@ -11,15 +11,15 @@ There are many open-source datasets available for benchmarking TensorFlow, such - Website: https://github.com/maxwelltsai/DeepGalaxy - Code download: https://github.com/maxwelltsai/DeepGalaxy - [Prerequisites installation](#prerequisites-installation) -- [Test Case A](Testcase_A/README.md) -- [Test Case B](Testcase_B/README.md) -- [Test Case C](Testcase_C/README.md) +- [Test Case A (small)](Testcase_A/README.md) +- [Test Case B (medium)](Testcase_B/README.md) +- [Test Case C (large)](Testcase_C/README.md) ## Prerequisites Installation -The prerequsities consists of a list of python packages as shown below. It is recommended to create a python virtual environment (either with `pyenv` or `conda`). The following packages can be installed using the `pip` package management tool: +The prerequisites consist of a list of Python packages, as shown below. It is recommended to create a Python virtual environment (either with `pyenv` or `conda`). In general, the following packages can be installed using the `pip` package management tool: ``` pip install tensorflow pip install horovod @@ -27,7 +27,9 @@ pip install scikit-learn pip install scikit-image pip install pandas ``` -Note: there is no guarantee of optimal performance when `tensorflow` is installed using `pip`. It is better if `tensorflow` is compiled from source, in which case the compiler will likely be able to take advantage of the advanced instruction sets supported by the processor (e.g., AVX512). An official build instruction can be found at https://www.tensorflow.org/install/source. Sometimes, an HPC center may have a tensorflow module optimized for their hardware, in which case the `pip install tensorflow` line can be replaced with a line like `module load `. +Note: there is no guarantee of optimal performance when TensorFlow is installed using `pip`. It is better if TensorFlow is compiled from source, in which case the compiler will likely be able to take advantage of the advanced instruction sets supported by the processor (e.g., AVX512). Official build instructions can be found at https://www.tensorflow.org/install/source. Sometimes, an HPC center may have a TensorFlow module optimized for its hardware, in which case the `pip install tensorflow` line can be replaced with a line like `module load `. + +For multi-node training, `MPI` and/or `NCCL` should be installed as well. ## How to benchmark the throughput of a HPC system **Step 1**: Download the `DeepGalaxy` benchmark suite from GitHub: ``` git clone https://github.com/maxwelltsai/DeepGalaxy.git ``` -This should clone the full benchmark code to a local directory called `DeepGalaxy`. Enter this directory with `cd DeepGalaxy`. +In doing so, the latest version of the `DeepGalaxy` benchmark suite will be downloaded. Note that the latest version is not necessarily the most stable version, and there is no guarantee of backward compatibility with older TensorFlow versions. **Step 2**: Download the training dataset. In the `DeepGalaxy` directory, download the training dataset. Depending on the benchmark size, there are three datasets available: @@ -44,28 +46,46 @@ In the `DeepGalaxy` directory, download the training dataset. Depending on the b - (1024, 1024) pixels: https://edu.nl/gcy96 (6.1GB) - (2048, 2048) pixels: https://edu.nl/bruf6 (14GB) -**Step 3**: Run the code on different number of workers.
For example, the following command executes the code on `np = 4` workers: +**Step 3**: Run the code on different numbers of workers. For example, the following command executes the code on `np = 4` workers: ``` -mpirun -np 4 dg_train.py -f output_bw_512.hdf5 --epochs 20 --noise 0.1 --batch-size 4 --arch EfficientNetB4 +mpirun -np 4 python dg_train.py -f output_bw_512.hdf5 --epochs 20 --noise 0.1 --batch-size 4 --arch EfficientNetB4 ``` -where `output_bw_512.hdf5` is the training dataset downloaded in the previous step. Please change the file name if necessary. One could also change the other parameters, such as `--epochs`, `--batch-size`, and `--arch` according to the size of the benchmark. For example, the `EfficientNetB0` deep neural network is for small HPC systems, `EfficientNetB4` is for medium-size ones, and `EfficientNetB7` is for large systems. Also, if there are a lot of memory, increasing the `--batch-size` could improve the throughput. If the `--batch-size` parameter is too large, an out-of-memory error could occur. +`output_bw_512.hdf5` is the training dataset downloaded in the previous step. Please change the file name if necessary. One could also change the other parameters, such as `--epochs`, `--batch-size`, and `--arch`, according to the size of the benchmark. For example, the `EfficientNetB0` deep neural network is for small HPC systems, `EfficientNetB4` is for medium-size ones, and `EfficientNetB7` is for large systems. Also, should the system memory permit, increasing the `--batch-size` could improve the throughput. If the `--batch-size` parameter is too large, an out-of-memory error could occur. -It is wise to save the output of the `mpirun` command to a text file, for example, `DeepGalaxy.np_4.out`. +The benchmark data from the training are written to the file `train_log.txt`. **Step 4**: Repeat Step 3 with different `np`. -All the desired `np` settings are completed, we should have a bunch of output files on the local directory. For example, `DeepGalaxy.np_4.out`, `DeepGalaxy.np_8.out`, and so on. We could then extract the throughput using the following command: +Once all the desired `np` settings have been completed, we should have all the throughput data written in the `train_log.txt` file. The content of the file looks like this: ``` -grep sample DeepGalaxy.np_4.out +Time is now 2021-06-21 14:26:08.581012 + +Parallel training enabled. +batch_size = 4, global_batch_size = 16, num_workers = 4 + +hvd_rank = 0, hvd_local_rank = 0 +Loading part of the dataset since distributed training is enabled ... +Shape of X: (319, 512, 512, 1) +Shape of Y: (319,) +Number of classes: 213 +[Performance] Epoch 0 takes 107.60 seconds. Throughput: 2.37 images/sec (per worker), 9.48 images/sec (total) +[Performance] Epoch 1 takes 17.15 seconds. Throughput: 14.87 images/sec (per worker), 59.47 images/sec (total) +[Performance] Epoch 2 takes 10.95 seconds. Throughput: 23.29 images/sec (per worker), 93.15 images/sec (total) +[Performance] Epoch 3 takes 10.99 seconds. Throughput: 23.21 images/sec (per worker), 92.82 images/sec (total) +[Performance] Epoch 4 takes 11.01 seconds. Throughput: 23.17 images/sec (per worker), 92.67 images/sec (total) +[Performance] Epoch 5 takes 11.00 seconds. Throughput: 23.18 images/sec (per worker), 92.72 images/sec (total) +[Performance] Epoch 6 takes 11.05 seconds. Throughput: 23.08 images/sec (per worker), 92.31 images/sec (total) +[Performance] Epoch 7 takes 11.16 seconds.
Throughput: 22.86 images/sec (per worker), 91.44 images/sec (total) +[Performance] Epoch 8 takes 11.11 seconds. Throughput: 22.96 images/sec (per worker), 91.85 images/sec (total) +[Performance] Epoch 9 takes 11.10 seconds. Throughput: 22.97 images/sec (per worker), 91.87 images/sec (total) +On hostname r38n1.lisa.surfsara.nl - After training using 4.195556640625 GB of memory ``` -A sample output looks like this: ``` -7156/7156 [==============================] - 1435s 201ms/sample - loss: 5.9885 - sparse_categorical_accuracy: 0.0488 - val_loss: 5.8073 - val_sparse_categorical_accuracy: 0.1309 -7156/7156 [==============================] - 1141s 160ms/sample - loss: 3.0371 - sparse_categorical_accuracy: 0.3376 - val_loss: 2.0614 - val_sparse_categorical_accuracy: 0.5666 -7156/7156 [==============================] - 1237s 173ms/sample - loss: 0.5927 - sparse_categorical_accuracy: 0.8506 - val_loss: 0.0503 - val_sparse_categorical_accuracy: 0.9835 -7156/7156 [==============================] - 1123s 157ms/sample - loss: 0.0245 - sparse_categorical_accuracy: 0.9963 - val_loss: 0.0033 - val_sparse_categorical_accuracy: 0.9994 -7156/7156 [==============================] - 1236s 173ms/sample - loss: 0.0026 - sparse_categorical_accuracy: 0.9998 - val_loss: 9.3778e-07 - val_sparse_categorical_accuracy: 1.0000 ``` -The throughput can be read from the timing here, such as `173ms/sample`. Usually, this number is a bit larger in the first epoch, because `TensorFlow` needs to do some initialization in the first epoch. So we could pikc up the number from the 3rd or even 5th epoch when it is stablized. -Extract this number for different `np`, and see how this number changes a function of `np`. In a system with perfect (i.e., linear) scaling, this number should be constant. But in reality, this number should increase due to the communication overhead. Therefore, the growth of this number as a function of `np` tell us something about the scaling efficiency of the underlying system. +This output contains several pieces of information that are useful for deriving the scaling efficiency of the HPC system: + +- `num_workers`: the number of (MPI) workers. This is essentially equal to the `-np` parameter in the `mpirun` command. Do not confuse this with the number of (CPU) cores used, because one worker may make use of multiple cores. If GPUs are used, one worker is typically associated with one GPU card. +- images/sec (per worker): the throughput per worker. +- images/sec (total): the total throughput of the system. + +Due to the initialization effect, the throughputs of the first two epochs are lower, so please read the throughput data from the third epoch onwards. +With the total throughput data, we can calculate the scaling efficiency. In an ideal system, the total throughput scales linearly as a function of `num_workers`, and hence the scaling efficiency is 1. In practice, the scaling efficiency drops with more workers due to the communication overhead. The better the interconnect of the HPC system, the better the scaling efficiency. diff --git a/tensorflow/prerequisites-installation.md b/tensorflow/prerequisites-installation.md deleted file mode 100644 index 7d2032dc436c3a842a0b23c33cfd7c088c060b61..0000000000000000000000000000000000000000 --- a/tensorflow/prerequisites-installation.md +++ /dev/null @@ -1,11 +0,0 @@ -## Prerequisites Installation - -The prerequsities consists of a list of python packages as shown below. It is recommended to create a python virtual environment (either with `pyenv` or `conda`).
The following packages can be installed using the `pip` package management tool: -``` -pip install tensorflow -pip install horovod -pip install scikit-learn -pip install scikit-image -pip install pandas -``` -Note: there is no guarantee of optimal performance when `tensorflow` is installed using `pip`. It is better if `tensorflow` is compiled from source, in which case the compiler will likely be able to take advantage of the advanced instruction sets supported by the processor (e.g., AVX512). An official build instruction can be found at https://www.tensorflow.org/install/source. Sometimes, an HPC center may have a tensorflow module optimized for their hardware, in which case the `pip install tensorflow` line can be replaced with a line like `module load `. diff --git a/tensorflow/testcase_medium/.gitignore b/tensorflow/testcase_medium/.gitignore deleted file mode 100644 index 6b9728353a8a39e5ed5cab7842e0e78de88271e6..0000000000000000000000000000000000000000 --- a/tensorflow/testcase_medium/.gitignore +++ /dev/null @@ -1,8 +0,0 @@ -batch_medium.slurm -efn_b4.h5 -env_bench -model_hvd_bw_512_B4_with_noise_n_p_4.h5 -output_bw_512.hdf5 -results-DG-medium/ -train_log.txt - diff --git a/tensorflow/testcase_medium/README.md b/tensorflow/testcase_medium/README.md deleted file mode 100644 index 8f0eae751b92553aed24238bb9d4664c81848f47..0000000000000000000000000000000000000000 --- a/tensorflow/testcase_medium/README.md +++ /dev/null @@ -1,8 +0,0 @@ -Medium test case presentation ------------------------------ - -This test case performs a training using 512X512 images, with 3 positions per image, as input. - -Reference time on Jean-zay with 4 nodes, 16 MPI proces, 16 GPUs, 3 positions and 100 epochs: - -* For 100epochs: ~67ms/sample and 32min30s as time to solution diff --git a/tensorflow/testcase_medium/prepare.sh b/tensorflow/testcase_medium/prepare.sh deleted file mode 100755 index 528d2720aabf45e3740d9c62fd9b15a977bcbad1..0000000000000000000000000000000000000000 --- a/tensorflow/testcase_medium/prepare.sh +++ /dev/null @@ -1,16 +0,0 @@ -#!/bin/bash - -if [ -z "$1" ] - then - echo "Please provide the targeted machine from:" - ls ../machines/ - echo "" - echo "Example: ./prepare.sh jeanzay-gpu" - exit 1 -fi -machine_dir="../machines/$1" - -cp $machine_dir/env_bench . -cp $machine_dir/batch_medium.slurm . - -ln -s ../DeepGalaxy-master/output_bw_512.hdf5 . diff --git a/tensorflow/testcase_medium/run.sh b/tensorflow/testcase_medium/run.sh deleted file mode 100755 index 2469b4581cf6f0ca145e98ee2d1dfd25d659f22b..0000000000000000000000000000000000000000 --- a/tensorflow/testcase_medium/run.sh +++ /dev/null @@ -1,2 +0,0 @@ -#!/bin/bash -sbatch batch_medium.slurm diff --git a/tensorflow/testcase_medium/validate.sh b/tensorflow/testcase_medium/validate.sh deleted file mode 100755 index cd3f906d4f2d9d5232aa712f2a02c56aa65ff54d..0000000000000000000000000000000000000000 --- a/tensorflow/testcase_medium/validate.sh +++ /dev/null @@ -1,12 +0,0 @@ -#!/bin/bash - -set -e - -RESULT_DIR=results-DG-medium -mkdir -p $RESULT_DIR - -cp dg.err dg.out $RESULT_DIR - - -grep "Epoch" -A1 dg.out > $RESULT_DIR/epochs.results -grep "Epoch 100/100" -A1 dg.out > $RESULT_DIR/last_epoch.results
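
The scaling-efficiency calculation described in `tensorflow/README.md` above can be automated with a short post-processing script. The sketch below assumes the `num_workers = N` and `[Performance] ... images/sec (total)` log lines shown in the sample output, and that each run's `train_log.txt` has been saved under a distinct name, e.g. `train_log.np_4.txt` (a hypothetical naming scheme); it averages the total throughput from the third epoch onwards and reports the efficiency relative to the smallest run:
```
# Sketch: compute scaling efficiency from saved DeepGalaxy train_log.txt files,
# one per run, assuming the log format shown in tensorflow/README.md, i.e.:
#   num_workers = 4
#   [Performance] Epoch 2 takes 10.95 seconds. Throughput: 23.29 images/sec (per worker), 93.15 images/sec (total)
import re
import sys

PERF = re.compile(r"Epoch (\d+).*?([\d.]+) images/sec \(total\)")
WORKERS = re.compile(r"num_workers = (\d+)")

def parse_log(path, skip_epochs=2):
    """Return (num_workers, mean total throughput), skipping warm-up epochs."""
    workers, rates = None, []
    with open(path) as f:
        for line in f:
            m = WORKERS.search(line)
            if m:
                workers = int(m.group(1))
            m = PERF.search(line)
            if m and int(m.group(1)) >= skip_epochs:  # from the 3rd epoch onwards
                rates.append(float(m.group(2)))
    return workers, sum(rates) / len(rates)

if __name__ == "__main__":
    # Usage: python scaling_efficiency.py train_log.np_4.txt train_log.np_8.txt ...
    runs = sorted(parse_log(p) for p in sys.argv[1:])
    n0, r0 = runs[0]  # the smallest run serves as the baseline
    for n, r in runs:
        eff = (r / r0) / (n / n0)  # 1.0 corresponds to perfect linear scaling
        print(f"np={n:4d}  throughput={r:8.2f} images/sec  efficiency={eff:4.2f}")
```
For the sample log above (4 workers, roughly 92 images/sec total), a perfectly scaling system would reach roughly 184 images/sec with 8 workers; any shortfall shows up directly as an efficiency below 1.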