diff --git a/gpaw/README.md b/gpaw/README.md index 94c09a0e2bcde16384a3e1ad96c0e99dc13a60fd..c024f4a265c29487b580ccc69fdf4732a197b61f 100644 --- a/gpaw/README.md +++ b/gpaw/README.md @@ -37,8 +37,8 @@ There is currently no active support for non-CUDA accelerator platforms. For the UEABS benchmark version 2.2, the following versions of GPAW were tested: * CPU-based: - * Version 1.5.3 as this one is the last of the 1.5 branch and since the GPU version - is derived from 1.5.2. + * Version 1.5.2 as this one is the last of the 1.5 branch and since the GPU version + is derived from this version. * Version 20.1.0, the most recent version during the development of the UEABS 2.2 benchmark suite. * GPU-based: There is no official release or version number. The UEABS 2.2 benchmark @@ -60,10 +60,18 @@ Input file: [benchmark/1_S_carbon-nanotube/input.py](benchmark/1_S_carbon-nanotu ### Case M: Copper filament A ground state calculation for a copper filament in vacuum. By default uses a -2x2x3 FCC lattice with 71 atoms (freely adjustable) and ScaLAPACK for -parallelisation. Expected to scale up to 100 nodes and/or 1000 MPI tasks. +3x4x4 FCC lattice with 71 atoms (freely adjustable through the variables `x`, +`y` and `z` in the input file) and ScaLAPACK for +parallellisation. Expected to scale up to 100 nodes and/or 1000 MPI tasks. -Input file: [benchmark/2_M_carbon-nanotube/input.py](benchmark/2_M_copper-filament/input.py) +Input file: [benchmark/2_M_copper-filament/input.py](benchmark/2_M_copper-filament/input.py) + +The benchmark was tested using 1000 and 1024 cores. For some core configurations, one may +get error messages similar to ``gpaw.grid_descriptor.BadGridError: Grid ... to small +for ... cores``. If one really wants to run the benchmark for those number of cores, +one needs to adapt the values of `x`, `y` and `z` in `input.py`. However, this +changes the benchmark so results cannot be compared easily with benchmark runs for +different values of these variables. ### Case L: Silicon cluster @@ -72,7 +80,7 @@ cluster has a radius of 15Å (freely adjustable) and consists of 702 atoms, and ScaLAPACK is used for parallelisation. Expected to scale up to 1000 nodes and/or 10000 MPI tasks. -Input file: [benchmark/3_L_carbon-nanotube/input.py](benchmark/3_L_silicon-cluster/input.py) +Input file: [benchmark/3_L_silicon-cluster/input.py](benchmark/3_L_silicon-cluster/input.py) ## Mechanics of building the benchmark @@ -82,18 +90,37 @@ last version with the old numbering. In 2019 the development team switched to a version numbering scheme based on year, month and patchlevel, e.g., 19.8.1 for the second version released in August 2019. -A further major change affecting both the build process and the mechanics of running -the benchmark happened in version 20.1.0. Versions up to and including 19.8.1 use a -wrapper executable `gpaw-python` that replaces the Python interpreter (it internally -links to the libpython library) and provides the MPI functionality. From version 20.1.0 -the standard Python interpreter is used and the MPI functionality is included in the `_gpaw.so` -shared library, though there is still an option in the build process (not tested for -the UEABS benchmarks) to generate that wrapper instead. +Another change is in the Python packages used to install GPAW. Versions up to +and including 19.8.1 use the `distutils` package while versions 20.1.0 and later +are based on `setuptools`. This does affect the installation process. 
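+
+As an illustration only (the paths, version pins and exact commands below are
+placeholders and not part of the official instructions), the difference typically
+shows up as follows: the `distutils`-based versions are configured through a
+`customize.py` file next to `setup.py`, while the `setuptools`-based versions read
+a `siteconfig.py` file and are usually installed with `pip`:
+
+```
+# GPAW <= 19.8.1 (distutils): edit customize.py in the unpacked source tree, then
+python setup.py install --prefix=$HOME/apps/gpaw-19.8.1
+
+# GPAW >= 20.1.0 (setuptools): edit siteconfig.py in the unpacked source tree, then
+pip install --prefix=$HOME/apps/gpaw-20.1.0 .
+```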
+
+For some time now, GPAW has supported two different ways to run in parallel distributed memory mode:
+ * Using a wrapper executable `gpaw-python` that replaces the Python interpreter (it
+   internally links to the libpython library) and provides the MPI functionality.
+ * Using the standard Python interpreter, with the MPI functionality included in the
+   `_gpaw.so` shared library.
+In the `distutils`-based versions, the wrapper executable approach is the default
+behaviour, while in the `setuptools`-based versions, the approach using the standard
+Python interpreter is the one preferred in the manual. Even though the code in the
+`setuptools`-based versions still includes the option to build the wrapper executable,
+it does not work in the tested version 20.1.0.
+
+### Available instructions
+
+The [GPAW wiki](https://wiki.fysik.dtu.dk/gpaw/) only contains the
+[installation instructions](https://wiki.fysik.dtu.dk/gpaw/install.html) for the current version.
+For the installation instructions with a list of dependencies for older versions,
+download the code (see below) and look for the file `doc/install.rst`, or go to the
+[GPAW GitLab](https://gitlab.com/gpaw/gpaw), select the tag for the desired version and
+view the file `doc/install.rst`.
+
+The [GPAW wiki](https://wiki.fysik.dtu.dk/gpaw/) also provides some
+[platform specific examples](https://wiki.fysik.dtu.dk/gpaw/platforms/platforms.html).
 
 ### List of dependencies
 
-GPAW is Python code (3.5 or newer) but it also contains some C code for some performance-critical
+GPAW is Python code, but it also contains C code for some performance-critical
 parts and to interface to a number of libraries on which it depends.
 
 Hence GPAW has the following requirements:
@@ -101,16 +128,20 @@ Hence GPAW has the following requirements:
   * BLAS, LAPACK, BLACS and ScaLAPACK. ScaLAPACK is optional for GPAW, but mandatory
     for the UEABS benchmarks. It is used by the medium and large cases and optional
     for the small case.
-  * Python 3.5 or newer
+  * Python. GPAW 1.5.2 requires Python 2.7 or 3.4-3.7, GPAW 19.8.1 requires
+    Python 3.4-3.7, GPAW 20.1.0 Python 3.5-3.8 and GPAW 20.10.0 Python 3.6-3.9.
   * Mandatory Python packages:
-    * [NumPY](https://pypi.org/project/numpy/) 1.9 or later (for GPAW 19.8.1/20.1.0)
-    * [SciPy](https://pypi.org/project/scipy/) 0.14 or later (for GPAW 19.8.1/20.1.0)
+    * [NumPy](https://pypi.org/project/numpy/) 1.9 or later (for GPAW 1.5.2/19.8.1/20.1.0/20.10.0)
+    * [SciPy](https://pypi.org/project/scipy/) 0.14 or later (for GPAW 1.5.2/19.8.1/20.1.0/20.10.0)
  * [FFTW](http://www.fftw.org) is highly recommended. As long as the optional libvdwxc component
    is not used, the MKL FFTW wrappers can also be used. Recent versions of
-   GPAW can even show good performance using just the NumPy-provided FFT routines provided
+   GPAW also show good performance using just the NumPy-provided FFT routines, provided
    that NumPy has been built with a highly optimized FFT library.
-  * [LibXC](https://www.tddft.org/programs/libxc/) 3.X or 4.X. LibXC is a library
-    of exchange-correlation functions for density-functional theory
+  * [LibXC](https://www.tddft.org/programs/libxc/) 2.X or newer for GPAW 1.5.2,
+    3.X or 4.X for GPAW 19.8.1, 20.1.0 and 20.10.0. LibXC is a library
+    of exchange-correlation functionals for density-functional theory. None of these
+    versions currently mentions LibXC 5.X as officially supported.
* [ASE, Atomic Simulation Environment](https://wiki.fysik.dtu.dk/ase/), a Python package from the same group that develops GPAW * Check the release notes of GPAW as the releases of ASE and GPAW should match. @@ -128,9 +159,17 @@ Hence GPAW has the following requirements: [LCAO mode](https://wiki.fysik.dtu.dk/gpaw/documentation/lcao/lcao.html) In addition, the GPU version needs: - * CUDA toolkit + * NVIDIA CUDA toolkit * [PyCUDA](https://pypi.org/project/pycuda/) +Installing GPAW also requires a number of standard build tools on the system, including + * [GNU autoconf](https://www.gnu.org/software/autoconf/) is needed to generate the + configure script for libxc + * [GNU Libtool](https://www.gnu.org/software/libtool/) is needed. If not found, + the configure process of libxc produces very misleading + error messages that do not immediately point to libtool missing. + * [GNU make](https://www.gnu.org/software/make/) + ### Download of GPAW @@ -155,21 +194,16 @@ git clone -b cuda https://gitlab.com/mlouhivu/gpaw.git ### Install -Official generic [installation instructions](https://wiki.fysik.dtu.dk/gpaw/install.html) -and -[platform specific examples](https://wiki.fysik.dtu.dk/gpaw/platforms/platforms.html) -are provided in the [GPAW wiki](https://wiki.fysik.dtu.dk/gpaw/). - Crucial for the configuration of GPAW is a proper `customize.py` (GPAW 19.8.1 and earlier) or `siteconfig.py` (GPAW 20.1.0 and later) file. The defaults used by GPAW may not offer optimal performance and the automatic detection of the libraries also fails on some systems. The UEABS repository contains additional instructions: - * [general instructions](installation.md) - Under development + * [general instructions](build/build-cpu.md) * [GPGPUs](build/build-cuda.md) - To check -Example [build scripts](build/examples/) are also available for some PRACE +Example [build scripts](build/examples/) are also available for some PRACE and non-PRACE systems. @@ -187,7 +221,9 @@ right from this repository. ### Running the benchmarks -#### Versions up to and including 19.8.1 of GPAW +#### Using the `gpaw-python` wrapper script + +This is the default approach for versions up to and including 19.8.1 of GPAW These versions of GPAW come with their own wrapper executable, `gpaw-python`, to start a MPI-based GPAW run. @@ -199,7 +235,9 @@ properly with the resource manager. E.g., on Slurm systems, use srun gpaw-python input.py ``` -#### GPAW 20.1.0 (and likely later) +#### Using the regular Python interpreter and parallel GPAW shared library + +This is the default method for GPAW 20.1.0 (and likely later). The wrapper executable `gpaw-python` is no longer available in the default parallel build of GPAW. There are now two different ways to start GPAW. @@ -229,8 +267,8 @@ would do. Example [job scripts](scripts/) (`scripts/job-*.sh`) are provided for different PRACE systems that may offer a helpful starting point. -TODO: Update the examples. +*TODO: Update the examples as testing on other systems goes on.* ## Verification of Results @@ -240,9 +278,9 @@ TODO. ### Case M: Copper filament -TODO. +TODO. Convergence problems. ### Case L: Silicon cluster -TODO. +TODO. Get the medium case to run before spending time on the large one. 
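+
+### Example job script
+
+As a starting point, a minimal and untested Slurm job script sketch for the medium
+case with GPAW 20.1.0 is shown below; the module name, node and task counts and the
+setup path are placeholders that have to be adapted to the system (see the
+[job scripts](scripts/) for tested examples).
+
+```
+#!/bin/bash
+#SBATCH --job-name=GPAW-UEABS-M
+#SBATCH --nodes=8
+#SBATCH --ntasks-per-node=128
+#SBATCH --time=01:00:00
+
+# Placeholder: load whatever module provides GPAW 20.1.0 and its Python stack.
+module load GPAW/20.1.0
+
+# Point GPAW to the "Atomic PAW Setup" files (see the installation instructions).
+export GPAW_SETUP_PATH=$HOME/gpaw-setups-0.9.20000
+
+# GPAW 20.1.0 and later: the MPI functionality is in the _gpaw.so shared library,
+# so the regular Python interpreter is started through the resource manager.
+srun python input.py
+```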
diff --git a/gpaw/build/build-CPU.md b/gpaw/build/build-CPU.md new file mode 100644 index 0000000000000000000000000000000000000000..6288e2f9f8bd4a23b8bba72058d0746a8a5d24f4 --- /dev/null +++ b/gpaw/build/build-CPU.md @@ -0,0 +1,317 @@ +# Detailed GPAW installation instructions on non-acclerated systems + +These instructions are in addition to the brief instructions in [README.md](../README.md). + +## Detailed dependency list + +### Libraries and Python interpreter + +GPAW needs (for the UEABS benchmarks) + * [Python](https://www.python.org/): GPAW 1.5.2 supports Python 2.7 and 3.4-3.7. + GPAW 19.8.1 needs Python 3.4-3.7 and GPAW 20.1.0 requires Python 3.5-3.8. + * [MPI library](https://www.mpi-forum.org/) + * [LibXC](https://www.tddft.org/programs/libxc/). GPAW 1.5.2 requires LibXC 1.5.2 + or later. GPAW 19.8.1 and 20.1.0 need LibXC 3.x or 4.x. + * (Optimized) [BLAS](http://www.netlib.org/blas/) and + [LAPACK](http://www.netlib.org/lapack/) libraries. + There are both commercial and free and open source versions of these libraries. + Using the [reference implementation of BLAS from netlib](http://www.netlib.org/blas/) + will give very poor performance. Most optimized LAPACK libraries actually only + optimize a few critical routines while the remaining routines are compiled from + the reference version. Most processor vendors for HPC machines and system vendors + offer optmized versions of these libraries. + * [ScaLAPACK](http://www.netlib.org/scalapack/) and the underlying communication + layer [BLACS](http://www.netlib.org/blacs/). + * [FFTW](http://www.fftw.org/) or compatible FFT library. + For the UEABS benchmarks, the double precision, non-MPI version is sufficient. + GPAW also works with the + [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html) + FFT routines when using the FFTW wrappers provided with that product. + +For the GPU version, the following packages are needed in addition to the packages +above: + * CUDA toolkit + * [PyCUDA](https://pypi.org/project/pycuda/) + +Optional components of GPAW that are not used by the UEABS benchmarks: + * [libvdwxc](https://gitlab.com/libvdwxc/libvdwxc), a portable C library + of density functionals with van der Waals interactions for density functional theory. + This library does not work with the MKL FFTW wrappers as it needs the MPI version + of the FFTW libraries too. + * [ELPA](https://elpa.mpcdf.mpg.de/), + which should improve performance for large systems when GPAW is used in + [LCAO mode](https://wiki.fysik.dtu.dk/gpaw/documentation/lcao/lcao.html) + + +### Python packages + +GPAW needs + * [wheel](https://pypi.org/project/wheel/) is needed in most (if not all) ways of + installing the packages from source. + * [NumPy](https://pypi.org/project/numpy/) 1.9 or later (for GPAW 1.5.2/19.8.1/20.1.0/20.10.0) + * Installing NumPy from source will also require + [Cython](https://pypi.org/project/Cython/) + * GPAW 1.5.2 is not fully compatible with NumPy 1.19.x. Warnings about the use + of deprecated constructs will be shown. + * [SciPy](https://pypi.org/project/scipy/) 0.14 or later (for GPAW 1.5.2/19.8.1/20.1.0/20.10.0) + * [ASE, Atomic Simulation Environment](https://wiki.fysik.dtu.dk/ase/), a Python package + from the same group that develops GPAW. Required versions are 3.17.0 or later for + GPAW 1.5.2 and 3.18.0 or later for GPAW 19.8.1 or 20.1.0. + ASE has a couple of dependendencies + that are not needed for running the UEABS benchmarks. 
However, several Python + package install methods will trigger the installation of those packages, and + with them may require a chain of system libraries. + * ASE does need NumPy and SciPy, but these are needed anyway for GPAW. + * [matplotlib](https://pypi.org/project/matplotlib/), at least version 2.0.0. + This package is optional and not really needed to run the benchmarks. + Matplotlib pulls in a lot of other dependencies. When installing ASE with pip, + it will try to pull in matplotlib and its dependencies + * [pillow](https://pypi.org/project/Pillow/) needs several exgternal + libraries. During the development of the benchmarks, we needed at least + zlib, libjpeg-turbo (or compatible libjpeg library) and freetype. Even + though the pillow documentation claimed that libjpeg was optional, + it refused to install without. + * [kiwisolver](https://pypi.org/project/kiwisolver/): Contains C++-code + * [pyparsing](https://pypi.org/project/pyparsing/) + * [Cycler](https://pypi.org/project/Cycler/), which requires + * [six](https://pypi.org/project/six/) + * [python-dateutil](https://pypi.org/project/python-dateutil/), which also + requires + * [six](https://pypi.org/project/six/) + * [Flask](https://pypi.org/project/Flask/) is an optional dependency of ASE + that is not automatically pulled in by `pip` in versions of ASE tested during + the development of this version of the UEABS. It has a number of dependencies + too: + * [Jinja2](https://pypi.org/project/Jinja2/) + * [MarkupSafe](https://pypi.org/project/MarkupSafe/), contains some C + code + * [itsdangerous](https://pypi.org/project/itsdangerous/) + * [Werkzeug](https://pypi.org/project/Werkzeug/) + * [click]() + + +## Tested configurations + + * Python + * Libraries used during the installation of Python: + * ncurses 6.2 + * libreadline 8.0, as it makes life easy when using the command line + interface of Python (and in case of an EasyBuild Python, because EasyBuild + requires it) + * libffi 3.3 + * zlib 1.2.11 + * OpenSSL 1.1.1g, but only when EasyBuild was used and requires it. + * SQLite 3.33.0, as one of the tests in some versions of GPAW requires it to + succeed. + * Python will of course pick up several other libraries that it might find on + the system. The benchmark installation was tested on a system with very few + development packages of libraries installed in the system image. Tcl/Tk + and SQLite3 development packages in particular where not installed, so the + standard Python library packages sqlite3 and tkinter were not fully functional. + * Python packages + * wheel + * Cython + * NumPy + * SciPy + * ASE + * GPAW + +The table below give the combinations of major packages Python, NumPy, SciPy, ASE and +GPAW that were tested: + +| Python | NumPy | SciPy | ASE | GPAW | +|:-------|:-------|:------|:-------|:--------| +| 3.7.9 | 1.18.5 | 1.4.1 | 3.17.0 | 1.5.2 | +| 3.7.9 | 1.18.5 | 1.4.1 | 3.18.2 | 19.8.1 | +| 3.8.6 | 1.18.5 | 1.4.1 | 3.19.3 | 20.1.0 | + + +## Installing all prerequisites + +We do not include the optimized mathematical libraries in the instructions (BLAS, LAPACK, +FFT library, ...) as these libraries should be standard on any optimized HPC system. +Also, the instructions below will need to be adapted to the specific +libraries that are being used. + +Other prerequisites: + * libxc + * Python interpreter + * Python package NumPy + * Python package SciPy + * Python package ase + + +### Installing libxc + + * Installing libxc requires GNU automake and GNU buildtool besides GNU make and a + C compiler. 
The build process is the usual GNU configure - make - make install + cycle, but the `configure` script still needs to be generated with autoreconf. + * Download libxc: + * The latest version of libxc can be downloaded from + [the libxc download page](https://www.tddft.org/programs/libxc/download/). + However, that version may not be officially supported by GPAW. + * It is also possible to download all recent versions of libxc from + [the libxc GitLab](https://gitlab.com/libxc/libxc) + * Select the tag corresponding to the version you want to download in the + branch/tag selection box. + * Then use the download button and select the desired file type. + * Dowload URLs look like `https://gitlab.com/libxc/libxc/-/archive/4.3.4/libxc-4.3.4.tar.bz2`. + * Untar the file in the build directory. + + + + +### Installing Python from scratch + +The easiest way to get Python on your system is to download an existing distribution +(one will likely already be installed on your system). Python itself does have a lot +of dependencies though, definitely in its Standard Python Library. Many of the +standard packages are never needed when executing the benchmark cases. Isolating them +to compile a Python with minimal dependencies is beyond the scope though. We did +compile Python without the necessary libraries for the standard libraries sqlite3 +and tkinter (the latter needing Tcl/Tk). + +Even though GPAW contains a lot of Python code, the Python interpreter is not the main +performance-determining factor in the GPAW benchmark. Having a properly optimized installation +of NumPy, SciPy and GPAW itself proves much more important. + + +### Installing NumPy + + * As NumPy relies on optimized libraries for its performance, one should carefully + select which NumPy package to download, or install NumPy from sources. How crucial + this is, depends on the version of GPAW and the options selected when building + GPAW. + * Given that GPAW also uses optimized libraries, it is generally advised to install + NumPy from sources instead to ensure that the same libraries are used as will be + used for GPAW to prevent conflicts between libraries that might otherwise occur. + * In most cases, NumPy will need a `site.cfg` file to point to the optimized libraries. + See the examples for various systems and the file `site.cfg.example` included in + the NumPy sources. + + +### Installing SciPy + + * Just as NumPy, SciPy relies on optimized libraries for its performance. It should + be installed after NumPy as it does get the information about which libraries to + use from NumPy. Hence, when installing pre-built binaries, make sure they match + the NumPy binaries used. + * Just as is the case for NumPy, it may be better to install SciPy from sources. + [Instructions for installing SciPy from source can be found on the SciPy GitHub + site](https://github.com/scipy/scipy/blob/master/INSTALL.rst.txt). + + +### Installing ase + + * Just as for any user-installed Python package, make sure you have created a + directory to install Python packages to and have added it to the front of PYTHONPATH. + * ase is [available on PyPi](https://pypi.org/project/ase/). It is also possible + to [see a list of previous releases](https://pypi.org/project/ase/#history). + * The easiest way to install ase is using `pip` which will automatically download. + the requested version. + + +## Configuring and installing GPAW + +### GPAW 1.5.2 + + * GPAW 1.5.2 uses `distutils`. 
Customization of the installation process is possible
+   through the `customize.py` file.
+ * The FFT library: according to the documentation, the following strategy is used:
+     * The compile process searches (in this order) for `libmkl_rt.so`,
+       `libmkl_intel_lp64.so` and `libfftw3.so`. The first one found will be
+       loaded.
+     * If none is found, the built-in FFT from NumPy will be used. This does not need
+       to be a problem if NumPy provides a properly optimized FFT library.
+     * The choice can also be overridden using the GPAW_FFTWSO environment variable.
+ * With certain compilers, the GPAW test suite produced crashes in `xc/xc.py`. The
+   patch for GPAW 1.5.2 included in the [patches](patches) subdirectory solved these
+   problems on the systems tested.
+
+
+### GPAW 19.8.1
+
+ * GPAW 19.8.1 uses `distutils`. Customization of the installation process is possible
+   through a `customize.py` file.
+ * The selection process of the FFT library has changed from version 1.5.2. It is
+   now possible to specify the FFT library in `customize.py` or to simply select to
+   use the NumPy FFT routines.
+
+
+### GPAW 20.1.0 and 20.10.0
+
+ * GPAW 20.1.0 uses `setuptools`. Customization of the installation process is possible
+   through the `siteconfig.py` file.
+ * The selection process of the FFT library is the same as in version 19.8.1, except
+   that the settings are now in `siteconfig.py` rather than `customize.py`.
+
+
+### All versions
+
+ * GPAW also needs a number of so-called "Atomic PAW Setup" files. The latest files
+   can be found on the [GPAW website, Atomic PAW Setups page](https://wiki.fysik.dtu.dk/gpaw/setups/setups.html).
+   For the testing we used [`gpaw-setups-0.9.20000.tar.gz`](https://wiki.fysik.dtu.dk/gpaw-files/gpaw-setups-0.9.20000.tar.gz)
+   for all versions of GPAW. The easiest way to install these files is to simply untar
+   the file and set the environment variable GPAW_SETUP_PATH to point to that directory.
+   In the examples provided we use the `share/gpaw-setups` subdirectory of the install
+   directory for this purpose.
+ * Up to and including version 20.1.0, GPAW comes with a test suite which can be
+   used after installation.
+     * Running the sequential tests:
+
+           gpaw test
+
+       Help is available through
+
+           gpaw test -h
+
+     * Running those tests, but using multiple cores (e.g., 4):
+
+           gpaw test -j 4
+
+       We did experience that crashes that cause segmentation faults go unnoticed
+       in this setup; they are not reported as failed.
+
+     * Running the tests in parallel on a Slurm cluster depends on the version of GPAW.
+
+         * Versions that build the parallel interpreter (19.8.1 and older):
+
+               srun -n 4 gpaw-python -m gpaw test
+
+         * Versions that provide the MPI functionality in the `_gpaw.so` shared library
+           and use the regular Python interpreter (20.1.0 and above):
+
+               srun -n 4 python -m gpaw test
+
+ * Depending on the Python installation, some tests may fail with error messages that point
+   to a package in the Standard Python Library that is not present. Some of these errors have no
+   influence on the benchmarks as that part of the code is not triggered by the benchmark.
+ * The full test suite is missing in GPAW 20.10.0. There is a brief sequential test
+   that can be run with
+
+       gpaw test
+
+   and a parallel one that can be run with
+
+       gpaw -P 4 test
+
+ * Multiple versions of GPAW likely contain a bug in `c/bmgs/fd.c` (around line 44
+   in GPAW 1.5.2). The code enforces vectorization on OpenMP 4 compilers by using
+   `#pragma omp simd`.
However, it turns out that the data is not always correctly + aligned, so if the reaction of the compiler to `#pragma omp simd` is to fully vectorize + and use load/store instructions for aligned data, crashes may occur. It did happen + during the benchmark development when compiling with the Intel C compiler. The + solution for that compiler is to add `-qno-openmp-simd` to the compiler flags. + + +## Problems observed during testing + + * On AMD Epyc systems, there seems to be a bug in the Intel MKL FFT libraries/FFTW + wrappers in the 2020 compilers. Downgrading to the MKL libraries of the 2018 + compilers or using the FFTW libraries solves the problem. + This has been observed not only in GPAW, but also in some other DFT packages. + * The GPAW test code in versions 1.5.2 till 20.1.0 detects that matplotlib is not installed + and will skip this test. We did however observe a failed test when Python could not find + the SQLite package as the Python standard library sqlite3 package is used. diff --git a/gpaw/build/examples/CalcUA-vaughan-rome/build_1.5.2_IntelPython3_icc.sh b/gpaw/build/examples/CalcUA-vaughan-rome/build_1.5.2_IntelPython3_icc.sh new file mode 100755 index 0000000000000000000000000000000000000000..03941ec03487fb68f242a8bd4409ba8380afa79e --- /dev/null +++ b/gpaw/build/examples/CalcUA-vaughan-rome/build_1.5.2_IntelPython3_icc.sh @@ -0,0 +1,351 @@ +#!/bin/bash +# +# Installation script for GPAW 1.5.2: +# * Using the existing IntelPython3 module on the system which has an optimized +# NumPy and SciPy included. +# * Using the matching version of ase, 3.17.0 +# * Compiling with the Intel compilers +# +# The FFT library is discovered at runtime. With the settings used in this script +# this should be MKL FFT, but it is possible to change this at runtime to either +# MKL, FFTW or the built-in NumPy FFT routines, see the installation instructions +# (link below). 
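+#
+# Untested illustration (not part of the original script): the installation
+# instructions mention that the FFT library picked up at runtime can be
+# overridden through the GPAW_FFTWSO environment variable. For example, to
+# point GPAW 1.5.2 at the FFTW library instead of the auto-detected MKL one,
+# something like
+#
+#   export GPAW_FFTWSO=libfftw3.so
+#
+# could be set in the job environment (the exact library name to use depends
+# on the system and is an assumption here).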
+# +# The original installation instructions for GPAW can be found at +# https://gitlab.com/gpaw/gpaw/-/blob/1.5.2/doc/install.rst +# + +packageID='1.5.2-IntelPython3-icc' + +install_root=$VSC_SCRATCH/UEABS +systemID=CalcUA-vaughan-rome + +download_dir=$install_root/Downloads +install_dir=$install_root/$systemID/Packages/GPAW-manual/$packageID +modules_dir=$install_root/$systemID/Modules/GPAW-manual +build_dir="/dev/shm/$USER/GPAW-manual/$packageID" +patch_dir=$VSC_DATA/Projects/PRACE/GPAW-experiments/UEABS/build/patches + +libxc_version='4.3.4' +ase_version='3.17.0' +GPAW_version='1.5.2' +GPAWsetups_version='0.9.20000' # Check version on https://wiki.fysik.dtu.dk/gpaw/setups/setups.html + +py_maj_min='3.7' + +################################################################################ +# +# Prepare the system +# + +# +# Load modules +# +module purge +module load calcua/2020a +module load intel/2020a +module load IntelPython3/2020a +module load buildtools/2020a + +# +# Create the directories and make sure they are clean if that matters +# +/usr/bin/mkdir -p $download_dir + +/usr/bin/mkdir -p $install_dir +/usr/bin/rm -rf $install_dir +/usr/bin/mkdir -p $install_dir + +/usr/bin/mkdir -p $modules_dir + +/usr/bin/mkdir -p $build_dir +/usr/bin/rm -rf $build_dir +/usr/bin/mkdir -p $build_dir + + + +################################################################################ +# +# Download components +# + +echo -e "\nDownloading files...\n" + +cd $download_dir + +# https://gitlab.com/libxc/libxc/-/archive/4.3.4/libxc-4.3.4.tar.bz2 +libxc_file="libxc-$libxc_version.tar.bz2" +libxc_url="https://gitlab.com/libxc/libxc/-/archive/$libxc_version" +[[ -f $libxc_file ]] || wget "$libxc_url/$libxc_file" + +# We do not download ase in this script. As it is pure python and doesn't need +# customization, we will install it using pip right away. +## https://files.pythonhosted.org/packages/d9/08/35969da23b641d3dfca46ba7559f651fcfdca81dbbc00b9058c934e75769/ase-3.17.0.tar.gz +#ase_file="ase-$ase_version.tar.gz" +#ase_url="https://files.pythonhosted.org/packages/d9/08/35969da23b641d3dfca46ba7559f651fcfdca81dbbc00b9058c934e75769" +#[[ -f $ase_file ]] || wget "$ase_url/$ase_file" + +# GPAW needs customization, so we need to download and unpack the sources. +# https://files.pythonhosted.org/packages/49/a1/cf54c399f5489cfdda1e8da02cae8bfb4b39d7cb7a895ce86608fcd0e1c9/gpaw-1.5.2.tar.gz +GPAW_file="gpaw-$GPAW_version.tar.gz" +#GPAW_url="https://files.pythonhosted.org/packages/49/a1/cf54c399f5489cfdda1e8da02cae8bfb4b39d7cb7a895ce86608fcd0e1c9" +GPAW_url="https://pypi.python.org/packages/source/g/gpaw" +[[ -f $GPAW_file ]] || wget "$GPAW_url/$GPAW_file" + +# Download GPAW-setup, a number of setup files for GPAW. 
+# https://wiki.fysik.dtu.dk/gpaw-files/gpaw-setups-0.9.20000.tar.gz +GPAWsetups_file="gpaw-setups-$GPAWsetups_version.tar.gz" +GPAWsetups_url="https://wiki.fysik.dtu.dk/gpaw-files" +[[ -f $GPAWsetups_file ]] || wget "$GPAWsetups_url/$GPAWsetups_file" + + +################################################################################ +# +# Install libxc +# + +echo -e "\nInstalling libxc...\n" + +cd $build_dir + +# Uncompress +tar -xf $download_dir/$libxc_file + +cd libxc-$libxc_version + +# Configure +autoreconf -i + +export CC=icc +export CFLAGS="-O2 -march=core-avx2 -mtune=core-avx2 -fPIC" +export CFLAGS="-O2 -march=core-avx2 -mtune=core-avx2 -ftz -fp-speculation=safe -fp-model source -fPIC" +#export CFLAGS="-O0 -march=core-avx2 -mtune=core-avx2 -ftz -fp-speculation=safe -fp-model source -fPIC" +./configure --prefix="$install_dir" \ + --disable-static --enable-shared --disable-fortran +# Build +make -j 16 + +# Install +make -j 16 install + +# Add bin, lib and include to the PATH variables +PATH=$install_dir/bin:$PATH +LIBRARY_PATH=$install_dir/lib:$LIBRARY_PATH +LD_LIBRARY_PATH=$install_dir/lib:$LD_LIBRARY_PATH +CPATH=$install_dir/include:$CPATH + + +################################################################################ +# +# Install ase +# + +echo -e "\nInstalling ase...\n" + +/usr/bin/mkdir -p "$install_dir/lib/python$py_maj_min/site-packages" +PYTHONPATH="$install_dir/lib/python$py_maj_min/site-packages" + +pip install --prefix=$install_dir --no-deps ase==$ase_version + + +################################################################################ +# +# Install GPAW-setups +# + +echo -e "\nInstalling gpaw-setups...\n" + +mkdir -p $install_dir/share/gpaw-setups +cd $install_dir/share/gpaw-setups +tar -xf $download_dir/$GPAWsetups_file --strip-components=1 + + +################################################################################ +# +# Install GPAW +# + +echo -e "\nInstalling GPAW...\n" + +cd $build_dir + +# Uncompress +tar -xf $download_dir/$GPAW_file + +# Apply patches +patch -p0 <$patch_dir/gpaw-1.5.2.patch + +cd gpaw-$GPAW_version + +# Make the customize.py script +mv customize.py customize.py.orig +cat >customize.py <$packageID.lua <site.cfg <customize.py <$packageID.lua <common.nspin==XC_UNPOLARIZED) ? 1 : 3; + for(i=0; icommon.nspin==XC_UNPOLARIZED) dauxdsigma[i] /= 2.; +- double dCdsigma[i]; +- dCdsigma[i]= dCdcsi*dcsidsigma[i]; +- ++ double dCdsigma = dCdcsi*dcsidsigma[i]; ++ + /* partial derivatives*/ +- de_PKZBdsigma[i] = de_PBEdsigma[i] * (1.0 + C * zsq) + dens * e_PBE * dCdsigma[i] * zsq +- - zsq * (dens * dCdsigma[i] * aux + (1.0 + C) * dauxdsigma[i]); ++ de_PKZBdsigma[i] = de_PBEdsigma[i] * (1.0 + C * zsq) + dens * e_PBE * dCdsigma * zsq ++ - zsq * (dens * dCdsigma * aux + (1.0 + C) * dauxdsigma[i]); + + } + } diff --git a/nemo/README.md b/nemo/README.md index e100891fa1416446bb3aad4b8b22ad8b51c86083..e135da5909298a07388035ac550d6d7744f6cfad 100644 --- a/nemo/README.md +++ b/nemo/README.md @@ -1,9 +1,10 @@ + # NEMO ## Summary Version -1.0 +1.1 ## Purpose of Benchmark @@ -35,90 +36,92 @@ The model is implemented in Fortran 90, with pre-processing (C-pre-processor). I ``` ./make_xios --arch local ``` + Files for the PRACE Tier-0 systems are available under [architecture_files](architecture_files) folder. -Note that XIOS requires `Netcdf4`. Please load the appropriate `HDF5` and `NetCDF4` modules. You might have to change the path in the configuration file. +Note that XIOS requires `Netcdf4`. 
Please load the appropriate `HDF5` and `NetCDF4` modules. If these modules do not set the corresponding path variables, you might have to change the paths in the configuration file.
 
 ### Building NEMO
 
 1. Download the NEMO source code:
    ```
    svn co https://forge.ipsl.jussieu.fr/nemo/svn/NEMO/releases/release-4.0
    ```
-2. Copy and setup the appropriate architecture file in the arch folder. The following changes are recommended:
+2. Copy and set up the appropriate architecture file in the arch folder. Files for the PRACE Tier-0 systems are available in the [architecture_files](architecture_files) folder. The following changes are recommended for the GNU compilers:
    ```
    a. add the `-lnetcdff` and `-lstdc++` flags to NetCDF flags
    b. using `mpif90` which is an MPI binding of `gfortran-4.9`
    c. add `-cpp` and `-ffree-line-length-none` to Fortran flags
-   d. swap out `gmake` with `make`
-   ```
-
-3. Then build the executable with the following command
-   ```
-   ./makenemo -m MY_CONFIG -r GYRE_XIOS -n MY_GYRE add_key "key_nosignedzero"
    ```
-4. Apply the patch as described here to measure step time :
+3. Apply the patch as described here to measure the step time:
    ```
    https://software.intel.com/en-us/articles/building-and-running-nemo-on-xeon-processors
    ```
+   You may also replace `src/OCE/nemogcm.F90` with the [nemogcm.F90](nemogcm.F90) provided in this repository.
+
+4. Go to the `cfgs` folder and add the line `GYRE_testing OCE TOP` to the `refs_cfg.txt` file.
+   Then:
+   ```
+   mkdir GYRE_testing
+   rsync -arv GYRE_PISCES/* GYRE_testing/
+   mv GYRE_testing/cpp_GYRE_PISCES.fcm GYRE_testing/cpp_GYRE_testing.fcm
+   # in GYRE_testing/cpp_GYRE_testing.fcm, replace key_top with key_nosignedzero
+   ```
+
+5. Build the executable with the following command:
+   ```
+   ../makenemo -m MY_CONFIG -r GYRE_testing
+   ```
 
 ## Mechanics of Running Benchmark
 
 ### Prepare input files
 
-    cd MY_GYRE/EXP00
+    cd GYRE_testing/EXP00
     sed -i '/using_server/s/false/true/' iodef.xml
-    sed -i '/&nameos/a ln_useCT = .false.' namelist_cfg
-    sed -i '/&namctl/a nn_bench = 1' namelist_cfg
+    sed -i '/ln_bench/s/false/true/' namelist_cfg
 
 ### Run the experiment interactively
 
-    mpirun -n 4 ../BLD/bin/nemo.exe -n 2 $PATH_TO_XIOS/bin/xios_server.exe
+    mpirun -n 4 nemo : -n 2 $PATH_TO_XIOS/bin/xios_server.exe
 
 ### GYRE configuration with higher resolution
 
 Modify configuration (for example for the test case A):
 ```
-    rm -f time.step solver.stat output.namelist.dyn ocean.output slurm-* GYRE_* mesh_mask_00*
-    jp_cfg=4
+    rm -f time.step solver.stat output.namelist.dyn ocean.output slurm-* GYRE_*
     sed -i -r \
-        -e 's/^( *nn_itend *=).*/\1 21600/' \
-        -e 's/^( *nn_stock *=).*/\1 21600/' \
-        -e 's/^( *nn_write *=).*/\1 1000/' \
-        -e 's/^( *jp_cfg *=).*/\1 '"$jp_cfg"'/' \
-        -e 's/^( *jpidta *=).*/\1 '"$(( 30 * jp_cfg +2))"'/' \
-        -e 's/^( *jpjdta *=).*/\1 '"$(( 20 * jp_cfg +2))"'/' \
-        -e 's/^( *jpiglo *=).*/\1 '"$(( 30 * jp_cfg +2))"'/' \
-        -e 's/^( *jpjglo *=).*/\1 '"$(( 20 * jp_cfg +2))"'/' \
+        -e 's/^( *nn_itend *=).*/\1 101/' \
+        -e 's/^( *nn_write *=).*/\1 4320/' \
+        -e 's/^( *nn_GYRE *=).*/\1 48/' \
+        -e 's/^( *rn_rdt *=).*/\1 1200/' \
         namelist_cfg
 ```
 
 ## Verification of Results
 
-The GYRE configuration is set through the `namelist_cfg` file. The horizontal resolution is determined by setting `jp_cfg` as follows:
+The GYRE configuration is set through the `namelist_cfg` file.
The horizontal resolution is determined by setting `nn_GYRE` as follows: ``` - Jpiglo = 30 × jp_cfg + 2 - Jpjglo = 20 × jp_cfg + 2 + Jpiglo = 30 × nn_GYRE + 2 + Jpjglo = 20 × nn_GYRE + 2 ``` -In this configuration, we use a default value of 30 ocean levels, depicted by `jpk=31`. The GYRE configuration is an ideal case for benchmark tests as it is very simple to increase the resolution and perform both weak and strong scalability experiment using the same input files. We use two configurations as follows: +In this configuration, we use a default value of 30 ocean levels, depicted by `jpkglo=31`. The GYRE configuration is an ideal case for benchmark tests as it is very simple to increase the resolution and perform both weak and strong scalability experiment using the same input files. We use two configurations as follows: Test Case A: ``` - jp_cfg = 128 suitable up to 1000 cores - Number of Days: 20 - Number of Time steps: 1440 + nn_GYRE = 48 suitable up to 1000 cores + Number of Time steps: 101 Time step size: 20 mins Number of seconds per time step: 1200 ``` Test Case B: ``` - jp_cfg = 256 suitable up to 20,000 cores. - Number of Days (real): 80 - Number of time step: 4320 + nn_GYRE = 192 suitable up to 20,000 cores. + Number of time step: 101 Time step size(real): 20 mins Number of seconds per time step: 1200 ``` We performed scalability test on 512 cores and 1024 cores for test case A. We performed scalability test for 4096 cores, 8192 cores and 16384 cores for test case B. - Both these test cases can give us quite good understanding of node performance and interconnect behavior. -We switch off the generation of mesh files by setting the `flag nn_mesh = 0` in the `namelist_ref` file. Also `using_server = false` is defined in `io_server` file. + We report the performance in step time which is the total computational time averaged over the number of time steps for different test cases. This helps us to compare systems in a standard manner across all combinations of system architectures. diff --git a/nemo/architecture_files/NEMO/arch-JUWELS.fcm b/nemo/architecture_files/NEMO/arch-JUWELS.fcm new file mode 100644 index 0000000000000000000000000000000000000000..e75e0eab459f55fd688e896472a31321a7e37481 --- /dev/null +++ b/nemo/architecture_files/NEMO/arch-JUWELS.fcm @@ -0,0 +1,71 @@ +# generic ifort compiler options for JUWELS +# +# NCDF_HOME root directory containing lib and include subdirectories for netcdf4 +# HDF5_HOME root directory containing lib and include subdirectories for HDF5 +# XIOS_HOME root directory containing lib for XIOS +# OASIS_HOME root directory containing lib for OASIS +# +# NCDF_INC netcdf4 include file +# NCDF_LIB netcdf4 library +# XIOS_INC xios include file (taken into accound only if key_iomput is activated) +# XIOS_LIB xios library (taken into accound only if key_iomput is activated) +# OASIS_INC oasis include file (taken into accound only if key_oasis3 is activated) +# OASIS_LIB oasis library (taken into accound only if key_oasis3 is activated) +# +# FC Fortran compiler command +# FCFLAGS Fortran compiler flags +# FFLAGS Fortran 77 compiler flags +# LD linker +# LDFLAGS linker flags, e.g. -L if you have libraries +# FPPFLAGS pre-processing flags +# AR assembler +# ARFLAGS assembler flags +# MK make +# USER_INC complete list of include files +# USER_LIB complete list of libraries to pass to the linker +# CC C compiler used to compile conv for AGRIF +# CFLAGS compiler flags used with CC +# +# Note that: +# - unix variables "$..." 
are accpeted and will be evaluated before calling fcm. +# - fcm variables are starting with a % (and not a $) +# + +%NCDF_HOME /gpfs/software/juwels/stages/2019a/software/netCDF/4.6.3-ipsmpi-2019a.1/ +%NCDF_HOME2 /gpfs/software/juwels/stages/2019a/software/netCDF-Fortran/4.4.5-ipsmpi-2019a.1/ +%HDF5_HOME /gpfs/software/juwels/stages/2019a/software/HDF5/1.10.5-ipsmpi-2019a.1/ +%CURL /gpfs/software/juwels/stages/2019a/software/cURL/7.64.1-GCCcore-8.3.0/lib/ + +%XIOS_HOME /p/project/prpb86/nemo2/xios-2.5/ +%OASIS_HOME /not/defined + +%HDF5_LIB -L%HDF5_HOME/lib -L%CURL -lhdf5_hl -lhdf5 +%GCCLIB + +%NCDF_INC -I%NCDF_HOME/include -I%NCDF_HOME2/include -I%HDF5_HOME/include +%NCDF_LIB -L%NCDF_HOME/lib %HDF5_LIB -L%CURL -L%NCDF_HOME2/lib -lnetcdff -lnetcdf -L%GCCLIB -lstdc++ -lz -lcurl -lgpfs + +%XIOS_INC -I%XIOS_HOME/inc +%XIOS_LIB -L%XIOS_HOME/lib -lxios -L%GCCLIB -lstdc++ + +%OASIS_INC -I%OASIS_HOME/build/lib/mct -I%OASIS_HOME/build/lib/psmile.MPI1 +%OASIS_LIB -L%OASIS_HOME/lib -lpsmile.MPI1 -lmct -lmpeu -lscrip + +%CPP icc -E +%FC mpifort +%FCFLAGS -O3 -r8 -funroll-all-loops -traceback + +%FFLAGS %FCFLAGS +%LD mpifort +%LDFLAGS -lstdc++ -lifcore -O3 -traceback +%FPPFLAGS -P -C -traditional + +%AR ar +%ARFLAGS -r + +%MK make +%USER_INC %XIOS_INC %OASIS_INC %NCDF_INC +%USER_LIB %XIOS_LIB %OASIS_LIB %NCDF_LIB + +%CC cc +%CFLAGS -O0 diff --git a/nemo/architecture_files/NEMO/arch-M100.fcm b/nemo/architecture_files/NEMO/arch-M100.fcm new file mode 100644 index 0000000000000000000000000000000000000000..ccbc93a9e27a0cd08097f7b1abdd09ae2c65d636 --- /dev/null +++ b/nemo/architecture_files/NEMO/arch-M100.fcm @@ -0,0 +1,69 @@ +# generic gfortran compiler options for linux M100 +# +# NCDF_HOME root directory containing lib and include subdirectories for netcdf4 +# HDF5_HOME root directory containing lib and include subdirectories for HDF5 +# XIOS_HOME root directory containing lib for XIOS +# OASIS_HOME root directory containing lib for OASIS +# +# NCDF_INC netcdf4 include file +# NCDF_LIB netcdf4 library +# XIOS_INC xios include file (taken into accound only if key_iomput is activated) +# XIOS_LIB xios library (taken into accound only if key_iomput is activated) +# OASIS_INC oasis include file (taken into accound only if key_oasis3 is activated) +# OASIS_LIB oasis library (taken into accound only if key_oasis3 is activated) +# +# FC Fortran compiler command +# FCFLAGS Fortran compiler flags +# FFLAGS Fortran 77 compiler flags +# LD linker +# LDFLAGS linker flags, e.g. -L if you have libraries +# FPPFLAGS pre-processing flags +# AR assembler +# ARFLAGS assembler flags +# MK make +# USER_INC complete list of include files +# USER_LIB complete list of libraries to pass to the linker +# CC C compiler used to compile conv for AGRIF +# CFLAGS compiler flags used with CC +# +# Note that: +# - unix variables "$..." are accpeted and will be evaluated before calling fcm. 
+# - fcm variables are starting with a % (and not a $) +# + +%NCDF_HOME /cineca/prod/opt/libraries/netcdf/4.7.3/gnu--8.4.0/ +%NCDF_HOME2 /cineca/prod/opt/libraries/netcdff/4.5.2/gnu--8.4.0/ +%HDF5_HOME /cineca/prod/opt/libraries/hdf5/1.12.0/gnu--8.4.0/ +%XIOS_HOME /m100_work/Ppp4x_5387/xios-2.5/ +%OASIS_HOME /not/defined + + +%HDF5_LIB -L%HDF5_HOME/lib -lhdf5_hl -lhdf5 +%GCCLIB /cineca/prod/opt/compilers/gnu/8.4.0/none/lib64/ + +%NCDF_INC -I%NCDF_HOME/include -I%NCDF_HOME2/include -I%HDF5_HOME/include +%NCDF_LIB -L%NCDF_HOME/lib %HDF5_LIB -L%NCDF_HOME2/lib -lnetcdff -lnetcdf -L%GCCLIB -lstdc++ -lz -lcurl -lgpfs +%XIOS_INC -I%XIOS_HOME/inc +%XIOS_LIB -L%XIOS_HOME/lib -lxios -L%GCCLIB -lstdc++ + +%OASIS_INC -I%OASIS_HOME/build/lib/mct -I%OASIS_HOME/build/lib/psmile.MPI1 +%OASIS_LIB -L%OASIS_HOME/lib -lpsmile.MPI1 -lmct -lmpeu -lscrip + +%CPP cpp -Dkey_nosignedzero +%FC mpif90 +%FCFLAGS -fdefault-real-8 -fno-second-underscore -O3 -funroll-all-loops -fcray-pointer -cpp -ffree-line-length-none -Dgfortran +%FFLAGS %FCFLAGS +%LD %FC +%LDFLAGS +%FPPFLAGS -P -C -traditional -x f77-cpp-input + +%AR ar +%ARFLAGS rs +%MK make +%USER_INC %XIOS_INC %OASIS_INC %NCDF_INC +%USER_LIB %XIOS_LIB %OASIS_LIB %NCDF_LIB + + + +%CC cc +%CFLAGS -O0 diff --git a/nemo/architecture_files/NEMO/arch-SuperMUC.fcm b/nemo/architecture_files/NEMO/arch-SuperMUC.fcm new file mode 100644 index 0000000000000000000000000000000000000000..9b7330db1ecb43e7465923d97b64bcef23392baf --- /dev/null +++ b/nemo/architecture_files/NEMO/arch-SuperMUC.fcm @@ -0,0 +1,79 @@ +# generic ifort compiler options for SuperMUC +# +# NCDF_HOME root directory containing lib and include subdirectories for netcdf4 +# HDF5_HOME root directory containing lib and include subdirectories for HDF5 +# XIOS_HOME root directory containing lib for XIOS +# OASIS_HOME root directory containing lib for OASIS +# +# NCDF_INC netcdf4 include file +# NCDF_LIB netcdf4 library +# XIOS_INC xios include file (taken into accound only if key_iomput is activated) +# XIOS_LIB xios library (taken into accound only if key_iomput is activated) +# OASIS_INC oasis include file (taken into accound only if key_oasis3 is activated) +# OASIS_LIB oasis library (taken into accound only if key_oasis3 is activated) +# +# FC Fortran compiler command +# FCFLAGS Fortran compiler flags +# FFLAGS Fortran 77 compiler flags +# LD linker +# LDFLAGS linker flags, e.g. -L if you have libraries +# FPPFLAGS pre-processing flags +# AR assembler +# ARFLAGS assembler flags +# MK make +# USER_INC complete list of include files +# USER_LIB complete list of libraries to pass to the linker +# CC C compiler used to compile conv for AGRIF +# CFLAGS compiler flags used with CC +# +# Note that: +# - unix variables "$..." are accpeted and will be evaluated before calling fcm. 
+# - fcm variables are starting with a % (and not a $) +# + +%NCDF_HOME /dss/dsshome1/lrz/sys/spack/release/19.2/opt/x86_avx512/netcdf/4.6.1-intel-rdopmwr/ +%NCDF_HOME2 /dss/dsshome1/lrz/sys/spack/release/19.2/opt/x86_avx512/netcdf-fortran/4.4.4-intel-mq54rwz/ +%HDF5_HOME /dss/dsshome1/lrz/sys/spack/release/19.2/opt/x86_avx512/hdf5/1.10.2-intel-726msh6/ +%CURL /dss/dsshome1/lrz/sys/spack/release/19.2/opt/x86_avx512/curl/7.60.0-gcc-u7vewcb/lib/ +%XIOS_HOME /hppfs/work/pn68so/di67wat/NEMO/xios-2.5/ +%OASIS_HOME /not/defined + + +%HDF5_LIB -L%HDF5_HOME/lib -L%CURL -lhdf5_hl -lhdf5 +%GCCLIB + + +%NCDF_INC -I%NCDF_HOME/include -I%NCDF_HOME2/include -I%HDF5_HOME/include +%NCDF_LIB -L%NCDF_HOME/lib %HDF5_LIB -L%CURL -L%NCDF_HOME2/lib -lnetcdff -lnetcdf -L%GCCLIB -lstdc++ -lz -lcurl -lgpfs +%XIOS_INC -I%XIOS_HOME/inc +%XIOS_LIB -L%XIOS_HOME/lib -lxios -L%GCCLIB -lstdc++ + +%OASIS_INC -I%OASIS_HOME/build/lib/mct -I%OASIS_HOME/build/lib/psmile.MPI1 +%OASIS_LIB -L%OASIS_HOME/lib -lpsmile.MPI1 -lmct -lmpeu -lscrip + + +%CPP icc -E +%FC mpiifort +%FCFLAGS -O3 -r8 -funroll-all-loops -traceback + + + +%FFLAGS %FCFLAGS +%LD mpiifort +%LDFLAGS -lstdc++ -lifcore -O3 -traceback + + +%FPPFLAGS -P -C -traditional + +%AR ar +%ARFLAGS -r + +%MK make +%USER_INC %XIOS_INC %OASIS_INC %NCDF_INC +%USER_LIB %XIOS_LIB %OASIS_LIB %NCDF_LIB + + + +%CC cc +%CFLAGS -O0 + diff --git a/nemo/architecture_files/XIOS/arch-JUWELS.env b/nemo/architecture_files/XIOS/arch-JUWELS.env new file mode 100644 index 0000000000000000000000000000000000000000..1d32bbef4691ecd8f4c8e3388f7ee12cab6ad919 --- /dev/null +++ b/nemo/architecture_files/XIOS/arch-JUWELS.env @@ -0,0 +1,9 @@ +module load GCC/8.3.0 +module load PGI/19.10-GCC-8.3.0 +module load Intel/2019.5.281-GCC-8.3.0 +module load ParaStationMPI/5.4 +module load HDF5/1.10.5 +module load netCDF/4.6.3 +module load netCDF-Fortran/4.4.5 +module load cURL +module load Perl diff --git a/nemo/architecture_files/XIOS/arch-JUWELS.fcm b/nemo/architecture_files/XIOS/arch-JUWELS.fcm new file mode 100644 index 0000000000000000000000000000000000000000..dcfe49af559b43964dd5722f16b1508a2506275c --- /dev/null +++ b/nemo/architecture_files/XIOS/arch-JUWELS.fcm @@ -0,0 +1,24 @@ +################################################################################ +################### Projet XIOS ################### +################################################################################ + +%CCOMPILER mpicc +%FCOMPILER mpif90 +%LINKER mpif90 -nofor-main + +%BASE_CFLAGS -ansi -w +%PROD_CFLAGS -O3 -DBOOST_DISABLE_ASSERTS +%DEV_CFLAGS -g -O2 +%DEBUG_CFLAGS -g + +%BASE_FFLAGS -D__NONE__ -ffree-line-length-none +%PROD_FFLAGS -O3 +%DEV_FFLAGS -g -O2 +%DEBUG_FFLAGS -g + +%BASE_INC -D__NONE__ +%BASE_LD -lstdc++ + +%CPP cpp +%FPP cpp -P +%MAKE make diff --git a/nemo/architecture_files/XIOS/arch-JUWELS.path b/nemo/architecture_files/XIOS/arch-JUWELS.path new file mode 100644 index 0000000000000000000000000000000000000000..9f741839eaba6bcdd4577ce2ace16913b73986de --- /dev/null +++ b/nemo/architecture_files/XIOS/arch-JUWELS.path @@ -0,0 +1,20 @@ +NETCDF_INCDIR="-I$NETCDF_INC_DIR -I$NETCDFF_INC_DIR" +NETCDF_LIBDIR="-Wl,'--allow-multiple-definition' -L$NETCDF_LIB_DIR -L$NETCDFF_LIB_DIR" +NETCDF_LIB="-lnetcdff -lnetcdf" + +MPI_INCDIR="" +MPI_LIBDIR="" +MPI_LIB="" + +HDF5_INCDIR="-I $HDF5_INC_DIR" +HDF5_LIBDIR="-L $HDF5_LIB_DIR" +HDF5_LIB="-lhdf5_hl -lhdf5 -lz -lcurl" + +BOOST_INCDIR="-I $BOOST_INC_DIR" +BOOST_LIBDIR="-L $BOOST_LIB_DIR" +BOOST_LIB="" + +OASIS_INCDIR="-I$PWD/../../oasis3-mct/BLD/build/lib/psmile.MPI1" 
+OASIS_LIBDIR="-L$PWD/../../oasis3-mct/BLD/lib" +OASIS_LIB="-lpsmile.MPI1 -lscrip -lmct -lmpeu" + diff --git a/nemo/architecture_files/XIOS/arch-M100.env b/nemo/architecture_files/XIOS/arch-M100.env new file mode 100644 index 0000000000000000000000000000000000000000..9cc14ed42bad480a3710af4ba7c557c31c48e611 --- /dev/null +++ b/nemo/architecture_files/XIOS/arch-M100.env @@ -0,0 +1,7 @@ +module load gnu +module load szip +module load zlib +module load spectrum_mpi +module load hdf5 +module load netcdf +module load netcdff diff --git a/nemo/architecture_files/XIOS/arch-M100.fcm b/nemo/architecture_files/XIOS/arch-M100.fcm new file mode 100644 index 0000000000000000000000000000000000000000..b4d556236aab1633ec9f24caf54d84cdaa568ae8 --- /dev/null +++ b/nemo/architecture_files/XIOS/arch-M100.fcm @@ -0,0 +1,24 @@ +################################################################################ +################### Projet XIOS ################### +################################################################################ + +%CCOMPILER mpicc +%FCOMPILER mpif90 +%LINKER mpif90 + +%BASE_CFLAGS -ansi -w +%PROD_CFLAGS -O3 -DBOOST_DISABLE_ASSERTS +%DEV_CFLAGS -g -O2 -traceback +%DEBUG_CFLAGS -DBZ_DEBUG -g -traceback -fno-inline + +%BASE_FFLAGS -D__NONE__ -ffree-line-length-none +%PROD_FFLAGS -O3 +%DEV_FFLAGS -g -O2 -traceback +%DEBUG_FFLAGS -g -traceback + +%BASE_INC -D __NONE__ +%BASE_LD -lstdc++ + +%CPP cpp +%FPP cpp -P +%MAKE make diff --git a/nemo/architecture_files/XIOS/arch-M100.path b/nemo/architecture_files/XIOS/arch-M100.path new file mode 100644 index 0000000000000000000000000000000000000000..8cab12120804b2cd1d6ca572d7f742c775e09e15 --- /dev/null +++ b/nemo/architecture_files/XIOS/arch-M100.path @@ -0,0 +1,19 @@ +NETCDF_INCDIR="-I$NETCDF_INC_DIR -I$NETCDFF_INC_DIR" +NETCDF_LIBDIR="-Wl,'--allow-multiple-definition' -L$NETCDF_LIB_DIR -L$NETCDFF_LIB_DIR" +NETCDF_LIB="-lnetcdff -lnetcdf" + +MPI_INCDIR="" +MPI_LIBDIR="" +MPI_LIB="" + +HDF5_INCDIR="-I $HDF5_INC_DIR" +HDF5_LIBDIR="-L $HDF5_LIB_DIR" +HDF5_LIB="-lhdf5_hl -lhdf5 -lz -lcurl" + +BOOST_INCDIR="-I $BOOST_INC_DIR" +BOOST_LIBDIR="-L $BOOST_LIB_DIR" +BOOST_LIB="" + +OASIS_INCDIR="-I$PWD/../../oasis3-mct/BLD/build/lib/psmile.MPI1" +OASIS_LIBDIR="-L$PWD/../../oasis3-mct/BLD/lib" +OASIS_LIB="-lpsmile.MPI1 -lscrip -lmct -lmpeu" diff --git a/nemo/architecture_files/XIOS/arch-SuperMUC.env b/nemo/architecture_files/XIOS/arch-SuperMUC.env new file mode 100644 index 0000000000000000000000000000000000000000..d6ee46303436e187dcc847876abcdcecf9a7176e --- /dev/null +++ b/nemo/architecture_files/XIOS/arch-SuperMUC.env @@ -0,0 +1,4 @@ +module load slurm_setup +module load hdf5 +module load netcdf +module load netcdf-fortran diff --git a/nemo/architecture_files/XIOS/arch-SuperMUC.fcm b/nemo/architecture_files/XIOS/arch-SuperMUC.fcm new file mode 100644 index 0000000000000000000000000000000000000000..955b88f34a50af58b11259d3753716665f9101ad --- /dev/null +++ b/nemo/architecture_files/XIOS/arch-SuperMUC.fcm @@ -0,0 +1,24 @@ +################################################################################ +################### Projet XIOS ################### +################################################################################ + +%CCOMPILER mpicc +%FCOMPILER mpif90 +%LINKER mpif90 -nofor-main + +%BASE_CFLAGS -ansi -w +%PROD_CFLAGS -O3 -DBOOST_DISABLE_ASSERTS +%DEV_CFLAGS -g -O2 +%DEBUG_CFLAGS -g + +%BASE_FFLAGS -D__NONE__ -ffree-line-length-none +%PROD_FFLAGS -O3 +%DEV_FFLAGS -g -O2 +%DEBUG_FFLAGS -g + +%BASE_INC -D__NONE__ +%BASE_LD -lstdc++ + 
+%CPP cpp +%FPP cpp -P +%MAKE make diff --git a/nemo/architecture_files/XIOS/arch-SuperMUC.path b/nemo/architecture_files/XIOS/arch-SuperMUC.path new file mode 100644 index 0000000000000000000000000000000000000000..2adca8e12d88ab735c90239e148d36aa1f4a1c19 --- /dev/null +++ b/nemo/architecture_files/XIOS/arch-SuperMUC.path @@ -0,0 +1,19 @@ +NETCDF_INCDIR="-I$NETCDF_INC_DIR -I$NETCDFF_INC_DIR" +NETCDF_LIBDIR="-Wl,'--allow-multiple-definition' -L $NETCDF_LIB_DIR -L $NETCDFF_LIB_DIR" +NETCDF_LIB="-lnetcdff -lnetcdf" + +MPI_INCDIR="" +MPI_LIBDIR="" +MPI_LIB="" + +HDF5_INCDIR="-I $HDF5_INC_DIR" +HDF5_LIBDIR="-L $HDF5_LIB_DIR" +HDF5_LIB="-lhdf5_hl -lhdf5 -lz -lcurl" + +BOOST_INCDIR="-I $BOOST_INC_DIR" +BOOST_LIBDIR="-L $BOOST_LIB_DIR" +BOOST_LIB="" + +OASIS_INCDIR="-I$PWD/../../oasis3-mct/BLD/build/lib/psmile.MPI1" +OASIS_LIBDIR="-L$PWD/../../oasis3-mct/BLD/lib" +OASIS_LIB="-lpsmile.MPI1 -lscrip -lmct -lmpeu" diff --git a/nemo/fixfcm.bash b/nemo/fixfcm.bash deleted file mode 100644 index ea2a8df5cabd98491f8e2f2bc924fcdf2a605391..0000000000000000000000000000000000000000 --- a/nemo/fixfcm.bash +++ /dev/null @@ -1,17 +0,0 @@ -#!/usr/bin/env bash - -# A tool to modify XML files used by FCM - -# This is just a regexp search and replace, not a proper XML -# parser. Use at own risk. - -fixfcm() { - local name value prog="" - for arg in "$@"; do - name="${arg%%=*}" - value=$(printf %q "${arg#*=}") - value="${value//\//\/}" - prog="s/(^%${name} )(.*)/\\1 ${value}/"$'\n'"$prog" - done - sed -r -e "$prog" -} diff --git a/nemo/nemogcm.F90 b/nemo/nemogcm.F90 new file mode 100644 index 0000000000000000000000000000000000000000..a67f1e37e6c22a82f8a9540c3dcf04053b29f974 --- /dev/null +++ b/nemo/nemogcm.F90 @@ -0,0 +1,734 @@ +MODULE nemogcm + !!====================================================================== + !! *** MODULE nemogcm *** + !! Ocean system : NEMO GCM (ocean dynamics, on-line tracers, biochemistry and sea-ice) + !!====================================================================== + !! History : OPA ! 1990-10 (C. Levy, G. Madec) Original code + !! 7.0 ! 1991-11 (M. Imbard, C. Levy, G. Madec) + !! 7.1 ! 1993-03 (M. Imbard, C. Levy, G. Madec, O. Marti, M. Guyon, A. Lazar, + !! P. Delecluse, C. Perigaud, G. Caniaux, B. Colot, C. Maes) release 7.1 + !! - ! 1992-06 (L.Terray) coupling implementation + !! - ! 1993-11 (M.A. Filiberti) IGLOO sea-ice + !! 8.0 ! 1996-03 (M. Imbard, C. Levy, G. Madec, O. Marti, M. Guyon, A. Lazar, + !! P. Delecluse, L.Terray, M.A. Filiberti, J. Vialar, A.M. Treguier, M. Levy) release 8.0 + !! 8.1 ! 1997-06 (M. Imbard, G. Madec) + !! 8.2 ! 1999-11 (M. Imbard, H. Goosse) sea-ice model + !! ! 1999-12 (V. Thierry, A-M. Treguier, M. Imbard, M-A. Foujols) OPEN-MP + !! ! 2000-07 (J-M Molines, M. Imbard) Open Boundary Conditions (CLIPPER) + !! NEMO 1.0 ! 2002-08 (G. Madec) F90: Free form and modules + !! - ! 2004-06 (R. Redler, NEC CCRLE, Germany) add OASIS[3/4] coupled interfaces + !! - ! 2004-08 (C. Talandier) New trends organization + !! - ! 2005-06 (C. Ethe) Add the 1D configuration possibility + !! - ! 2005-11 (V. Garnier) Surface pressure gradient organization + !! - ! 2006-03 (L. Debreu, C. Mazauric) Agrif implementation + !! - ! 2006-04 (G. Madec, R. Benshila) Step reorganization + !! - ! 2007-07 (J. Chanut, A. Sellar) Unstructured open boundaries (BDY) + !! 3.2 ! 2009-08 (S. Masson) open/write in the listing file in mpp + !! 3.3 ! 2010-05 (K. Mogensen, A. Weaver, M. Martin, D. Lea) Assimilation interface + !! - ! 2010-10 (C. Ethe, G. 
Madec) reorganisation of initialisation phase + !! 3.3.1! 2011-01 (A. R. Porter, STFC Daresbury) dynamical allocation + !! - ! 2011-11 (C. Harris) decomposition changes for running with CICE + !! 3.6 ! 2012-05 (C. Calone, J. Simeon, G. Madec, C. Ethe) Add grid coarsening + !! - ! 2014-12 (G. Madec) remove KPP scheme and cross-land advection (cla) + !! 4.0 ! 2016-10 (G. Madec, S. Flavoni) domain configuration / user defined interface + !!---------------------------------------------------------------------- + + !!---------------------------------------------------------------------- + !! nemo_gcm : solve ocean dynamics, tracer, biogeochemistry and/or sea-ice + !! nemo_init : initialization of the NEMO system + !! nemo_ctl : initialisation of the contol print + !! nemo_closefile: close remaining open files + !! nemo_alloc : dynamical allocation + !!---------------------------------------------------------------------- + USE step_oce ! module used in the ocean time stepping module (step.F90) + USE phycst ! physical constant (par_cst routine) + USE domain ! domain initialization (dom_init & dom_cfg routines) + USE closea ! treatment of closed seas (for ln_closea) + USE usrdef_nam ! user defined configuration + USE tideini ! tidal components initialization (tide_ini routine) + USE bdy_oce, ONLY : ln_bdy + USE bdyini ! open boundary cond. setting (bdy_init routine) + USE istate ! initial state setting (istate_init routine) + USE ldfdyn ! lateral viscosity setting (ldfdyn_init routine) + USE ldftra ! lateral diffusivity setting (ldftra_init routine) + USE trdini ! dyn/tra trends initialization (trd_init routine) + USE asminc ! assimilation increments + USE asmbkg ! writing out state trajectory + USE diaptr ! poleward transports (dia_ptr_init routine) + USE diadct ! sections transports (dia_dct_init routine) + USE diaobs ! Observation diagnostics (dia_obs_init routine) + USE diacfl ! CFL diagnostics (dia_cfl_init routine) + USE step ! NEMO time-stepping (stp routine) + USE icbini ! handle bergs, initialisation + USE icbstp ! handle bergs, calving, themodynamics and transport + USE cpl_oasis3 ! OASIS3 coupling + USE c1d ! 1D configuration + USE step_c1d ! Time stepping loop for the 1D configuration + USE dyndmp ! Momentum damping + USE stopar ! Stochastic param.: ??? + USE stopts ! Stochastic param.: ??? + USE diurnal_bulk ! diurnal bulk SST + USE step_diu ! diurnal bulk SST timestepping (called from here if run offline) + USE crsini ! initialise grid coarsening utility + USE diatmb ! Top,middle,bottom output + USE dia25h ! 25h mean output + USE sbc_oce , ONLY : lk_oasis + USE wet_dry ! Wetting and drying setting (wad_init routine) +#if defined key_top + USE trcini ! passive tracer initialisation +#endif +#if defined key_nemocice_decomp + USE ice_domain_size, only: nx_global, ny_global +#endif + ! + USE lib_mpp ! distributed memory computing + USE mppini ! shared/distributed memory setting (mpp_init routine) + USE lbcnfd , ONLY : isendto, nsndto, nfsloop, nfeloop ! Setup of north fold exchanges + USE lib_fortran ! Fortran utilities (allows no signed zero when 'key_nosignedzero' defined) +#if defined key_iomput + USE xios ! xIOserver +#endif +#if defined key_agrif + USE agrif_all_update ! Master Agrif update +#endif + + IMPLICIT NONE + PRIVATE + + PUBLIC nemo_gcm ! called by model.F90 + PUBLIC nemo_init ! needed by AGRIF + PUBLIC nemo_alloc ! needed by TAM + + CHARACTER(lc) :: cform_aaa="( /, 'AAAAAAAA', / ) " ! 
flag for output listing + +#if defined key_mpp_mpi + INCLUDE 'mpif.h' +#endif + + !!---------------------------------------------------------------------- + !! NEMO/OCE 4.0 , NEMO Consortium (2018) + !! $Id: nemogcm.F90 11098 2019-06-11 13:17:21Z agn $ + !! Software governed by the CeCILL license (see ./LICENSE) + !!---------------------------------------------------------------------- +CONTAINS + + SUBROUTINE nemo_gcm + !!---------------------------------------------------------------------- + !! *** ROUTINE nemo_gcm *** + !! + !! ** Purpose : NEMO solves the primitive equations on an orthogonal + !! curvilinear mesh on the sphere. + !! + !! ** Method : - model general initialization + !! - launch the time-stepping (stp routine) + !! - finalize the run by closing files and communications + !! + !! References : Madec, Delecluse, Imbard, and Levy, 1997: internal report, IPSL. + !! Madec, 2008, internal report, IPSL. + !!---------------------------------------------------------------------- + INTEGER :: istp ! time step index + DOUBLE PRECISION :: mpi_wtime, sstart, send , tot_time , ssteptime , smstime + DOUBLE PRECISION :: gtot_time , gssteptime , gelapsed_time , step1time ,gstep1time,galltime + INTEGER :: rank, ierror, tag, status(MPI_STATUS_SIZE) + !!---------------------------------------------------------------------- + ! +#if defined key_agrif + CALL Agrif_Init_Grids() ! AGRIF: set the meshes +#endif + ! !-----------------------! + CALL nemo_init !== Initialisations ==! + ! !-----------------------! +#if defined key_agrif + CALL Agrif_Declare_Var_dom ! AGRIF: set the meshes for DOM + CALL Agrif_Declare_Var ! " " " " " DYN/TRA +# if defined key_top + CALL Agrif_Declare_Var_top ! " " " " " TOP +# endif +# if defined key_si3 + CALL Agrif_Declare_Var_ice ! " " " " " Sea ice +# endif +#endif + ! check that all process are still there... If some process have an error, + ! they will never enter in step and other processes will wait until the end of the cpu time! + CALL mpp_max( 'nemogcm', nstop ) + + IF(lwp) WRITE(numout,cform_aaa) ! Flag AAAAAAA + + ! !-----------------------! + ! !== time stepping ==! + ! !-----------------------! + istp = nit000 + ! +#if defined key_c1d + DO WHILE ( istp <= nitend .AND. nstop == 0 ) !== C1D time-stepping ==! + CALL stp_c1d( istp ) + istp = istp + 1 + END DO +#else + ! +# if defined key_agrif + ! !== AGRIF time-stepping ==! + CALL Agrif_Regrid() + ! + ! Recursive update from highest nested level to lowest: + CALL Agrif_step_child_adj(Agrif_Update_All) + ! + DO WHILE( istp <= nitend .AND. nstop == 0 ) + CALL stp ! AGRIF: time stepping + istp = istp + 1 + END DO + ! + IF( .NOT. Agrif_Root() ) THEN + CALL Agrif_ParentGrid_To_ChildGrid() + IF( ln_diaobs ) CALL dia_obs_wri + IF( ln_timing ) CALL timing_finalize + CALL Agrif_ChildGrid_To_ParentGrid() + ENDIF + ! +# else + ! + IF( .NOT.ln_diurnal_only ) THEN !== Standard time-stepping ==! + ! + CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror) + DO WHILE( istp <= nitend .AND. nstop == 0 ) +#if defined key_mpp_mpi + ncom_stp = istp + IF ( istp == ( nit000 + 1 ) ) elapsed_time = MPI_Wtime() + IF ( istp == nitend ) elapsed_time = MPI_Wtime() - elapsed_time +#endif + sstart = MPI_Wtime() + CALL stp ( istp ) + send = MPI_Wtime() + + ssteptime = send-sstart + !==IF (rank == 0 ) print *, "Ozan Step ", istp, " - " , ssteptime , "s."==! 
+ IF (istp == 1 ) THEN + step1time = ssteptime + ENDIF + IF (istp == 2 ) THEN + smstime = ssteptime + tot_time = ssteptime + ENDIF + IF (istp > 2 ) THEN + tot_time = tot_time+ssteptime + IF ( ssteptime>smstime ) smstime = ssteptime + ENDIF + + istp = istp + 1 + END DO + + !CALL MPI_REDUCE(tot_time,gtot_time, 1, mpi_double_precision, MPI_MAX, 0, mpi_comm_world,ierror) + !CALL MPI_REDUCE(smstime,gssteptime, 1, mpi_double_precision, MPI_MAX, 0, mpi_comm_world,ierror) + !CALL MPI_REDUCE(elapsed_time,gelapsed_time, 1, mpi_double_precision, MPI_MAX, 0, mpi_comm_world,ierror) + !CALL MPI_REDUCE(step1time,gstep1time, 1, mpi_double_precision, MPI_MAX, 0, mpi_comm_world,ierror) + !CALL MPI_REDUCE(step1time+tot_time,galltime, 1, mpi_double_precision, MPI_MAX, 0, mpi_comm_world,ierror) + !IF (rank == 0 ) print *, "BENCH DONE ",istp," " ,gstep1time," ", gssteptime , " " , gtot_time ," ",gelapsed_time, " ",galltime," s." + + print *, "BENCH DONE ",istp," " ,step1time," ", smstime , " " , tot_time ," ",elapsed_time, " ",step1time+tot_time," s." + ! + ELSE !== diurnal SST time-steeping only ==! + ! + DO WHILE( istp <= nitend .AND. nstop == 0 ) + CALL stp_diurnal( istp ) ! time step only the diurnal SST + istp = istp + 1 + END DO + ! + ENDIF + ! +# endif + ! +#endif + ! + IF( ln_diaobs ) CALL dia_obs_wri + ! + IF( ln_icebergs ) CALL icb_end( nitend ) + + ! !------------------------! + ! !== finalize the run ==! + ! !------------------------! + IF(lwp) WRITE(numout,cform_aaa) ! Flag AAAAAAA + ! + IF( nstop /= 0 .AND. lwp ) THEN ! error print + WRITE(numout,cform_err) + WRITE(numout,*) ' ==>>> nemo_gcm: a total of ', nstop, ' errors have been found' + WRITE(numout,*) + ENDIF + ! + IF( ln_timing ) CALL timing_finalize + ! + CALL nemo_closefile + ! +#if defined key_iomput + CALL xios_finalize ! end mpp communications with xios + IF( lk_oasis ) CALL cpl_finalize ! end coupling and mpp communications with OASIS +#else + IF ( lk_oasis ) THEN ; CALL cpl_finalize ! end coupling and mpp communications with OASIS + ELSEIF( lk_mpp ) THEN ; CALL mppstop( ldfinal = .TRUE. ) ! end mpp communications + ENDIF +#endif + ! + IF(lwm) THEN + IF( nstop == 0 ) THEN ; STOP 0 + ELSE ; STOP 999 + ENDIF + ENDIF + ! + END SUBROUTINE nemo_gcm + + + SUBROUTINE nemo_init + !!---------------------------------------------------------------------- + !! *** ROUTINE nemo_init *** + !! + !! ** Purpose : initialization of the NEMO GCM + !!---------------------------------------------------------------------- + INTEGER :: ji ! dummy loop indices + INTEGER :: ios, ilocal_comm ! local integers + CHARACTER(len=120), DIMENSION(60) :: cltxt, cltxt2, clnam + !! + NAMELIST/namctl/ ln_ctl , sn_cfctl, nn_print, nn_ictls, nn_ictle, & + & nn_isplt , nn_jsplt, nn_jctls, nn_jctle, & + & ln_timing, ln_diacfl + NAMELIST/namcfg/ ln_read_cfg, cn_domcfg, ln_closea, ln_write_cfg, cn_domcfg_out, ln_use_jattr + !!---------------------------------------------------------------------- + ! + cltxt = '' + cltxt2 = '' + clnam = '' + cxios_context = 'nemo' + ! + ! ! Open reference namelist and configuration namelist files + CALL ctl_opn( numnam_ref, 'namelist_ref', 'OLD', 'FORMATTED', 'SEQUENTIAL', -1, 6, .FALSE. ) + CALL ctl_opn( numnam_cfg, 'namelist_cfg', 'OLD', 'FORMATTED', 'SEQUENTIAL', -1, 6, .FALSE. ) + ! + REWIND( numnam_ref ) ! Namelist namctl in reference namelist + READ ( numnam_ref, namctl, IOSTAT = ios, ERR = 901 ) +901 IF( ios /= 0 ) CALL ctl_nam ( ios , 'namctl in reference namelist', .TRUE. ) + REWIND( numnam_cfg ) ! 
Namelist namctl in confguration namelist + READ ( numnam_cfg, namctl, IOSTAT = ios, ERR = 902 ) +902 IF( ios > 0 ) CALL ctl_nam ( ios , 'namctl in configuration namelist', .TRUE. ) + ! + REWIND( numnam_ref ) ! Namelist namcfg in reference namelist + READ ( numnam_ref, namcfg, IOSTAT = ios, ERR = 903 ) +903 IF( ios /= 0 ) CALL ctl_nam ( ios , 'namcfg in reference namelist', .TRUE. ) + REWIND( numnam_cfg ) ! Namelist namcfg in confguration namelist + READ ( numnam_cfg, namcfg, IOSTAT = ios, ERR = 904 ) +904 IF( ios > 0 ) CALL ctl_nam ( ios , 'namcfg in configuration namelist', .TRUE. ) + + ! !--------------------------! + ! ! Set global domain size ! (control print return in cltxt2) + ! !--------------------------! + IF( ln_read_cfg ) THEN ! Read sizes in domain configuration file + CALL domain_cfg ( cltxt2, cn_cfg, nn_cfg, jpiglo, jpjglo, jpkglo, jperio ) + ! + ELSE ! user-defined namelist + CALL usr_def_nam( cltxt2, clnam, cn_cfg, nn_cfg, jpiglo, jpjglo, jpkglo, jperio ) + ENDIF + ! + ! + ! !--------------------------------------------! + ! ! set communicator & select the local node ! + ! ! NB: mynode also opens output.namelist.dyn ! + ! ! on unit number numond on first proc ! + ! !--------------------------------------------! +#if defined key_iomput + IF( Agrif_Root() ) THEN + IF( lk_oasis ) THEN + CALL cpl_init( "oceanx", ilocal_comm ) ! nemo local communicator given by oasis + CALL xios_initialize( "not used" ,local_comm= ilocal_comm ) ! send nemo communicator to xios + ELSE + CALL xios_initialize( "for_xios_mpi_id",return_comm=ilocal_comm ) ! nemo local communicator given by xios + ENDIF + ENDIF + ! Nodes selection (control print return in cltxt) + narea = mynode( cltxt, 'output.namelist.dyn', numnam_ref, numnam_cfg, numond , nstop, ilocal_comm ) +#else + IF( lk_oasis ) THEN + IF( Agrif_Root() ) THEN + CALL cpl_init( "oceanx", ilocal_comm ) ! nemo local communicator given by oasis + ENDIF + ! Nodes selection (control print return in cltxt) + narea = mynode( cltxt, 'output.namelist.dyn', numnam_ref, numnam_cfg, numond , nstop, ilocal_comm ) + ELSE + ilocal_comm = 0 ! Nodes selection (control print return in cltxt) + narea = mynode( cltxt, 'output.namelist.dyn', numnam_ref, numnam_cfg, numond , nstop ) + ENDIF +#endif + + narea = narea + 1 ! mynode return the rank of proc (0 --> jpnij -1 ) + + IF( sn_cfctl%l_config ) THEN + ! Activate finer control of report outputs + ! optionally switch off output from selected areas (note this only + ! applies to output which does not involve global communications) + IF( ( narea < sn_cfctl%procmin .OR. narea > sn_cfctl%procmax ) .OR. & + & ( MOD( narea - sn_cfctl%procmin, sn_cfctl%procincr ) /= 0 ) ) & + & CALL nemo_set_cfctl( sn_cfctl, .FALSE., .FALSE. ) + ELSE + ! Use ln_ctl to turn on or off all options. + CALL nemo_set_cfctl( sn_cfctl, ln_ctl, .TRUE. ) + ENDIF + + lwm = (narea == 1) ! control of output namelists + lwp = (narea == 1) .OR. ln_ctl ! control of all listing output print + + IF(lwm) THEN ! write merged namelists from earlier to output namelist + ! ! now that the file has been opened in call to mynode. + ! ! NB: nammpp has already been written in mynode (if lk_mpp_mpi) + WRITE( numond, namctl ) + WRITE( numond, namcfg ) + IF( .NOT.ln_read_cfg ) THEN + DO ji = 1, SIZE(clnam) + IF( TRIM(clnam(ji)) /= '' ) WRITE(numond, * ) clnam(ji) ! namusr_def print + END DO + ENDIF + ENDIF + + IF(lwp) THEN ! open listing units + ! + CALL ctl_opn( numout, 'ocean.output', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, 6, .FALSE., narea ) + ! 
+ WRITE(numout,*) + WRITE(numout,*) ' CNRS - NERC - Met OFFICE - MERCATOR-ocean - INGV - CMCC' + WRITE(numout,*) ' NEMO team' + WRITE(numout,*) ' Ocean General Circulation Model' + WRITE(numout,*) ' NEMO version 4.0 (2019) ' + WRITE(numout,*) + WRITE(numout,*) " ._ ._ ._ ._ ._ " + WRITE(numout,*) " _.-._)`\_.-._)`\_.-._)`\_.-._)`\_.-._)`\_ " + WRITE(numout,*) + WRITE(numout,*) " o _, _, " + WRITE(numout,*) " o .' ( .-' / " + WRITE(numout,*) " o _/..._'. .' / " + WRITE(numout,*) " ( o .-'` ` '-./ _.' " + WRITE(numout,*) " ) ( o) ;= <_ ( " + WRITE(numout,*) " ( '-.,\\__ __.-;`\ '. ) " + WRITE(numout,*) " ) ) \) |`\ \) '. \ ( ( " + WRITE(numout,*) " ( ( \_/ '-._\ ) ) " + WRITE(numout,*) " ) ) jgs ` ( ( " + WRITE(numout,*) " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ " + WRITE(numout,*) + + DO ji = 1, SIZE(cltxt) + IF( TRIM(cltxt (ji)) /= '' ) WRITE(numout,*) TRIM(cltxt(ji)) ! control print of mynode + END DO + WRITE(numout,*) + WRITE(numout,*) + DO ji = 1, SIZE(cltxt2) + IF( TRIM(cltxt2(ji)) /= '' ) WRITE(numout,*) TRIM(cltxt2(ji)) ! control print of domain size + END DO + ! + WRITE(numout,cform_aaa) ! Flag AAAAAAA + ! + ENDIF + ! open /dev/null file to be able to supress output write easily + CALL ctl_opn( numnul, '/dev/null', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, 6, .FALSE. ) + ! + ! ! Domain decomposition + CALL mpp_init ! MPP + + ! Now we know the dimensions of the grid and numout has been set: we can allocate arrays + CALL nemo_alloc() + + ! !-------------------------------! + ! ! NEMO general initialization ! + ! !-------------------------------! + + CALL nemo_ctl ! Control prints + ! + ! ! General initialization + IF( ln_timing ) CALL timing_init ! timing + IF( ln_timing ) CALL timing_start( 'nemo_init') + ! + CALL phy_cst ! Physical constants + CALL eos_init ! Equation of state + IF( lk_c1d ) CALL c1d_init ! 1D column configuration + CALL wad_init ! Wetting and drying options + CALL dom_init("OPA") ! Domain + IF( ln_crs ) CALL crs_init ! coarsened grid: domain initialization + IF( ln_ctl ) CALL prt_ctl_init ! Print control + + CALL diurnal_sst_bulk_init ! diurnal sst + IF( ln_diurnal ) CALL diurnal_sst_coolskin_init ! cool skin + ! + IF( ln_diurnal_only ) THEN ! diurnal only: a subset of the initialisation routines + CALL istate_init ! ocean initial state (Dynamics and tracers) + CALL sbc_init ! Forcings : surface module + CALL tra_qsr_init ! penetrative solar radiation qsr + IF( ln_diaobs ) THEN ! Observation & model comparison + CALL dia_obs_init ! Initialize observational data + CALL dia_obs( nit000 - 1 ) ! Observation operator for restart + ENDIF + IF( lk_asminc ) CALL asm_inc_init ! Assimilation increments + ! + RETURN ! end of initialization + ENDIF + + CALL istate_init ! ocean initial state (Dynamics and tracers) + + ! ! external forcing + CALL tide_init ! tidal harmonics + CALL sbc_init ! surface boundary conditions (including sea-ice) + CALL bdy_init ! Open boundaries initialisation + + ! ! Ocean physics + CALL zdf_phy_init ! Vertical physics + + ! ! Lateral physics + CALL ldf_tra_init ! Lateral ocean tracer physics + CALL ldf_eiv_init ! eddy induced velocity param. + CALL ldf_dyn_init ! Lateral ocean momentum physics + + ! ! Active tracers + IF( ln_traqsr ) CALL tra_qsr_init ! penetrative solar radiation qsr + CALL tra_bbc_init ! bottom heat flux + CALL tra_bbl_init ! advective (and/or diffusive) bottom boundary layer scheme + CALL tra_dmp_init ! internal tracer damping + CALL tra_adv_init ! horizontal & vertical advection + CALL tra_ldf_init ! lateral mixing + + ! ! 
Dynamics + IF( lk_c1d ) CALL dyn_dmp_init ! internal momentum damping + CALL dyn_adv_init ! advection (vector or flux form) + CALL dyn_vor_init ! vorticity term including Coriolis + CALL dyn_ldf_init ! lateral mixing + CALL dyn_hpg_init ! horizontal gradient of Hydrostatic pressure + CALL dyn_spg_init ! surface pressure gradient + +#if defined key_top + ! ! Passive tracers + CALL trc_init +#endif + IF( l_ldfslp ) CALL ldf_slp_init ! slope of lateral mixing + + ! ! Icebergs + CALL icb_init( rdt, nit000) ! initialise icebergs instance + + ! ! Misc. options + CALL sto_par_init ! Stochastic parametrization + IF( ln_sto_eos ) CALL sto_pts_init ! RRandom T/S fluctuations + + ! ! Diagnostics + IF( lk_floats ) CALL flo_init ! drifting Floats + IF( ln_diacfl ) CALL dia_cfl_init ! Initialise CFL diagnostics + CALL dia_ptr_init ! Poleward TRansports initialization + IF( lk_diadct ) CALL dia_dct_init ! Sections tranports + CALL dia_hsb_init ! heat content, salt content and volume budgets + CALL trd_init ! Mixed-layer/Vorticity/Integral constraints trends + CALL dia_obs_init ! Initialize observational data + CALL dia_tmb_init ! TMB outputs + CALL dia_25h_init ! 25h mean outputs + IF( ln_diaobs ) CALL dia_obs( nit000-1 ) ! Observation operator for restart + + ! ! Assimilation increments + IF( lk_asminc ) CALL asm_inc_init ! Initialize assimilation increments + ! + IF(lwp) WRITE(numout,cform_aaa) ! Flag AAAAAAA + ! + IF( ln_timing ) CALL timing_stop( 'nemo_init') + ! + END SUBROUTINE nemo_init + + + SUBROUTINE nemo_ctl + !!---------------------------------------------------------------------- + !! *** ROUTINE nemo_ctl *** + !! + !! ** Purpose : control print setting + !! + !! ** Method : - print namctl information and check some consistencies + !!---------------------------------------------------------------------- + ! + IF(lwp) THEN ! control print + WRITE(numout,*) + WRITE(numout,*) 'nemo_ctl: Control prints' + WRITE(numout,*) '~~~~~~~~' + WRITE(numout,*) ' Namelist namctl' + WRITE(numout,*) ' run control (for debugging) ln_ctl = ', ln_ctl + WRITE(numout,*) ' finer control over o/p sn_cfctl%l_config = ', sn_cfctl%l_config + WRITE(numout,*) ' sn_cfctl%l_runstat = ', sn_cfctl%l_runstat + WRITE(numout,*) ' sn_cfctl%l_trcstat = ', sn_cfctl%l_trcstat + WRITE(numout,*) ' sn_cfctl%l_oceout = ', sn_cfctl%l_oceout + WRITE(numout,*) ' sn_cfctl%l_layout = ', sn_cfctl%l_layout + WRITE(numout,*) ' sn_cfctl%l_mppout = ', sn_cfctl%l_mppout + WRITE(numout,*) ' sn_cfctl%l_mpptop = ', sn_cfctl%l_mpptop + WRITE(numout,*) ' sn_cfctl%procmin = ', sn_cfctl%procmin + WRITE(numout,*) ' sn_cfctl%procmax = ', sn_cfctl%procmax + WRITE(numout,*) ' sn_cfctl%procincr = ', sn_cfctl%procincr + WRITE(numout,*) ' sn_cfctl%ptimincr = ', sn_cfctl%ptimincr + WRITE(numout,*) ' level of print nn_print = ', nn_print + WRITE(numout,*) ' Start i indice for SUM control nn_ictls = ', nn_ictls + WRITE(numout,*) ' End i indice for SUM control nn_ictle = ', nn_ictle + WRITE(numout,*) ' Start j indice for SUM control nn_jctls = ', nn_jctls + WRITE(numout,*) ' End j indice for SUM control nn_jctle = ', nn_jctle + WRITE(numout,*) ' number of proc. following i nn_isplt = ', nn_isplt + WRITE(numout,*) ' number of proc. following j nn_jsplt = ', nn_jsplt + WRITE(numout,*) ' timing by routine ln_timing = ', ln_timing + WRITE(numout,*) ' CFL diagnostics ln_diacfl = ', ln_diacfl + ENDIF + ! + nprint = nn_print ! 
convert DOCTOR namelist names into OLD names + nictls = nn_ictls + nictle = nn_ictle + njctls = nn_jctls + njctle = nn_jctle + isplt = nn_isplt + jsplt = nn_jsplt + + IF(lwp) THEN ! control print + WRITE(numout,*) + WRITE(numout,*) ' Namelist namcfg' + WRITE(numout,*) ' read domain configuration file ln_read_cfg = ', ln_read_cfg + WRITE(numout,*) ' filename to be read cn_domcfg = ', TRIM(cn_domcfg) + WRITE(numout,*) ' keep closed seas in the domain (if exist) ln_closea = ', ln_closea + WRITE(numout,*) ' create a configuration definition file ln_write_cfg = ', ln_write_cfg + WRITE(numout,*) ' filename to be written cn_domcfg_out = ', TRIM(cn_domcfg_out) + WRITE(numout,*) ' use file attribute if exists as i/p j-start ln_use_jattr = ', ln_use_jattr + ENDIF + IF( .NOT.ln_read_cfg ) ln_closea = .false. ! dealing possible only with a domcfg file + ! + ! ! Parameter control + ! + IF( ln_ctl ) THEN ! sub-domain area indices for the control prints + IF( lk_mpp .AND. jpnij > 1 ) THEN + isplt = jpni ; jsplt = jpnj ; ijsplt = jpni*jpnj ! the domain is forced to the real split domain + ELSE + IF( isplt == 1 .AND. jsplt == 1 ) THEN + CALL ctl_warn( ' - isplt & jsplt are equal to 1', & + & ' - the print control will be done over the whole domain' ) + ENDIF + ijsplt = isplt * jsplt ! total number of processors ijsplt + ENDIF + IF(lwp) WRITE(numout,*)' - The total number of processors over which the' + IF(lwp) WRITE(numout,*)' print control will be done is ijsplt : ', ijsplt + ! + ! ! indices used for the SUM control + IF( nictls+nictle+njctls+njctle == 0 ) THEN ! print control done over the default area + lsp_area = .FALSE. + ELSE ! print control done over a specific area + lsp_area = .TRUE. + IF( nictls < 1 .OR. nictls > jpiglo ) THEN + CALL ctl_warn( ' - nictls must be 1<=nictls>=jpiglo, it is forced to 1' ) + nictls = 1 + ENDIF + IF( nictle < 1 .OR. nictle > jpiglo ) THEN + CALL ctl_warn( ' - nictle must be 1<=nictle>=jpiglo, it is forced to jpiglo' ) + nictle = jpiglo + ENDIF + IF( njctls < 1 .OR. njctls > jpjglo ) THEN + CALL ctl_warn( ' - njctls must be 1<=njctls>=jpjglo, it is forced to 1' ) + njctls = 1 + ENDIF + IF( njctle < 1 .OR. njctle > jpjglo ) THEN + CALL ctl_warn( ' - njctle must be 1<=njctle>=jpjglo, it is forced to jpjglo' ) + njctle = jpjglo + ENDIF + ENDIF + ENDIF + ! + IF( 1._wp /= SIGN(1._wp,-0._wp) ) CALL ctl_stop( 'nemo_ctl: The intrinsec SIGN function follows f2003 standard.', & + & 'Compile with key_nosignedzero enabled:', & + & '--> add -Dkey_nosignedzero to the definition of %CPP in your arch file' ) + ! +#if defined key_agrif + IF( ln_timing ) CALL ctl_stop( 'AGRIF not implemented with ln_timing = true') +#endif + ! + END SUBROUTINE nemo_ctl + + + SUBROUTINE nemo_closefile + !!---------------------------------------------------------------------- + !! *** ROUTINE nemo_closefile *** + !! + !! ** Purpose : Close the files + !!---------------------------------------------------------------------- + ! + IF( lk_mpp ) CALL mppsync + ! + CALL iom_close ! close all input/output files managed by iom_* + ! + IF( numstp /= -1 ) CLOSE( numstp ) ! time-step file + IF( numrun /= -1 ) CLOSE( numrun ) ! run statistics file + IF( numnam_ref /= -1 ) CLOSE( numnam_ref ) ! oce reference namelist + IF( numnam_cfg /= -1 ) CLOSE( numnam_cfg ) ! oce configuration namelist + IF( lwm.AND.numond /= -1 ) CLOSE( numond ) ! oce output namelist + IF( numnam_ice_ref /= -1 ) CLOSE( numnam_ice_ref ) ! ice reference namelist + IF( numnam_ice_cfg /= -1 ) CLOSE( numnam_ice_cfg ) ! 
ice configuration namelist + IF( lwm.AND.numoni /= -1 ) CLOSE( numoni ) ! ice output namelist + IF( numevo_ice /= -1 ) CLOSE( numevo_ice ) ! ice variables (temp. evolution) + IF( numout /= 6 ) CLOSE( numout ) ! standard model output file + IF( numdct_vol /= -1 ) CLOSE( numdct_vol ) ! volume transports + IF( numdct_heat /= -1 ) CLOSE( numdct_heat ) ! heat transports + IF( numdct_salt /= -1 ) CLOSE( numdct_salt ) ! salt transports + ! + numout = 6 ! redefine numout in case it is used after this point... + ! + END SUBROUTINE nemo_closefile + + + SUBROUTINE nemo_alloc + !!---------------------------------------------------------------------- + !! *** ROUTINE nemo_alloc *** + !! + !! ** Purpose : Allocate all the dynamic arrays of the OPA modules + !! + !! ** Method : + !!---------------------------------------------------------------------- + USE diawri , ONLY : dia_wri_alloc + USE dom_oce , ONLY : dom_oce_alloc + USE trc_oce , ONLY : trc_oce_alloc + USE bdy_oce , ONLY : bdy_oce_alloc +#if defined key_diadct + USE diadct , ONLY : diadct_alloc +#endif + ! + INTEGER :: ierr + !!---------------------------------------------------------------------- + ! + ierr = oce_alloc () ! ocean + ierr = ierr + dia_wri_alloc() + ierr = ierr + dom_oce_alloc() ! ocean domain + ierr = ierr + zdf_oce_alloc() ! ocean vertical physics + ierr = ierr + trc_oce_alloc() ! shared TRC / TRA arrays + ierr = ierr + bdy_oce_alloc() ! bdy masks (incl. initialization) + ! +#if defined key_diadct + ierr = ierr + diadct_alloc () ! +#endif + ! + CALL mpp_sum( 'nemogcm', ierr ) + IF( ierr /= 0 ) CALL ctl_stop( 'STOP', 'nemo_alloc: unable to allocate standard ocean arrays' ) + ! + END SUBROUTINE nemo_alloc + + SUBROUTINE nemo_set_cfctl(sn_cfctl, setto, for_all ) + !!---------------------------------------------------------------------- + !! *** ROUTINE nemo_set_cfctl *** + !! + !! ** Purpose : Set elements of the output control structure to setto. + !! for_all should be .false. unless all areas are to be + !! treated identically. + !! + !! ** Method : Note this routine can be used to switch on/off some + !! types of output for selected areas but any output types + !! that involve global communications (e.g. mpp_max, glob_sum) + !! should be protected from selective switching by the + !! for_all argument + !!---------------------------------------------------------------------- + LOGICAL :: setto, for_all + TYPE(sn_ctl) :: sn_cfctl + !!---------------------------------------------------------------------- + IF( for_all ) THEN + sn_cfctl%l_runstat = setto + sn_cfctl%l_trcstat = setto + ENDIF + sn_cfctl%l_oceout = setto + sn_cfctl%l_layout = setto + sn_cfctl%l_mppout = setto + sn_cfctl%l_mpptop = setto + END SUBROUTINE nemo_set_cfctl + + !!====================================================================== +END MODULE nemogcm + diff --git a/quantum_espresso/README.md b/quantum_espresso/README.md index 3a6542ebf8a721e1674315496669102280759da1..8492a9cf458a72b525f79e8ae1b3044c00b90fed 100644 --- a/quantum_espresso/README.md +++ b/quantum_espresso/README.md @@ -9,14 +9,12 @@ Full documentation is available from the project website [QuantumEspresso](https In this README we give information relevant for its use in the UEABS. ### Standard CPU version -For the UEABS activity we have used mainly version v6.0 but later versions are now available. +For the UEABS activity we have used mainly version v6.5 but later versions are now available. 
### GPU version
The GPU port of Quantum Espresso is a version of the program which has been
-completely re-written in CUDA FORTRAN by Filippo Spiga. The version program used in these
-experiments is v6.0, even though further versions becamse available later during the
-activity.
-
+completely re-written in CUDA FORTRAN. The program version used in these
+experiments is v6.5a1, even though later versions may be available.
## Installation and requirements
### Standard
@@ -24,7 +22,7 @@ The Quantum Espresso source can be downloaded from the projects GitHub repositor
### GPU version
For complete build requirements and information see the following GitHub site:
-[QE-GPU](https://github.com/fspiga/qe-gpu)
+[QE-GPU](https://gitlab.com/QEF/q-e-gpu/-/releases)
A short summary is given below:
Essential
@@ -45,14 +43,14 @@ Optional
### Standard
From the website, for example:
```bash
-wget https://github.com/QEF/q-e/releases/download/qe-6.3/qe-6.3.tar.gz
+wget https://github.com/QEF/q-e/releases/download/qe-6.5/qe-6.5.tar.gz
```
### GPU
-Available from the web site given above. You can use, for example, ```git clone```
+Available from the web site given above. You can use, for example, ```wget```
to download the software:
```bash
-git clone https://github.com/fspiga/qe-gpu.git
+wget https://gitlab.com/QEF/q-e-gpu/-/archive/qe-gpu-6.5a1/q-e-gpu-qe-gpu-6.5a1.tar.gz
```
## Compiling and installing the application
@@ -71,26 +69,36 @@ make; make install
```
### GPU
-Check the __README.md__ file in the downloaded files since the
-procedure varies from distribution to distribution.
-Most distributions do not have a ```configure``` command. Instead you copy a __make.inc__
-file from the __install__ directory, and modify that directly before running make.
-A number of templates are available in the distribution:
-- make.inc_x86-64
-- make.inc_CRAY_PizDaint
-- make.inc_POWER_DAVIDE
-- make.inc_POWER_SUMMITDEV
-
-The second and third are particularly relevant in the PRACE infrastructure (ie. for CSCS
-PizDaint and CINECA DAVIDE).
-Run __make__ to see the options available. For the UEABS you should select the
-pw program (the only module currently available)
+The GPU version is configured similarly to the CPU version, the only exception being that the configure script
+will check for the presence of PGI and CUDA libraries.
+A typical configure might be
+
+```bash
+./configure --with-cuda=XX --with-cuda-runtime=YY --with-cuda-cc=ZZ --enable-openmp [ --with-scalapack=no ]
+```
+where `XX` is the location of the CUDA Toolkit (in HPC environments this is
+generally `$CUDA_HOME`), `YY` is the version of the CUDA toolkit and `ZZ`
+is the compute capability of the card.
+For example,
+
+```bash
+./configure --with-cuda=$CUDA_HOME --with-cuda-cc=60 --with-cuda-runtime=9.2
+```
+The __dev-tools/get_device_props.py__ script is available if you don't know these values.
+
+Compilation is then performed as normal by
```
make pw
```
-
-The QE-GPU executable will appear in the directory `GPU/PW` and is called `pw-gpu.x`.
+#### Example compilation of Quantum Espresso for GPU-based machines
+
+```bash
+module load pgi cuda
+./configure --with-cuda=$CUDA_HOME --with-cuda-cc=70 --with-cuda-runtime=10.2
+make -j8 pw
+```
+
## Running the program - general procedure
@@ -103,7 +111,7 @@ input files are of two types:
The pseudopotential files are placed in a directory specified in the control file with the tag pseudo\_dir.
Thus if we have
-```shell
+```bash
pseudo_dir=./
```
then QE-GPU will look for the pseudopotential
@@ -111,13 +119,13 @@ files in the current directory.
If using the PRACE benchmark suite the data files can be downloaded from the PRACE repository. For example,
-```shell
+```bash
wget https://repository.prace-ri.eu/ueabs/Quantum_Espresso/QuantumEspresso_TestCaseA.tar.gz
```
Once uncompressed you can then run the program like this (e.g. using MPI over 16 cores):
-```shell
+```bash
mpirun -n 16 pw-gpu.x -input pw.in
```
@@ -126,6 +134,22 @@ but check your system documentation since mpirun may be replaced by
allowed to run MPI programs interactively without using the batch system.
+### Running on GPUs
+The procedure is identical to running on non-accelerator-based hardware.
+If GPUs are being used then the following will appear in the program output:
+
+```
+ GPU acceleration is ACTIVE.
+```
+
+GPU acceleration can be switched off by setting the following environment variable:
+
+```bash
+$ export USEGPU=no
+```
+
+
+
### Parallelisation options
Quantum Espresso uses various levels of parallelisation, the most important being MPI parallelisation over the *k points* available in the input system.
This is achieved with the ```-npool``` program option.
@@ -154,82 +178,68 @@ srun -n 64 pw.x -npool 2 -ndiag 4 -input pw.in
### Hints for running the GPU version
-
-#### Memory limitations
The GPU port of Quantum Espresso runs almost entirely in the GPU memory.
This means that jobs are restricted by the memory of the GPU device, normally 16-32 GB, regardless of the main node memory.
Thus, unless many nodes are used the user is likely to see job failures due to lack of memory, even for small datasets.
For example, on the CSCS Piz Daint supercomputer each node has only 1 NVIDIA Tesla P100 (16GB) which means that you will need at least 4 nodes to run even the smallest dataset (AUSURF in the UEABS).
-
## Execution
In the UEABS repository you will find a directory for each computer system tested, together with installation instructions and job scripts.
In the following we describe in detail the execution procedure for the Galileo computer system.
-### Execution on the Cineca Marconi KNL system
+### Execution on the Cineca Galileo (x86) system
-Quantum Espresso has already been installed for the KNL nodes of
-Marconi and can be accessed via a specific module:
+Quantum Espresso has already been installed on the cluster
+and can be accessed via a specific module:
-``` shell
-module load profile/knl
-module load autoload qe/6.0_knl
+``` bash
+module load profile/phys
+module load autoload qe/6.5
```
-On Marconi the default is to use the MCDRAM as cache, and have the
-cache mode set as quadrant. Other settings for the KNLs on Marconi
-haven't been substantailly tested for Quantum Espresso (e.g. flat
-mode) but significant differences in performance for most inputs are
-not expected.
-
-An example SLURM batch script for the A2 partition is given below:
+An example SLURM batch script is given below:
-``` shell
+``` bash
#!/bin/bash
-#SBATCH -N2
-#SBATCH --tasks-per-node=64
-#SBATCH -A 
-#SBATCH -t 1:00:00
-
+#SBATCH --time=06:00:00 # Walltime in hh:mm:ss
+#SBATCH --nodes=4 # Number of nodes
+#SBATCH --ntasks-per-node=18 # Number of MPI ranks per node
+#SBATCH --cpus-per-task=2 # Number of OpenMP threads for each MPI process/rank
+#SBATCH --mem=118000 # Per-node memory request (MB)
+#SBATCH --account= 
+#SBATCH --job-name=jobname
+#SBATCH --partition=gll_usr_prod
module purge
-module load profile/knl
-module load autoload qe/6.0_knl
+module load profile/phys
+module load autoload qe/6.5
-export OMP_NUM_THREADS=1
+export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MKL_NUM_THREADS=${OMP_NUM_THREADS}
-srun pw.x -npool 2 -ndiag 16 -input file.in > file.out
-
+srun pw.x -npool 4 -input file.in > file.out
```
-In the above with the SLURM directives we have asked for 2 KNL nodes (each with 68 cores) in
-cache/quadrant mode and 93 Gb main memory each. We are running QE in MPI-only
-mode using 64 MPI processes/node with the k-points in 2 pools; the diagonalisation of the Hamiltonian
-will be done by 16 (4x4) tasks.
+In the above SLURM directives we have asked for 4 nodes, with 18 MPI tasks per node and 2 OpenMP threads
+per task.
-Note that this script needs to be submitted using the KNL scheduler as follows:
+Note that this script needs to be submitted using the SLURM scheduler as follows:
-``` shell
-module load env-knl
+``` bash
sbatch myjob
```
-Please check the Cineca documentation for information on using the
-[Marconi KNL partition]
-(https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.1%3A+MARCONI+UserGuide#UG3.1:MARCONIUserGuide-SystemArchitecture).
-
## UEABS test cases
| UEABS name | QE name | Description | k-points | Notes|
|------------|---------------|-------------|----------|------|
| Small test case | AUSURF | 112 atoms | 2 | < 4-8 nodes on most systems |
-| Large test case | TA2O5 | Tantalum oxide| 26| Medium scaling, often 20 nodes |
-| Very Large test case | CNT | Carbon nanotube | | Large scaling runs only. Memory and time very requirements high|
+| Large test case | GRIR443 | 432 atoms | 4 | Medium scaling, often 20 nodes |
+| Very Large test case | CNT | Carbon nanotube | | Large scaling runs only. Memory and time requirements very high|
-__Last updated: 29-April-2019__
+__Last updated: 22-October-2020__
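+
+### Extracting the benchmark time
+
+For the UEABS runs the figure of merit is normally the total wall time of the run.
+Assuming the standard `pw.x` output format, where the final timing line begins with
+`PWSCF`, a minimal way to extract it from the output file used in the batch script
+above is, for example:
+
+```bash
+grep PWSCF file.out | tail -1
+```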