.. _d77:

Deliverable 7.7: Performance and energy metrics on PCP systems
==============================================================

Executive Summary
*****************

This document describes the work carried out to exploit the PRACE Pre-Commercial Procurement (PCP) machines. It gives an overview of what can be done in terms of performance and energy analysis on these prototypes. The focus is on a general study using the PRACE Unified European Application Benchmark Suite (UEABS) and on a more detailed case study porting a solver stack using cutting-edge tools.

This work has been undertaken by the 4IP-extension task "Performance and energy metrics on PCP systems", which is a follow-up of Task 7.2B "Accelerators benchmarks" in the PRACE Fourth Implementation Phase (4IP). It also prepares the ground for Task 7.3 in 5IP, which is meant to merge the PRACE accelerated and standard benchmark suites, as codes of the latter have been run on accelerators in this task.

As a result, ALYA, Code_Saturne, CP2K, GPAW, GROMACS, NAMD, PFARM, QCD, Quantum Espresso, SHOC and Specfem3D_Globe (already ported to accelerators) and GADGET and NEMO (newly ported) have been selected to run on Intel KNL and NVIDIA GPU to give an overview of performance and energy measurements.

Also, the HORSE+MaPHyS+PaStiX solver stack has been selected to be ported to Intel KNL. The focus here is on the energy profiling of these codes and on studying the influence of several parameters driving the accuracy and numerical efficiency of the underlying simulations.

Introduction
************

The work produced within this task is driven by the delivery of the PRACE PCP machines. It aims at giving manufacturer-independent performance and energy metrics for future exascale systems. It is also an opportunity to explore and test the cutting-edge energy hardware stack and tools developed within the scope of PCP.
As stated in Milestone 33, this document presents metrics for selected codes from the UEABS, showing results for many of the fields covered by European scientific communities. It also goes deeper into the porting and energy profiling activities, using the HORSE+MaPHyS+PaStiX solver stack as an example.

Section :ref:`d77_cluster_specs` details the hardware and software specifications of the systems on which the metrics were collected. Section :ref:`d77_ueabs_metrics` brings together the metrics for the UEABS. The work on porting and energy profiling is presented in section :ref:`d77_port_profile`. Section :ref:`d77_conclusion` concludes and outlines further work on the PCP prototypes.

.. _d77_cluster_specs:

Clusters specifications and access
**********************************

The PRACE PCP project includes three different prototypes, based on Xeon Phi, GPU and FPGA respectively. The first two kinds of machine are becoming more and more common in HPC infrastructures, so the innovation lies in their energy stack. By contrast, the last architecture is brand new in this field, which makes it harder to get familiar with. As explained in section :ref:`d77_machine_access`, tight deadlines did not leave enough time to produce relevant metrics on the FPGA cluster. This is why only the GPU and KNL prototypes are presented here.

.. _d77_machine_access:

Access to machines
^^^^^^^^^^^^^^^^^^

Working with prototypes can be painful in terms of project management and meeting deadlines. This section gives feedback on accessing the hardware and software stack.

The timeline_ outlines the initially tight deadlines of this project. It also shows that access to the machines for running codes was only possible quite late in the phase.

.. _timeline:

.. figure:: /deliverable_d7.7/timeline.png

   4IP-extension project timeline. Period names are printed at the top of the figure and key dates at the bottom. Periods in grey stand for task preparation, periods in blue for writing documentation and periods in green for technical work.

The table :ref:`table-pcp-systems-access` shows the precise timeline. On top of these delays, some technical interruptions occurred right at the end of the running phase, which did not help with the writing of this document:

**PCP-KNL:**

- closed from 22nd November to 4th December
- login node down from 5th to 7th December
- energy metrics tools down from 5th to 12th December

**DAVIDE-GPU:**

- Slurm not working from 6th to 11th December
- energy metrics tools *randomly* not working at the beginning of December

.. _table-pcp-systems-access:

.. table:: PCP Systems access dates
   :widths: auto

   +------------------------+------------------+-----------------+------------------+
   |                        | KNL              | GPU             | FPGA             |
   +========================+==================+=================+==================+
   | Envisioned             | June 2017        | July 2017       | August 2017      |
   +------------------------+------------------+-----------------+------------------+
   | Actual access          | 1 September 2017 | 16 October 2017 | 2 November 2017  |
   +------------------------+------------------+-----------------+------------------+
   | Access to energy stack | 6 October 2017   | 8 November 2017 | /                |
   +------------------------+------------------+-----------------+------------------+

.. include:: /pcp_systems/e4_gpu.rst

.. include:: /pcp_systems/atos_knl.rst

.. _d77_ueabs_metrics:

Performances and energy metrics of UEABS on PCP systems
*******************************************************

This section presents the results of the UEABS on both the GPU and KNL systems. This benchmark suite is made of two sets of codes that overlap. The former is meant to run on standard CPUs and the latter has been ported to accelerators. The accelerated suite is described in PRACE 4IP Deliverable 7.5, and the standard suite is described on the official PRACE UEABS webpage.
The metrics reported systematically are time to solution and energy to solution. This choice makes it possible to measure the exact same computation for both. Indeed, some codes feature specific performance metrics, e.g. not considering warm-up and teardown phases. Such metrics are thus not biased by these phases, and small benchmark test cases can then give more information about hypothetical production runs. Unfortunately, no such system is available yet for energy, so these code-specific metrics are shown as *side metrics*.

In order to be comparable between machines, the :code:`Cumulative (all nodes) Total energy (J)` value has been selected for the GPU machine and the :code:`nodes.energy` value for the KNL prototype. Both measure the consumption of full nodes in Joules.

Each code is presented along with a short description and its full set of metrics. The section ends with a recap chart showing, for each code, one line of metrics picked for its relevance.

ALYA
^^^^

Code_Saturne
^^^^^^^^^^^^

CP2K
^^^^

GADGET
^^^^^^

GENE
^^^^

GPAW
^^^^

GROMACS
^^^^^^^

NAMD
^^^^

NEMO
^^^^

PFARM
^^^^^

QCD
^^^

Quantum Espresso
^^^^^^^^^^^^^^^^

SHOC
^^^^

Specfem3D_Globe
^^^^^^^^^^^^^^^

Wrap-up table
^^^^^^^^^^^^^

Here is the envisioned run table from Milestone 33:

.. _table-code-definition:

.. table:: Code definition
   :widths: auto

   +-------------------+------+-----------------------------+------------------+-------------------------------+
   |                   | Test | Power8 + GPU                | Xeon Phi         |                               |
   | Code name         | case +-----+-----------+-----------+---------+--------+ 4IP-extension BCO             |
   |                   | #    | N # |           |           |         |        |                               |
   +===================+======+=====+===========+===========+=========+========+===============================+
   |                   | 1    |     | ✓         |           | ✓       |        | Ricard Borrell (BSC)          |
   + ALYA              +------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 2    |     | ✓         |           | ✓       |        | Ricard Borrell (BSC)          |
   +-------------------+------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 1    |     | ✓         |           | ✓       |        | Charles Moulinec (STFC)       |
   + Code_Saturne      +------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 2    |     | ✓         |           | ✓       |        | Charles Moulinec (STFC)       |
   +-------------------+------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 1    |     | ✓         |           | ✓       |        | Arno Proeme (EPCC)            |
   + CP2K              +------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 2    |     | ✓         |           | ✓       |        | Arno Proeme (EPCC)            |
   +-------------------+------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 1    |     | ✗         |           | ✓       |        | Volker Weinberg (LRZ)         |
   + GADGET            +------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 2    |     | ✗         |           | ✓       |        | Volker Weinberg (LRZ)         |
   +-------------------+------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 1    |     | ✗         |           | ✓       |        | Martti Louhivuori (CSC)       |
   + GPAW              +------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 2    |     | ✗         |           | ✓       |        | Martti Louhivuori (CSC)       |
   +-------------------+------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 1    |     | ✓         |           | ✓       |        | Dimitris Dellis (GRNET)       |
   + GROMACS           +------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 2    |     | ✓         |           | ✓       |        | Dimitris Dellis (GRNET)       |
   +-------------------+------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 1    |     | ✓         |           | ✓       |        | Dimitris Dellis (GRNET)       |
   + NAMD              +------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 2    |     | ✓         |           | ✓       |        | Dimitris Dellis (GRNET)       |
   +-------------------+------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 1    |     | ✗         |           | ✓       |        | Arno Proeme (EPCC)            |
   + NEMO              +------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 2    |     | ✗         |           | ✓       |        | Arno Proeme (EPCC)            |
   +-------------------+------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 1    |     | ✓         |           | ✓       |        | Mariusz Uchronski (WCNS/PSNC) |
   + PFARM             +------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 2    |     | ✓         |           | ✓       |        | Mariusz Uchronski (WCNS/PSNC) |
   +-------------------+------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 1    |     | ✓         |           | ✓       |        | Jacob Finkenrath (CyI)        |
   + QCD               +------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 2    |     | ✓         |           | ✓       |        | Jacob Finkenrath (CyI)        |
   +-------------------+------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 1    |     | ✓         |           | ✓       |        | Andrew Emerson (CINECA)       |
   + Quantum Espresso  +------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 2    |     | ✓         |           | ✓       |        | Andrew Emerson (CINECA)       |
   +-------------------+------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 1    |     | ✓         |           | ✗       |        | Valeriu Codreanu (SurfSARA)   |
   + SHOC              +------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 2    |     | ✓         |           | ✗       |        | Valeriu Codreanu (SurfSARA)   |
   +-------------------+------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 1    |     | ✓         |           | ✓       |        | Victor Cameo Ponz (CINES)     |
   + Specfem3D_Globe   +------+-----+-----------+-----------+---------+--------+-------------------------------+
   |                   | 2    |     | ✓         |           | ✓       |        | Victor Cameo Ponz (CINES)     |
   +-------------------+------+-----+-----------+-----------+---------+--------+-------------------------------+

.. _d77_port_profile:

Energetic Analysis of a Solver Stack for Frequency-Domain Electromagnetics
**************************************************************************

Numerical approach
^^^^^^^^^^^^^^^^^^

Simulation software
^^^^^^^^^^^^^^^^^^^

MaPHyS algebraic solver
^^^^^^^^^^^^^^^^^^^^^^^

Numerical and performance results
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

MaPHyS used in standalone mode
""""""""""""""""""""""""""""""

Scattering of a plane wave by a PEC sphere
""""""""""""""""""""""""""""""""""""""""""

.. _d77_conclusion:

Conclusion
**********