Commit 0b13d892 authored by Victor
IMPROVE typos

parent 4ed36c0c
......@@ -6,22 +6,22 @@ Deliverable 7.7: Performance and energy metrics on PCP systems
Executive Summary
*****************
This document describes the efforts made to exploit the PRACE Pre-Commercial Procurement (PCP) machines. It aims to give an overview of what can be done in terms of performance and energy analysis on these prototypes. The focus has been on a general study using the PRACE Unified European Application Benchmark Suite (UEABS) and on a more detailed case study porting a solver stack using cutting-edge tools.
This work has been undertaken by the 4IP-extension task "Performance and energy metrics on PCP systems" which is a follow-up of the Task 7.2B "Accelerators benchmarks" in the PRACE Fourth Implementation Phase (4IP).
It also heads in the direction of Task 7.3 in 5IP, which aims to merge the PRACE accelerated and standard benchmark suites, as codes of the latter have been run on accelerators in this task.
As a result, ALYA, Code_Saturne, CP2K, GPAW, GROMACS, NAMD, PFARM, QCD, Quantum Espresso, SHOC and Specfem3D_Globe (already ported to accelerators) as well as GADGET and NEMO (newly ported) have been selected to run on Intel KNL and NVIDIA GPU to give an overview of performance and energy measurements.
Also, the HORSE+MaPHyS+PaStiX solver stack has been selected to be ported to Intel KNL. The focus here has been on energy profiling of these codes and on studying the influence of several parameters driving the accuracy and numerical efficiency of the underlying simulations.
Introduction
************
The work produced within this task is driven by the delivery of the PRACE PCP machines. It aims to give manufacturer-independent performance and energy metrics for future Exascale systems. It is also an opportunity to explore and test the cutting-edge energy hardware stack and tools developed within the scope of PCP.
As stated in Milestone 33, this document presents metrics for selected codes among the UEABS, which makes it possible to show results covering many fields of interest to European scientific communities. It also goes deeper into the porting and energy profiling activities, using the HORSE+MaPHyS+PaStiX solver stack as an example.
Section :ref:`d77_cluster_specs` details the hardware and software specifications of the systems on which the metrics have been collected. Section :ref:`d77_ueabs_metrics` brings together the metrics for UEABS. The work on porting and energy profiling is presented in section :ref:`d77_port_profile`. Section :ref:`d77_conclusion` concludes and outlines further work on PCP prototypes.
......@@ -30,16 +30,16 @@ Section :ref:`d77_cluster_specs` will details hardware and software specificatio
Clusters specifications and access
**********************************
The PRACE PCP project includes three different prototypes using Xeon Phi, GPU and FPGA respectively. The first two kinds of machine are becoming more and more common in HPC infrastructures, so the innovation lies in the energy stack. In contrast, the last architecture is brand new in this field, making it harder to get familiar with.
As explained in section :ref:`d77_machine_access`, tight deadlines did not leave enough time to produce relevant metrics on the FPGA cluster. This is why only the GPU and KNL prototypes are presented here.
.. _d77_machine_access:
Access to machines
^^^^^^^^^^^^^^^^^^
Working with prototypes can be painful in terms of project management and meeting deadlines. This section gives feedback on accessing the hardware and software stack.
The timeline_ outlines the initial tight deadlines for this project. It also shows that access to the machines became possible quite late in the phase dedicated to running codes.
......@@ -49,18 +49,18 @@ The timeline_ outlines the initial tight deadlines for this project. Also showin
4IP-extension project timeline. Period names are printed at the top of the figure and key dates at the bottom. Periods in grey stand for task preparation, periods in blue for writing documentation and periods in green for technical work.
The table :ref:`table-pcp-systems-access` shows the precise timeline. In addition to these delays, some technical interruptions occurred right at the end of the running phase, which did not help with the writing of this document:
**PCP-KNL:**
- closed from the 22nd of November to the 4th of December
- login node down from the 5th to the 7th of December
- energy metrics tools down from the 5th to the 12th of December
**DAVIDE-GPU**
- Slurm not working from the 6th to the 11th of December
- energy metrics tools *randomly* not working during the beginning of December
.. _table-pcp-systems-access:
.. table:: PCP Systems access dates
......@@ -87,13 +87,13 @@ The table :ref:`table-pcp-systems-access` shows the precise timeline. To this de
Performances and energy metrics of UEABS on PCP systems
*******************************************************
This section presents the results of UEABS on both GPU and KNL systems. This benchmark suite is made of two sets of codes that overlap. The former is meant to be run on standard CPUs and the latter has been ported to accelerators. The accelerated suite is described in the PRACE 4IP Deliverable 7.5, and the standard suite is described on the official PRACE UEABS webpage.
The metrics systematically reported are time to solution and energy to solution. This choice makes it possible to measure the exact same computation for both. Indeed, some codes feature specific performance metrics, e.g. not considering warm-up and teardown phases. These metrics are thus not biased, and small benchmark test cases can then give more information about hypothetical production runs. Unfortunately, no such system is available yet for energy, so these metrics will be shown as *side metrics*.
To make results comparable between machines, the :code:`Cumulative (all nodes) Total energy (J)` value has been selected for the GPU machine and the :code:`nodes.energy` value for the KNL prototype. Both measure full-node consumption in joules.
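
Since both counters report energy in joules, dividing energy to solution by time to solution gives the average power drawn by the nodes during a run. The following is only a minimal sketch of that arithmetic: the helper name ``avg_power_watts`` and the sample values are hypothetical and are not taken from the measurements in this document.

.. code-block:: shell

   #!/bin/sh
   # Hypothetical helper: average power (W) = energy to solution (J) / time to solution (s)
   avg_power_watts() {
       # $1: energy in joules, $2: elapsed time in seconds
       awk -v e="$1" -v t="$2" 'BEGIN {
           if (t <= 0) { print "time must be positive" > "/dev/stderr"; exit 1 }
           printf "%.1f\n", e / t
       }'
   }

   # Illustrative values only: 1.2 MJ consumed over 600 s, prints 2000.0
   avg_power_watts 1200000 600

A run that is faster but draws more power can therefore still show a higher energy to solution, which is why both metrics are reported side by side.
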
Each code is presented along with a short description and the full set of metrics. The section ends with a recap chart with a line of metrics picked for its relevance.
ALYA
......
Xeon Phi
^^^^^^^^
This machine has been designed by `Atos/Bull`_ and is hosted at CINES_ in Montpellier, France. It is made of 76 Bull Sequana X1210 blades, each including 3 Xeon Phi KNL nodes. It totals a theoretical peak performance of 465 Tflop/s with an estimated consumption of 42 kW.
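
The figures quoted above can be combined into a node count and a theoretical energy efficiency. The snippet below is a small sketch of that arithmetic only; the helper name ``gflops_per_watt`` is hypothetical, and the result is a theoretical ratio of peak performance to estimated consumption, not a measured value.

.. code-block:: shell

   #!/bin/sh
   # Figures quoted above for the KNL prototype
   blades=76
   nodes_per_blade=3
   peak_tflops=465      # theoretical peak, Tflop/s
   power_kw=42          # estimated consumption, kW

   # Total number of KNL nodes
   nodes=$((blades * nodes_per_blade))
   echo "nodes: $nodes"

   # Hypothetical helper: (Tflop/s * 1000) / (kW * 1000) = Gflop/s per watt
   gflops_per_watt() {
       awk -v p="$1" -v w="$2" 'BEGIN { printf "%.2f\n", (p * 1000) / (w * 1000) }'
   }

   echo "peak efficiency: $(gflops_per_watt "$peak_tflops" "$power_kw") Gflop/s/W"
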
.. note::
......@@ -23,7 +23,7 @@ Hardware features the following nodes:
* 1x Intel Xeon Phi 7250 processor (KNL), 68 cores clocked at 1.4 GHz with SMT 4.
* 96GB memory, 16GBx6 DDR4 DIMMs
* internode communications integrated using InfiniBand EDR
* 100% hot-water-cooled nodes
* Half of the configuration features liquid-cooled Power Supply Units (PSUs), making this part of the machine 100% liquid cooled.
* MooseFS I/O
......@@ -54,15 +54,15 @@ Here's an example of usage in a submission script:
.. code-block:: shell

   #SBATCH -N 2
   #SBATCH --time=00:30:00
   #SBATCH -J Specfem3D_Globe
   #SBATCH -n 89

   module load intel/17.2 intelmpi/2018.0.061
   module load hdeeviz/hdeeviz_intelmpi_2018.0.061

   # Launch the MPI run through the hdeeviz wrapper
   hdeeviz mpirun -n 89 $PWD/bin/xspecfem3D

The generated data can be accessed through the Grafana web interface:
......
Power8 + GPU
^^^^^^^^^^^^
D.A.V.I.D.E has been designed by `E4 computer engineering`_ and is hosted at CINECA_ in Bologna, Italy. It totals a theoretical peak performance of 990 TFlop/s (double precision). A more detailed description can be found on the `E4 dedicated webpage`_.
.. note:: In order to access the machine BCO should send an email to `Victor Cameo Ponz`_ so that.
......@@ -13,8 +13,8 @@ Hardware features fat-nodes with the following design:
* 2x IBM POWER8+ processors, i.e. 8x2 cores with Simultaneous Multi-Threading (SMT) 8
* x4 NVIDIA P100 GPU with 16GB High Bandwidth Memory 2 (HBM2)
* intranode communications integrated using NVLink
* extranode communications integrated using an InfiniBand EDR interconnect in a fat-tree topology with no oversubscription
* CPU and GPU direct hot water (~27°C) cooling, removing 75-80% of the total heat
* remaining 20-25% heat is air-cooled
......@@ -23,7 +23,7 @@ Each compute node has a theoritical peak performance of 22 Tflop/s (double preci
Energy sampling technology
""""""""""""""""""""""""""
Information is collected from processors, memory, GPUs and fans using the Analog-to-Digital Converter of the embedded SoC. It provides sampling at up to 800 kHz, lowered to 50 kHz on the power measuring sensor outputs.
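
At 50 kHz, one power sample is available every 20 µs, and the energy over a time window can be approximated by summing the samples and dividing by the sampling rate. The sketch below illustrates this rectangle-rule integration. It assumes a hypothetical input format of one power reading in watts per line, which is not the actual output format of the sensors or of the tools described here, and the helper name ``energy_joules`` is illustrative.

.. code-block:: shell

   #!/bin/sh
   # Sketch: integrate power samples (W) into energy (J) with a rectangle rule.
   # Assumes one power reading per line on standard input (hypothetical format).
   energy_joules() {
       # $1: sampling rate in Hz (50000 for the 50 kHz sensor outputs)
       awk -v rate="$1" '
           { sum += $1 }
           END { printf "%.3f\n", sum / rate }
       '
   }

   # Illustrative input: 50000 samples at a constant 300 W, i.e. one second of data.
   # Prints 300.000 (300 W during 1 s = 300 J).
   awk 'BEGIN { for (i = 0; i < 50000; i++) print 300 }' | energy_joules 50000
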
The technology has been developed in collaboration with the University of Bologna, which developed the :code:`get_job_energy <job_id>` program. Usage is straightforward and produces the following verbose output:
......