# README - Wireworld Example (C Version)

## Description

For a general description of *Wireworld* and the file format, see the `` file in the parent directory.

This code sample demonstrates:

 * How to use **MPI Cartesian topologies** and associated convenience functions (i.e. `MPI_Cart_create`, `MPI_Dims_create`, `MPI_Cart_get` and `MPI_Cart_rank`).
 * How to use **MPI parallel I/O** for collectively reading and writing 2-dimensional array data (i.e. `MPI_File_open`, `MPI_File_set_errhandler`, `MPI_File_set_view`, `MPI_File_read_all` and `MPI_File_write_all`).
 * How to implement **halo exchange** (i.e. the exchange of *ghost cell data*) using three different approaches (each with and without overlap of communication and computation):
   1. Using **MPI Graph communicator** and **Neighborhood collective operations** (i.e. `MPI_Dist_graph_create_adjacent` and `MPI_[N/In]eighbor_alltoallw`).
   1. Using **MPI Point-to-Point** communication (i.e. `MPI_Isend`, `MPI_Irecv` and `MPI_Waitall`).
   1. Using **MPI persistent communication requests** (i.e. `MPI_Send_init`, `MPI_Recv_init`, `MPI_Startall` and `MPI_Waitall`).
 * How to use the **MPI subarray datatype** (i.e. `MPI_Type_create_subarray`) for I/O and halo exchange.

The code sample is structured as follows:

 * `configuration.c`, `configuration.h`: Command line parsing and basic logging facilities.
 * `io.c`, `io.h`: Collective I/O of the cellular automaton state.
 * `main.c`: The main program.
 * `mpitypes.c`, `mpitypes.h`: Creation of MPI datatypes.
 * `simulation.c`, `simulation.h`: Demonstration of six approaches (three halo-exchange methods, each with and without overlap of communication and computation) for implementing a Wireworld cellular automaton.
 * `world.c`, `world.h`: Initialization of the cellular automaton and associated MPI objects.
   * `create_cart_comm`: Creation of the MPI Cartesian communicator.
   * `world_init`: Domain decomposition and buffer allocation.
   * `world_init_io_type`: Creation of the MPI subarray datatype for I/O.
   * `world_init_neighborhood`: Identification of neighboring processes, creation of MPI graph communicator and MPI Datatypes for halo exchange.
   * `world_init_persistent_requests`: Initialization of persistent requests for halo data exchange.

## Release Date

2016-10-24

## Version History

 * 2016-10-24: Initial Release on PRACE CodeVault repository

## Contributors

 * Thomas Ponweiser

## Copyright

This code is available under Apache License, Version 2.0 - see also the license file in the CodeVault root directory.

## Languages

This sample is entirely written in C.

## Parallelisation

This sample uses MPI-3 for parallelization.

## Level of the code sample complexity

Intermediate / Advanced

## Compiling

Follow the compilation instructions given in the main directory of the kernel samples (`/hpc_kernel_samples/`).

## Running

Assuming that the input file `primes.wi` is in your current working directory, you can run the program, either on the command line or in your batch script, with a command similar to the following:

    mpirun -n [nprocs] ./5_structured_wireworld_c primes

Note that only the input file's basename (without the `.wi` extension) is passed to the program.

### Command line arguments

 * `-v [0-3]`: Specify the verbosity level - 0: OFF; 1: INFO (Default); 2: DEBUG; 3: TRACE
 * `--sparse`: Use MPI neighborhood collective operations, a.k.a. *sparse collective operations*, i.e. `MPI_[N/In]eighbor_alltoallw`, for halo data exchange (Default).
 * `--p2p`: Use MPI Point-to-Point communication for halo data exchange, i.e. `MPI_Isend`, `MPI_Irecv` and `MPI_Waitall`.
 * `--persist`: Use MPI persistent requests (created with `MPI_Send_init` and `MPI_Recv_init`) for halo data exchange (using `MPI_Startall` and `MPI_Waitall`).
 * `--overlap`: Overlap communication and computation, i.e. compute inner cells while doing halo data exchange (Default).
 * `--no-overlap`: Do not overlap communication and computation.
 * `-x [X]`, `--nprocs-x [X]`: Use `X` processes in x-direction for MPI Cartesian communicator (optional).
 * `-y [Y]`, `--nprocs-y [Y]`: Use `Y` processes in y-direction for MPI Cartesian communicator (optional).
 * `-i [N]`, `--iterations [N]`: Do `N` iterations, creating `N` output files with the current state of the cellular automaton; Default: 1.
 * `-g [G]`, `--generations-per-iteration [G]`: Number of generations to simulate between output iterations; Default: 5000.

For large arguments to the option `-g`, the suffixes 'k' and 'M' may be used. For example, `-g 50k` specifies 50 thousand and `-g 1M` specifies one million generations per output iteration.

### Example

If you run

    mpirun -n 12 ./5_structured_wireworld_c -i 10 -g 50k -v 2 --nprocs-x 3 primes

the output should look similar to

     * Verbosity level:           DEBUG (2)
     * Input file:                primes.wi
     * Transmission mode:         Sparse collective - MPI_Dist_graph_create_adjacent / MPI_[N/In]eighbor_alltoallw
     * Overlap mode:              Overlapping communication and computation
     * Grid of processes:         3 x 4
     * Number of iterations:      10
     * Generations per iteration: 50000

    Reading 'primes.wi'...

    Read header (8 characters).
    Global size: 632 x 958

    Creating Cartesian communicator...
    INFO: MPI reordered ranks: NO
    Creating MPI distributed graph communicator...
    Running 10 iterations with 50000 generations per iteration.

    Generation 50000     - written 'primes+000050000.wi'.
    Generation 100000    - written 'primes+000100000.wi'.
    Generation 150000    - written 'primes+000150000.wi'.
    Generation 200000    - written 'primes+000200000.wi'.
    Generation 250000    - written 'primes+000250000.wi'.
    Generation 300000    - written 'primes+000300000.wi'.
    Generation 350000    - written 'primes+000350000.wi'.
    Generation 400000    - written 'primes+000400000.wi'.
    Generation 450000    - written 'primes+000450000.wi'.
    Generation 500000    - written 'primes+000500000.wi'.

     * Generations per second:  2750
     * Net simulation time (s): 181.787989
     * Net I/O time (s):        0.034390
     * Total time (s):          181.822473