# README - Wireworld Example (C Version)
## Description
For a general description of *Wireworld* and the file format, see the `README.md` file in the parent directory.
This code sample demonstrates:
* How to use **MPI Cartesian topologies** and associated convenience functions (i.e. `MPI_Cart_create`, `MPI_Dims_create`, `MPI_Cart_get` and `MPI_Cart_rank`).
* How to use **MPI parallel I/O** for collectively reading and writing 2-dimensional array data (i.e. `MPI_File_open`, `MPI_File_set_errhandler`, `MPI_File_set_view`, `MPI_File_read_all` and `MPI_File_write_all`).
* How to implement **halo exchange** (i.e. the exchange of *ghost cell data*) using three different approaches (each with and without overlap of communication and computation):
  1. Using an **MPI graph communicator** and **neighborhood collective operations** (i.e. `MPI_Dist_graph_create_adjacent` and `MPI_[N/In]eighbor_alltoallw`).
1. Using **MPI Point-to-Point** communication (i.e. `MPI_Isend`, `MPI_Irecv` and `MPI_Waitall`).
1. Using **MPI persistent communication requests** (i.e. `MPI_Send_init`, `MPI_Recv_init`, `MPI_Startall` and `MPI_Waitall`).
* How to use the **MPI subarray datatype** (i.e. `MPI_Type_create_subarray`) for I/O and halo exchange.
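The process grid used by the Cartesian topology is chosen by `MPI_Dims_create`, which factors the process count into dimensions that are as balanced as possible. The intent of that call (not MPI's actual implementation) can be sketched in plain C for the two-dimensional case; `dims_create_2d` is a hypothetical helper named for illustration only:

```c
#include <assert.h>

/* Sketch of the idea behind MPI_Dims_create for two dimensions:
 * factor nprocs into dims[0] * dims[1] with the two factors as
 * close to each other as possible (dims[0] >= dims[1]).
 * Illustrative only - MPI implementations use their own algorithm. */
static void dims_create_2d(int nprocs, int dims[2])
{
    int d;
    dims[1] = 1;
    for (d = 1; d * d <= nprocs; d++)
        if (nprocs % d == 0)
            dims[1] = d;          /* largest divisor <= sqrt(nprocs) */
    dims[0] = nprocs / dims[1];   /* the matching co-factor */
}
```

For example, 12 processes yield a 4 x 3 grid, matching what `MPI_Dims_create(12, 2, dims)` typically returns; a prime process count degenerates to a 1-D strip.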
The code sample is structured as follows:
* `configuration.c`, `configuration.h`: Command line parsing and basic logging facilities.
* `io.c`, `io.h`: Collective I/O of the cellular automaton state.
* `main.c`: The main program.
* `mpitypes.c`, `mpitypes.h`: Creation of MPI datatypes.
* `simulation.c`, `simulation.h`: Demonstration of 6 approaches for implementing a Wireworld cellular automaton.
* `world.c`, `world.h`: Initialization of the cellular automaton and associated MPI objects.
    * `create_cart_comm`: Creation of the MPI Cartesian communicator.
    * `world_init`: Domain decomposition and buffer allocation.
    * `world_init_io_type`: Creation of the MPI subarray datatype for I/O.
    * `world_init_neighborhood`: Identification of neighboring processes, creation of the MPI graph communicator and MPI datatypes for halo exchange.
    * `world_init_persistent_requests`: Initialization of persistent requests for halo data exchange.
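The transition rule that `simulation.c` applies in all six communication variants is the standard Wireworld rule: an electron head becomes a tail, a tail becomes a conductor, and a conductor becomes a head iff exactly one or two of its eight neighbors are heads. A minimal sketch of the per-cell update in plain C (the state encoding here is illustrative and is not the sample's on-disk format):

```c
#include <assert.h>

/* Illustrative Wireworld cell states (not the sample's file format). */
enum { EMPTY, CONDUCTOR, HEAD, TAIL };

/* Next state of one cell, given the number of electron heads
 * among its 8 neighbors (the Moore neighborhood). */
static int wireworld_next(int cell, int head_neighbors)
{
    switch (cell) {
    case HEAD:      return TAIL;
    case TAIL:      return CONDUCTOR;
    case CONDUCTOR: return (head_neighbors == 1 || head_neighbors == 2)
                           ? HEAD : CONDUCTOR;
    default:        return EMPTY;  /* empty cells never change */
    }
}
```

Because the rule only needs the immediate neighborhood, each process can update its interior cells independently once a one-cell-wide halo has been exchanged, which is what makes the overlap of communication and computation possible.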
## Release Date
2016-10-24
## Version History
* 2016-10-24: Initial Release on PRACE CodeVault repository
## Contributors
* Thomas Ponweiser - [thomas.ponweiser@risc-software.at](mailto:thomas.ponweiser@risc-software.at)
## Copyright
This code is available under Apache License, Version 2.0 - see also the license file in the CodeVault root directory.
## Languages
This sample is entirely written in C.
## Parallelisation
This sample uses MPI-3 for parallelisation.
## Level of the code sample complexity
Intermediate / Advanced
## Compiling
Follow the compilation instructions given in the main directory of the kernel samples directory (`/hpc_kernel_samples/README.md`).
## Running
Assuming that the input file `primes.wi` is in your current working directory, to run the program you may use something similar to
```
mpirun -n [nprocs] ./5_structured_wireworld_c primes.wi
```
either on the command line or in your batch script. Note that only the input file's basename (omitting the file extension) is passed to the program.
### Command line arguments
* `-v [0-3]`: Specify the verbosity level - 0: OFF; 1: INFO (Default); 2: DEBUG; 3: TRACE
* `--sparse`: Use MPI neighborhood collective operations, a.k.a. *sparse collective operations*, i.e. `MPI_[N/In]eighbor_alltoallw`, for halo data exchange (Default).
* `--p2p`: Use MPI Point-to-Point communication for halo data exchange, i.e. `MPI_Isend`, `MPI_Irecv` and `MPI_Waitall`.
* `--persist`: Use MPI persistent requests (created with `MPI_Send_init` and `MPI_Recv_init`) for halo data exchange (using `MPI_Startall` and `MPI_Waitall`).
* `--overlap`: Overlap communication and computation, i.e. compute inner cells while doing halo data exchange (Default).
* `--no-overlap`: Do not overlap communication and computation.
* `-x [X]`, `--nprocs-x [X]`: Use `X` processes in x-direction for MPI Cartesian communicator (optional).
* `-y [Y]`, `--nprocs-y [Y]`: Use `Y` processes in y-direction for MPI Cartesian communicator (optional).
* `-i [N]`, `--iterations [N]`: Do `N` iterations, creating `N` output files with the current state of the cellular automaton; Default: 1.
* `-g [G]`, `--generations-per-iteration [G]`: Number of generations to simulate between output iterations; Default: 5000.
For large numbers as arguments to the option `-g`, the suffixes 'k' or 'M' may be used. For example, `-g 50k` specifies fifty thousand and `-g 1M` one million generations per output iteration.
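The suffix handling can be sketched in a few lines of plain C; `parse_generations` is a hypothetical helper for illustration, while the sample's actual command line parsing lives in `configuration.c`:

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a generation count with an optional 'k' (thousand) or
 * 'M' (million) suffix, as accepted by -g. Hypothetical helper;
 * no error handling for malformed input is shown. */
static long parse_generations(const char *s)
{
    char *end;
    long n = strtol(s, &end, 10);   /* parse the numeric prefix */
    if (*end == 'k')
        n *= 1000L;
    else if (*end == 'M')
        n *= 1000000L;
    return n;
}
```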
### Example
If you run
```
mpirun -n 12 ./5_structured_wireworld_c -i 10 -g 50k -v 2 --nprocs-x 3 primes.wi
```
the output should look similar to
```
Configuration:
* Verbosity level: DEBUG (2)
* Input file: primes.wi
* Transmission mode: Sparse collective - MPI_Dist_graph_create_adjacent / MPI_[N/In]eighbor_alltoallw
* Overlap mode: Overlapping communication and computation
* Grid of processes: 3 x 4
* Number of iterations: 10
* Generations per iteration: 50000
Reading 'primes.wi'...
Read header (8 characters).
Global size: 632 x 958
Creating Cartesian communicator...
INFO: MPI reordered ranks: NO
Creating MPI distributed graph communicator...
Running 10 iterations with 50000 generations per iteration.
Generation 50000 - written 'primes+000050000.wi'.
Generation 100000 - written 'primes+000100000.wi'.
Generation 150000 - written 'primes+000150000.wi'.
Generation 200000 - written 'primes+000200000.wi'.
Generation 250000 - written 'primes+000250000.wi'.
Generation 300000 - written 'primes+000300000.wi'.
Generation 350000 - written 'primes+000350000.wi'.
Generation 400000 - written 'primes+000400000.wi'.
Generation 450000 - written 'primes+000450000.wi'.
Generation 500000 - written 'primes+000500000.wi'.
Statistics:
* Generations per second: 2750
* Net simulation time (s): 181.787989
* Net I/O time (s): 0.034390
* Total time (s): 181.822473
Done.
```