# README - Wireworld Example (C Version)
## Description
For a general description of *Wireworld* and the file format, see the `README.md` file in the parent directory.
This code sample demonstrates:
* How to use **MPI Cartesian topologies** and their associated convenience functions (i.e. `MPI_Cart_create`, `MPI_Dims_create`, `MPI_Cart_get` and `MPI_Cart_rank`); a minimal sketch follows after this list.
* How to use **MPI parallel I/O** for collectively reading and writing 2-dimensional array data (i.e. `MPI_File_open`, `MPI_File_set_errhandler`, `MPI_File_set_view`, `MPI_File_read_all` and `MPI_File_write_all`).
* How to implement **halo exchange** (i.e. the exchange of *ghost cell data*) using three different approaches (each with and without overlap of communication and computation):
    1. Using **MPI Graph communicator** and **Neighborhood collective operations** (i.e. `MPI_Dist_graph_create_adjacent` and `MPI_[N/In]eighbor_alltoallw`).
    2. Using **MPI Point-to-Point** communication (i.e. `MPI_Isend`, `MPI_Irecv` and `MPI_Waitall`).
    3. Using **MPI persistent communication requests** (i.e. `MPI_Send_init`, `MPI_Recv_init`, `MPI_Startall` and `MPI_Waitall`).
* How to use the **MPI subarray datatype** (i.e. `MPI_Type_create_subarray`) for I/O and halo exchange.
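
As a minimal, hedged illustration of the Cartesian-topology part, the sketch below sets up a 2-D process grid with `MPI_Dims_create`, `MPI_Cart_create` and `MPI_Cart_get`; the variable names are illustrative and do not necessarily match the sample code.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nprocs, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Let MPI choose a balanced 2-D process grid. */
    int dims[2] = {0, 0};
    MPI_Dims_create(nprocs, 2, dims);

    /* Non-periodic boundaries; allow MPI to reorder ranks. */
    int periods[2] = {0, 0};
    MPI_Comm cart_comm;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart_comm);

    /* Query this process's coordinates within the grid. */
    int coords[2];
    MPI_Cart_get(cart_comm, 2, dims, periods, coords);

    printf("rank %d -> coords (%d, %d) in a %d x %d grid\n",
           rank, coords[0], coords[1], dims[0], dims[1]);

    MPI_Comm_free(&cart_comm);
    MPI_Finalize();
    return 0;
}
```

Entries of `dims` left at 0 are filled in by `MPI_Dims_create`; fixing an entry beforehand constrains the factorization, which is presumably how the optional `-x`/`-y` options described below are honoured.
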
The code sample is structured as follows:
* `configuration.c`, `configuration.h`: Command line parsing and basic logging facilities.
* `io.c`, `io.h`: Collective I/O of the cellular automaton state.
* `main.c`: The main program.
    * `broadcast_configuration`: Broadcasting the parsed command line arguments.
    * `create_cart_comm`: Creation of the MPI Cartesian communicator.
* `mpitypes.c`, `mpitypes.h`: Creation of MPI datatypes.
* `simulation.c`, `simulation.h`: Demonstration of 6 approaches for implementing a Wireworld cellular automaton.
* `world.c`, `world.h`: Initialization of the cellular automaton and associated MPI objects.
    * `world_init`: Domain decomposition and buffer allocation.
    * `world_init_io_type`: Creation of the MPI subarray datatype for I/O (see the sketch after this list).
    * `world_init_neighborhood`: Identification of neighboring processes, creation of the MPI graph communicator.
    * `world_init_persistent_requests`: Initialization of persistent requests for halo data exchange.
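
To illustrate the subarray-based collective I/O mentioned above, here is a minimal, hedged sketch of the general pattern; the helper name `read_block`, the byte-oriented element type and the error-handler choice are illustrative assumptions and do not necessarily mirror `io.c`.

```c
#include <mpi.h>

/* Collectively read this process's local block of a global 2-D grid of
 * bytes, stored row-major in the file after a header of 'header_bytes'
 * bytes. 'global', 'local' and 'start' describe the decomposition in
 * (rows, columns). Illustrative sketch only. */
static void read_block(const char *filename, MPI_Offset header_bytes,
                       const int global[2], const int local[2],
                       const int start[2], unsigned char *buf)
{
    MPI_Datatype filetype;
    MPI_Type_create_subarray(2, global, local, start,
                             MPI_ORDER_C, MPI_BYTE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, filename,
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_errhandler(fh, MPI_ERRORS_ARE_FATAL);

    /* Each process sees only its own subarray of the file. */
    MPI_File_set_view(fh, header_bytes, MPI_BYTE, filetype,
                      "native", MPI_INFO_NULL);
    MPI_File_read_all(fh, buf, local[0] * local[1],
                      MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
}
```

Writing proceeds analogously with `MPI_File_write_all` and an access mode such as `MPI_MODE_WRONLY | MPI_MODE_CREATE`.
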
## Release Date
2016-10-24
## Version History
* 2016-10-24: Initial Release on PRACE CodeVault repository
## Contributors
* Thomas Ponweiser - [thomas.ponweiser@risc-software.at](mailto:thomas.ponweiser@risc-software.at)
## Copyright
This code is available under the Apache License, Version 2.0 - see also the license file in the CodeVault root directory.
## Languages
This sample is entirely written in C.
## Parallelisation
This sample uses MPI-3 for parallelisation.
## Level of the code sample complexity
Intermediate / Advanced
## Compiling
Follow the compilation instructions given in the top-level directory of the kernel samples (`/hpc_kernel_samples/README.md`).
## Running
Assuming that the input file `primes.wi` is in your current working directory, you can run the program with something similar to
```
mpirun -n [nprocs] ./5_structured_wireworld_c primes
```
either on the command line or in your batch script. Note that only the input file's basename (omitting the file extension) is passed to the program.
### Command line arguments
* `-v [0-3]`: Specify the verbosity level - 0: OFF; 1: INFO (Default); 2: DEBUG; 3: TRACE
* `--sparse`: Use MPI neighborhood collective operations, a.k.a. *sparse collective operations*, i.e. `MPI_[N/In]eighbor_alltoallw`, for halo data exchange (Default).
* `--p2p`: Use MPI Point-to-Point communication for halo data exchange, i.e. `MPI_Isend`, `MPI_Irecv` and `MPI_Waitall`.
* `--persist`: Use MPI persistent requests (created with `MPI_Send_init` and `MPI_Recv_init`) for halo data exchange (using `MPI_Startall` and `MPI_Waitall`).
* `--overlap`: Overlap communication and computation, i.e. compute inner cells while doing halo data exchange (Default).
* `--no-overlap`: Do not overlap communication and computation.
* `-x [X]`, `--nprocs-x [X]`: Use `X` processes in x-direction for MPI Cartesian communicator (optional).
* `-y [Y]`, `--nprocs-y [Y]`: Use `Y` processes in y-direction for MPI Cartesian communicator (optional).
* `-i [N]`, `--iterations [N]`: Do `N` iterations, creating `N` output files with the current state of the cellular automaton; Default: 1.
* `-g [G]`, `--generations-per-iteration [G]`: Number of generations to simulate between output iterations; Default: 5000.
For large numbers as arguments to the option `-g`, the suffixes 'k' or 'M' may be used. For example, `-g 50k` specifies fifty thousand and `-g 1M` one million generations per output iteration.
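
As an illustration only, a small helper along the following lines could interpret such suffixes; `parse_count` is a hypothetical name and not necessarily how `configuration.c` parses its arguments.

```c
#include <stdlib.h>

/* Hypothetical helper: parse a count such as "50k" or "1M".
 * Returns -1 if the string is not a valid non-negative count. */
static long parse_count(const char *s)
{
    char *end;
    long value = strtol(s, &end, 10);
    if (end == s || value < 0)
        return -1;
    if ((*end == 'k' || *end == 'K') && end[1] == '\0')
        return value * 1000L;
    if (*end == 'M' && end[1] == '\0')
        return value * 1000000L;
    if (*end == '\0')
        return value;
    return -1;
}
```
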
### Example
If you run
```
mpirun -n 12 ./5_structured_wireworld_c -i 10 -g 50k -v 2 --nprocs-x 3 primes
```
the output should look similar to
```
Configuration:
* Verbosity level: DEBUG (2)
* Input file: primes.wi
* Transmission mode: Sparse collective - MPI_Dist_graph_create_adjacent / MPI_[N/In]eighbor_alltoallw
* Overlap mode: Overlapping communication and computation
* Grid of processes: 3 x 4
* Number of iterations: 10
* Generations per iteration: 50000
Reading 'primes.wi'...
Read header (8 characters).
Global size: 632 x 958
Creating Cartesian communicator...
INFO: MPI reordered ranks: NO
Creating MPI distributed graph communicator...
Running 10 iterations with 50000 generations per iteration.
Generation 50000 - written 'primes+000050000.wi'.
Generation 100000 - written 'primes+000100000.wi'.
Generation 150000 - written 'primes+000150000.wi'.
Generation 200000 - written 'primes+000200000.wi'.
Generation 250000 - written 'primes+000250000.wi'.
Generation 300000 - written 'primes+000300000.wi'.
Generation 350000 - written 'primes+000350000.wi'.
Generation 400000 - written 'primes+000400000.wi'.
Generation 450000 - written 'primes+000450000.wi'.
Generation 500000 - written 'primes+000500000.wi'.
Statistics:
* Generations per second: 2750
* Net simulation time (s): 181.787989
* Net I/O time (s): 0.034390
* Total time (s): 181.822473
Done.
```