In the `DeepGalaxy` directory, download the training dataset.
**Step 3**: Run the code on different numbers of workers. For example, the following command executes the code on `<number_of_mpi_workers>` MPI workers:
```
export OMP_NUM_THREADS=<number_of_cores_per_socket>
HOROVOD_FUSION_THRESHOLD=134217728 \
mpirun --np <number_of_mpi_workers> \
--map-by ppr:1:socket:pe=$OMP_NUM_THREADS \
--report-bindings \
--oversubscribe \
-x LD_LIBRARY_PATH \
-x HOROVOD_FUSION_THRESHOLD \
-x OMP_NUM_THREADS=$OMP_NUM_THREADS \
python dg_train.py -f output_bw_512.hdf5 --num-camera 3 --arch EfficientNetB4 \
--epochs 5 --batch-size <batch_size>
```
The placeholders `<number_of_cores_per_socket>` and `<number_of_mpi_workers>` should be replaced with the number of CPU cores per socket and the number of copies of the neural network trained in parallel (one copy per MPI worker), respectively. For example, if the training runs on 4 nodes, each with two CPU sockets of 64 cores each, then `number_of_cores_per_socket = 64` and `number_of_mpi_workers = 8` (4 nodes, 2 MPI workers per node).

`output_bw_512.hdf5` is the training dataset downloaded in the previous step; please change the file name if necessary. The other parameters, such as `--epochs`, `--batch-size`, and `--arch`, can be adjusted according to the size of the benchmark. For example, the `EfficientNetB0` deep neural network is intended for small HPC systems, `EfficientNetB4` for medium-sized systems, and `EfficientNetB7` for large systems. The `--batch-size` parameter is specific to machine learning rather than HPC, but it should be chosen so that the hardware resources are fully utilised without being overloaded: should the system memory permit, increasing `--batch-size` can improve the throughput, whereas a value that is too large will cause an out-of-memory error.
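For illustration, a fully filled-in launch command for the example configuration above (4 nodes, 2 sockets per node, 64 cores per socket, hence 8 MPI workers) might look as follows; the batch size of 8 is only an assumed placeholder value and should be tuned to the available memory:
```
# Hypothetical system: 4 nodes x 2 sockets x 64 cores per socket
export OMP_NUM_THREADS=64
HOROVOD_FUSION_THRESHOLD=134217728 \
mpirun --np 8 \
--map-by ppr:1:socket:pe=$OMP_NUM_THREADS \
--report-bindings \
--oversubscribe \
-x LD_LIBRARY_PATH \
-x HOROVOD_FUSION_THRESHOLD \
-x OMP_NUM_THREADS=$OMP_NUM_THREADS \
python dg_train.py -f output_bw_512.hdf5 --num-camera 3 --arch EfficientNetB4 \
--epochs 5 --batch-size 8
```
This mapping places one MPI worker per socket (`ppr:1:socket`) and binds `OMP_NUM_THREADS` cores to each worker (`pe=$OMP_NUM_THREADS`), so each copy of the network trains on a full socket.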
The benchmark data of the training run are written to the file `train_log.txt`.
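To get a quick look at the progress while the training is running, the log file can be followed with standard shell tools, e.g.:
```
# Follow the benchmark log as it is written
tail -f train_log.txt
```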