## Prerequisites Installation
The prerequisites consist of the Python packages listed below. It is recommended to create a Python virtual environment first (with either `pyenv` or `conda`). The packages can then be installed with the `pip` package manager:
```
pip install tensorflow
pip install horovod
pip install scikit-learn
pip install scikit-image
pip install pandas
```
Note: `tensorflow` installed with `pip` is not guaranteed to deliver optimal performance. Compiling `tensorflow` from source is preferable, because the compiler will then likely be able to take advantage of the advanced instruction sets supported by the processor (e.g., AVX512). The official build instructions can be found at https://www.tensorflow.org/install/source. Some HPC centers provide a tensorflow module optimized for their hardware, in which case the `pip install tensorflow` line can be replaced with a line like `module load <name_of_the_tensorflow_module>`.
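Before submitting a job, it can help to confirm that the packages above are visible from the active environment. The following is a minimal sketch and not part of the benchmark itself; the helper name `missing_packages` is hypothetical. Note that `scikit-learn` and `scikit-image` are imported as `sklearn` and `skimage`.

```python
import importlib.util

# Import names for the pip packages listed above
# (scikit-learn -> sklearn, scikit-image -> skimage).
PREREQUISITES = ["tensorflow", "horovod", "sklearn", "skimage", "pandas"]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(PREREQUISITES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

This only checks that each module can be located; it does not import the modules or verify how `tensorflow` was built.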
batch_medium.slurm
efn_b4.h5
env_bench
model_hvd_bw_512_B4_with_noise_n_p_4.h5
output_bw_512.hdf5
results-DG-medium/
train_log.txt
Medium test case presentation
-----------------------------
This test case performs a training run using 512x512 images, with 3 positions per image, as input.
Reference time on Jean Zay with 4 nodes, 16 MPI processes, 16 GPUs, 3 positions and 100 epochs:
* For 100 epochs: ~67 ms/sample and 32 min 30 s time to solution
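
To compare a run on another machine with the reference above, the time to solution can be converted to seconds and averaged per epoch. A small sketch of that arithmetic follows; the function names are illustrative, and the per-epoch figure is only an average that includes any start-up overhead counted in the time to solution.

```python
def time_to_solution_seconds(minutes, seconds):
    """Convert a 'min/s' time to solution into seconds."""
    return minutes * 60 + seconds


def mean_epoch_seconds(total_seconds, epochs):
    """Average wall-clock time per epoch over the whole run."""
    return total_seconds / epochs


# Reference figures quoted above: 32 min 30 s for 100 epochs.
reference_total = time_to_solution_seconds(32, 30)
print(reference_total)                           # 1950
print(mean_epoch_seconds(reference_total, 100))  # 19.5
```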
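#!/bin/bash
# Copy the environment and batch files of the chosen machine into this directory.
if [ -z "$1" ]
then
    echo "Please provide the targeted machine from:"
    ls ../machines/
    echo ""
    echo "Example: ./prepare.sh jeanzay-gpu"
    exit 1
fi
machine_dir="../machines/$1"
# Quote the paths so that machine names with unusual characters still work.
cp "$machine_dir/env_bench" .
cp "$machine_dir/batch_medium.slurm" .
# Link the input dataset from the DeepGalaxy sources.
ln -s ../DeepGalaxy-master/output_bw_512.hdf5 .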
#!/bin/bash
sbatch batch_medium.slurm
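#!/bin/bash
# Abort on the first failing command; grep exits non-zero when nothing matches.
set -e
RESULT_DIR=results-DG-medium
mkdir -p "$RESULT_DIR"
cp dg.err dg.out "$RESULT_DIR"
# Keep every epoch header together with the line that follows it.
grep -A1 "Epoch" dg.out > "$RESULT_DIR/epochs.results"
# With set -e, the script fails here if the final epoch (100/100) was never logged.
grep -A1 "Epoch 100/100" dg.out > "$RESULT_DIR/last_epoch.results"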
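
The same completion check can be done on the collected results in Python. The sketch below assumes only what the script above relies on, namely that the output contains headers of the form `Epoch N/M`; the function names are hypothetical and the format of the line following each header is not interpreted.

```python
import re

# Matches epoch headers such as "Epoch 100/100".
EPOCH_HEADER = re.compile(r"Epoch (\d+)/(\d+)")


def last_epoch_reached(log_text):
    """Return (current, total) for the highest epoch header found, or None."""
    best = None
    for match in EPOCH_HEADER.finditer(log_text):
        current, total = int(match.group(1)), int(match.group(2))
        if best is None or current > best[0]:
            best = (current, total)
    return best


def run_completed(log_text):
    """True if the highest logged epoch equals the total number of epochs."""
    best = last_epoch_reached(log_text)
    return best is not None and best[0] == best[1]
```

For example, reading `results-DG-medium/epochs.results` into a string and passing it to `run_completed` indicates whether the last epoch header was logged.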