Matrix multiplication can be implemented in several ways that differ greatly in performance. This example illustrates just how large these performance differences really are.
This code sample demonstrates:
* Different implementations of a matrix multiplication (sketched below)
  * Serial
    * IJK algorithm
    * IKJ algorithm
  * Parallel
    * OpenMP-parallelized IKJ algorithm
    * Intel MKL BLAS library
* A benchmark of these implementations with different matrix dimensions.
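
The difference between the two serial variants is only the ordering of the three nested loops, which has a large effect on cache behaviour. Below is a minimal sketch of the two orderings, assuming square row-major matrices stored in `std::vector<float>` (illustrative only, not the sample's actual code):

```
#include <cstddef>
#include <vector>

// IJK ordering: the innermost loop walks through b column by column,
// which is cache-unfriendly for row-major storage.
void multiply_ijk(const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// IKJ ordering: the innermost loop walks through b and c row by row,
// which is much friendlier to the cache. c must be zero-initialized.
void multiply_ikj(const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```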
Benchmark design:
The program runs the multiplication for matrix dimensions from 2 to 2048 using single-precision floats. For timing accuracy, each multiplication is repeated several times and the average time per multiplication is calculated. The program outputs CSV text with the dimension size and the time per multiplication in seconds.
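
A timing scheme along these lines could look as follows (the function names and the repetition handling here are assumptions for illustration, not the sample's actual code):

```
#include <chrono>
#include <cstddef>
#include <cstdio>

// Hypothetical benchmark driver: repeat one n x n multiplication several
// times and print the average wall-clock time as a CSV line "n,seconds".
template <typename Multiply>
void benchmark(std::size_t n, std::size_t repetitions, Multiply multiply) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < repetitions; ++r)
        multiply();
    const auto stop = std::chrono::steady_clock::now();
    const double seconds =
        std::chrono::duration<double>(stop - start).count() / repetitions;
    std::printf("%zu,%g\n", n, seconds);
}
```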
## Release Date
2016-10-24
## Version History
* 2016-10-24: Initial Release on PRACE CodeVault repository
## Contributors
* Thomas Steinreiter - [thomas.steinreiter@risc-software.at](mailto:thomas.steinreiter@risc-software.at)
## Copyright
This code is available under Apache License, Version 2.0 - see also the license file in the CodeVault root directory.
## Languages
This sample is entirely written in C++14.
## Parallelisation
This sample uses OpenMP for parallelisation; the `blas` mode additionally relies on the Intel MKL BLAS library.
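
For the `parallel` mode, the outer loop of the IKJ variant can be distributed across threads with a single OpenMP directive. A minimal sketch, again assuming square row-major `std::vector<float>` matrices (not the sample's actual code):

```
#include <cstddef>
#include <vector>

// OpenMP-parallelized IKJ multiplication: each thread computes a block of
// rows of c, so no synchronisation is needed inside the loop nest.
// c must be zero-initialized.
void multiply_ikj_omp(const std::vector<float>& a, const std::vector<float>& b,
                      std::vector<float>& c, std::size_t n) {
    const std::ptrdiff_t rows = static_cast<std::ptrdiff_t>(n);
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < rows; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```

Compile with the compiler's OpenMP flag (for example `-fopenmp` with GCC/Clang or `-qopenmp` with the Intel compiler).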
## Level of the code sample complexity
Intermediate
## Compiling
Follow the compilation instructions given in the main directory of the kernel samples directory (`/hpc_kernel_samples/README.md`).
## Running
To run the program in your current working directory, you may use something similar to
```
1_dense_mklblas blas
```
either on the command line or in your batch script.
### Command line arguments
* `<mode>`: Specify the implementation used for the multiplication (obligatory)
  * `serialijk` Use the serial IJK algorithm
  * `serialikj` Use the serial IKJ algorithm
  * `parallel` Use the parallel (OpenMP) IKJ algorithm
  * `blas` Use the Intel MKL BLAS library
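
In `blas` mode the whole multiplication reduces to one call to the standard CBLAS routine `cblas_sgemm` (single-precision general matrix-matrix multiply). A rough sketch of such a call for square row-major matrices (illustrative, not the sample's actual code):

```
#include <cstddef>
#include <vector>
#include <mkl.h>

// Computes c = 1.0f * a * b + 0.0f * c for square, row-major matrices.
void multiply_blas(const std::vector<float>& a, const std::vector<float>& b,
                   std::vector<float>& c, std::size_t n) {
    const MKL_INT dim = static_cast<MKL_INT>(n);
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                dim, dim, dim,
                1.0f, a.data(), dim,
                b.data(), dim,
                0.0f, c.data(), dim);
}
```

MKL selects an optimised (and typically multi-threaded) kernel internally, which is why this mode usually outperforms the hand-written loops by a wide margin.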