# A single GPU implementation of the Matrix-Vector algorithm with:

```
->cuBLAS(BLAS routines implemented on the GPU by NVIDIA)
07/09/2017: Completed
13/09/2017: Modified to use unified memory

->cuBLAS_MultiGPU(cuBLAS implementation on multiple GPUs/nodes)
26/09/2017: Completed

->cuda_SingleGPU(3 CUDA kernels showing the optimization steps in writing GPU code)
02/10/2017: Completed kernel 1
03/10/2017: Completed kernel 2 & 3
```
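
For orientation, here is a minimal sketch of what an unoptimized matrix-vector kernel with unified memory might look like. It is not the code from this repository: the kernel name `matvec_naive`, the row-major layout, the matrix sizes, and the block size of 256 are all assumptions chosen for illustration.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical naive kernel: one thread computes one element of y = A * x.
// A is assumed to be stored row-major with rows x cols elements.
__global__ void matvec_naive(const float *A, const float *x, float *y,
                             int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows) {
        float sum = 0.0f;
        for (int j = 0; j < cols; ++j)
            sum += A[row * cols + j] * x[j];
        y[row] = sum;
    }
}

int main()
{
    const int rows = 1024, cols = 512;   // illustrative sizes
    float *A = nullptr, *x = nullptr, *y = nullptr;

    // Unified memory: the same pointers are valid on the host and the device.
    cudaMallocManaged(&A, sizeof(float) * rows * cols);
    cudaMallocManaged(&x, sizeof(float) * cols);
    cudaMallocManaged(&y, sizeof(float) * rows);

    for (int i = 0; i < rows * cols; ++i) A[i] = 1.0f;
    for (int j = 0; j < cols; ++j)        x[j] = 2.0f;

    const int threads = 256;                            // assumed block size
    const int blocks  = (rows + threads - 1) / threads; // round up to cover all rows
    matvec_naive<<<blocks, threads>>>(A, x, y, rows, cols);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Wait for the kernel before reading y on the host.
    cudaDeviceSynchronize();

    // Every row of A is all ones and x is all twos, so each y[i] should be 2 * cols.
    float max_err = 0.0f;
    for (int i = 0; i < rows; ++i)
        max_err = std::fmax(max_err, std::fabs(y[i] - 2.0f * cols));
    std::printf("max abs error: %g\n", max_err);

    cudaFree(A);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

A file like this would typically be built with `nvcc`. With a row-major layout, neighbouring threads in this kernel read addresses `cols` elements apart, which is one common starting point for later optimization steps such as coalesced access or shared-memory tiling.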