# A single GPU implementation of the Matrix-Vector algorithm with:
```
->cuBLAS(BLAS routines implemented on the GPU by NVIDIA)
     07/09/2017: Completed
     13/09/2017: Modified to use unified memory
->cuBLAS_MultiGPU(cuBLAS implementation on multiple GPUs/nodes)
     26/09/2017: Completed
->cuda_SingleGPU(3 CUDA kernels showing the optimization steps in writing GPU code)
     02/10/2017: Completed kernel 1
     03/10/2017: Completed kernel 2 & 3

Tested environments:
- Haswell Intel Xeon E5-2660v3 CPU with Linux x86_64 + NVIDIA Tesla K40 GPUs and cuda/8.0.61
```
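
For orientation, here is a minimal sketch of what an unoptimized first matrix-vector kernel with unified memory might look like. It is not the code from `cuda_SingleGPU`: the names `matvec_naive` and `matvec_gpu`, the row-major layout, and the block size of 256 threads are all assumptions made for this illustration.

```cuda
#include <cstddef>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical naive kernel: one thread computes one element of y = A * x.
// A is assumed to be row-major with rows x cols entries.
__global__ void matvec_naive(const float* A, const float* x, float* y,
                             int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows) {
        float sum = 0.0f;
        for (int j = 0; j < cols; ++j) {
            sum += A[(size_t)row * cols + j] * x[j];
        }
        y[row] = sum;
    }
}

// Hypothetical host wrapper: stages the inputs in unified (managed) memory,
// launches the kernel, and copies the result back. Returns true on success.
bool matvec_gpu(const float* A, const float* x, float* y, int rows, int cols)
{
    float *mA = nullptr, *mx = nullptr, *my = nullptr;
    const size_t bytesA = (size_t)rows * cols * sizeof(float);
    const size_t bytesX = (size_t)cols * sizeof(float);
    const size_t bytesY = (size_t)rows * sizeof(float);

    bool ok = cudaMallocManaged(&mA, bytesA) == cudaSuccess
           && cudaMallocManaged(&mx, bytesX) == cudaSuccess
           && cudaMallocManaged(&my, bytesY) == cudaSuccess;

    if (ok) {
        // Managed memory is directly accessible from the host.
        std::memcpy(mA, A, bytesA);
        std::memcpy(mx, x, bytesX);

        const int threads = 256;  // assumed block size
        const int blocks = (rows + threads - 1) / threads;
        matvec_naive<<<blocks, threads>>>(mA, mx, my, rows, cols);

        // Synchronize before the host reads managed memory again.
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess;
        if (ok) {
            std::memcpy(y, my, bytesY);
        }
    }

    // cudaFree on a null pointer is a no-op, so this is safe on partial failure.
    cudaFree(mA);
    cudaFree(mx);
    cudaFree(my);
    return ok;
}
```

Calling `matvec_gpu(A, x, y, rows, cols)` from a `main` compiled with `nvcc` fills `y` with the product. With this layout, neighbouring threads read addresses `cols` floats apart, so accesses to `A` are not coalesced; that is the kind of inefficiency later optimization steps would typically address.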