CUDA-only implementation of batch matrix multiplication, using shared memory
Showing 6 changed files:
- gemm/cuda/CMakeLists.txt: 60 additions, 0 deletions
- gemm/cuda/README.md: 47 additions, 0 deletions
- gemm/cuda/src/dev_array.h: 103 additions, 0 deletions
- gemm/cuda/src/sgemm_cuda_kernel.cu: 40 additions, 0 deletions
- gemm/cuda/src/sgemm_cuda_kernel.h: 6 additions, 0 deletions
- gemm/cuda/src/sgemm_cuda_matrixmul.cu: 109 additions, 0 deletions