Single GPU CUDA Version (N=10000, M=10000): t = 4.058859 ms