Single GPU CUDA Version (N=1000, M=1000): t = 0.092299 ms
Single GPU CUDA Version (N=10000, M=10000): t = 4.060280 ms