Single GPU CUDA Version (N=25000, M=25000): t = 26.747921 ms