Single GPU CUDA Version(N=10000, M=10000): t= 25.496180 ms
Single GPU CUDA Coalesced Version(N=10000, M=10000): t= 6.006391 ms
Single GPU CUDA shmem Version(N=10000, M=10000): t= 4.535849 ms
Single GPU CUDA Version(N=25000, M=25000): t= 1775.153432 ms
Single GPU CUDA Coalesced Version(N=25000, M=25000): t= 31.531191 ms
Single GPU CUDA shmem Version(N=25000, M=25000): t= 26.186922 ms