Commit d12ee7aa authored by Valeriu Codreanu

added README files

parent 0f32c8de
=======
README
=======
# 1. Code sample name
gemm
# 2. Description of the code sample package
This example demonstrates the use of NVIDIA's linear algebra library for CUDA: cuBLAS. The example is set up to perform the computation on both the CPU and the GPU, and then to verify the GPU result against the CPU result.
Additional pre-requisites:
* CUDA (includes the cuBLAS library)
* clBLAS
See http://docs.nvidia.com/cuda/cublas for the full cuBLAS documentation.
See https://github.com/clMathLibraries/clBLAS for the clBLAS library
# 3. Release date
25 July 2015
# 4. Version history
1.0
# 5. Contributor(s) / Maintainer(s)
Valeriu Codreanu <valeriu.codreanu@surfsara.nl>
# 6. Copyright / License of the code sample
Apache 2.0
# 7. Language(s)
C++
CUDA
# 8. Parallelisation Implementation(s)
GPU
# 9. Level of the code sample complexity
Basic level, uses library calls only
# 10. Instructions on how to compile the code
Uses the CodeVault CMake infrastructure, see main README.md
# 11. Instructions on how to run the code
Run the executable with a single command-line option, the matrix size
# 12. Sample input(s)
Input-data is generated automatically when running the program.
# 13. Sample output(s)
Output data is verified programmatically using a CPU implementation of GEMM.
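The CPU-side check described above can be sketched as follows. This is a minimal illustration, not the code shipped with the sample: the function names `cpu_gemm` and `max_abs_diff` are hypothetical, square matrices are assumed, and column-major storage is chosen only because that is the layout cuBLAS expects.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference GEMM (hypothetical helper): C = alpha * A * B + beta * C
// for square n x n matrices stored in column-major order.
void cpu_gemm(std::size_t n, float alpha, const std::vector<float>& A,
              const std::vector<float>& B, float beta, std::vector<float>& C) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t col = 0; col < n; ++col) {
        for (std::size_t row = 0; row < n; ++row) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                // Element (row, k) of A times element (k, col) of B.
                acc += A[k * n + row] * B[col * n + k];
            }
            C[col * n + row] = alpha * acc + beta * C[col * n + row];
        }
    }
}

// Largest element-wise absolute difference between two result buffers
// (hypothetical helper for comparing a GPU result with the CPU reference).
float max_abs_diff(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        worst = std::max(worst, std::fabs(x[i] - y[i]));
    }
    return worst;
}
```

A caller would copy the GPU result back to the host, run `cpu_gemm` on the same inputs, and accept the run if `max_abs_diff` stays below a tolerance suited to the matrix size and to single-precision rounding.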
=======
README
=======
# 1. Code sample name
kmeans
# 2. Description of the code sample package
Note: This application was ported from the Rodinia Suite
(https://www.cs.virginia.edu/~skadron/wiki/rodinia/).
K-means is a clustering algorithm used extensively in data-mining and elsewhere, important primarily for its simplicity. Many data-mining algorithms show a high degree of data parallelism.
In k-means, a data object consists of several values, called features. By dividing a set of data objects into K sub-clusters, k-means represents all the data objects by the mean values, or centroids, of their respective sub-clusters. The initial cluster center for each sub-cluster is chosen randomly or derived from some heuristic. In each iteration, the algorithm associates each data object with its nearest center, based on a chosen distance metric. Each new centroid is then calculated as the mean of all the data objects within its sub-cluster. The algorithm iterates until no data object moves from one sub-cluster to another.
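The iteration described above can be sketched serially in C++. This is an illustrative sketch only, not the Rodinia implementation: the names `kmeans_step` and `kmeans` are hypothetical, squared Euclidean distance is an assumed choice of metric, and initial centroids are supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<float>;  // one data object: a list of features

// Squared Euclidean distance (assumed metric for this sketch).
float squared_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t f = 0; f < a.size(); ++f) {
        const float d = a[f] - b[f];
        sum += d * d;
    }
    return sum;
}

// One iteration: assign every point to its nearest centroid, then replace
// each centroid by the mean of its members. Returns how many points moved.
std::size_t kmeans_step(const std::vector<Point>& points,
                        std::vector<Point>& centroids,
                        std::vector<int>& membership) {
    assert(membership.size() == points.size());
    const std::size_t k = centroids.size();
    const std::size_t dims = centroids.empty() ? 0 : centroids[0].size();
    std::vector<Point> sums(k, Point(dims, 0.0f));
    std::vector<std::size_t> counts(k, 0);
    std::size_t moved = 0;
    for (std::size_t i = 0; i < points.size(); ++i) {
        int best = 0;
        float best_dist = std::numeric_limits<float>::max();
        for (std::size_t c = 0; c < k; ++c) {
            const float d = squared_distance(points[i], centroids[c]);
            if (d < best_dist) { best_dist = d; best = static_cast<int>(c); }
        }
        if (membership[i] != best) { membership[i] = best; ++moved; }
        ++counts[best];
        for (std::size_t f = 0; f < dims; ++f) sums[best][f] += points[i][f];
    }
    for (std::size_t c = 0; c < k; ++c) {
        if (counts[c] == 0) continue;  // keep an empty cluster's old center
        for (std::size_t f = 0; f < dims; ++f) {
            centroids[c][f] = sums[c][f] / static_cast<float>(counts[c]);
        }
    }
    return moved;
}

// Repeat until no point changes sub-cluster or max_iterations is reached.
// Returns the number of iterations performed.
std::size_t kmeans(const std::vector<Point>& points, std::vector<Point>& centroids,
                   std::vector<int>& membership, std::size_t max_iterations) {
    membership.assign(points.size(), -1);  // -1 marks "not yet assigned"
    std::size_t iterations = 0;
    while (iterations < max_iterations) {
        ++iterations;
        if (kmeans_step(points, centroids, membership) == 0) break;
    }
    return iterations;
}
```

The assignment loop is independent for each data object, which is the data parallelism that GPU versions of the algorithm typically exploit.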
This set of examples demonstrates the use of:
* OpenCL
* NVIDIA CUDA
Additional pre-requisites:
* CUDA
* OpenCL
# 3. Release date
30 July 2015
# 4. Version history
1.0
# 5. Contributor(s) / Maintainer(s)
Valeriu Codreanu <valeriu.codreanu@surfsara.nl>
# 6. Copyright / License of the code sample
Apache 2.0
# 7. Language(s)
C++
CUDA
OpenCL
# 8. Parallelisation Implementation(s)
GPU
CPU
# 9. Level of the code sample complexity
Source-code example demonstrating the use of CUDA and OpenCL
# 10. Instructions on how to compile the code
Uses the CodeVault CMake infrastructure, see main README.md
# 11. Instructions on how to run the code
Check the kmeans_cuda, kmeans_rodinia_opencl, and kmeans_openmp folders for instructions on how to run the application.
# 12. Sample input(s)
Input-data is included in the kmeans_data folder
# 13. Sample output(s)
Output cluster center coordinates
=======
README
=======
# 1. Code sample name
lud
# 2. Description of the code sample package
This set of examples demonstrates the use of:
* NVIDIA's linear algebra library for CUDA: cuBLAS.
* NVIDIA's solver library for CUDA: cuSOLVER.
* Intel's Math Kernel Library: Intel MKL.
Some examples (cublas_mkl, cusolver_mkl) are set up to perform the computation on both the CPU and the GPU, and then to verify the GPU result against the CPU result.
Additional pre-requisites:
* CUDA (includes the cuBLAS and cuSOLVER libraries)
* Intel MKL
# 3. Release date
30 July 2015
# 4. Version history
1.0
# 5. Contributor(s) / Maintainer(s)
Valeriu Codreanu <valeriu.codreanu@surfsara.nl>
# 6. Copyright / License of the code sample
Apache 2.0
# 7. Language(s)
C++
CUDA
# 8. Parallelisation Implementation(s)
GPU
CPU
# 9. Level of the code sample complexity
Basic level, uses library calls only
# 10. Instructions on how to compile the code
Uses the CodeVault CMake infrastructure, see main README.md
# 11. Instructions on how to run the code
Run the executable with a single command-line option, the matrix size
# 12. Sample input(s)
Input-data is generated automatically when running the program.
# 13. Sample output(s)
Output data is verified programmatically using a CPU implementation of LUD. Performance numbers are reported as well.
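A CPU reference for LU decomposition can be sketched as below. This is an assumption-laden illustration rather than the sample's own code: it uses the Doolittle scheme without pivoting on a row-major matrix, whereas library routines such as those in cuSOLVER and Intel MKL normally apply partial pivoting. The names `lu_decompose` and `lu_reconstruct` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// In-place Doolittle LU factorisation without pivoting (sketch only).
// On success the strict lower triangle holds L (unit diagonal implied)
// and the upper triangle, including the diagonal, holds U.
// Returns false if a zero pivot is met, in which case pivoting is required.
bool lu_decompose(std::size_t n, std::vector<double>& a) {
    assert(a.size() == n * n);
    for (std::size_t k = 0; k < n; ++k) {
        const double pivot = a[k * n + k];
        if (pivot == 0.0) return false;
        for (std::size_t i = k + 1; i < n; ++i) {
            a[i * n + k] /= pivot;  // multiplier, stored as L(i, k)
            for (std::size_t j = k + 1; j < n; ++j) {
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
            }
        }
    }
    return true;
}

// Multiply the packed factors back together so the product L * U can be
// compared with the original matrix.
std::vector<double> lu_reconstruct(std::size_t n, const std::vector<double>& lu) {
    assert(lu.size() == n * n);
    std::vector<double> out(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            const std::size_t last = std::min(i, j);
            for (std::size_t k = 0; k <= last; ++k) {
                const double l = (k == i) ? 1.0 : lu[i * n + k];
                sum += l * lu[k * n + j];
            }
            out[i * n + j] = sum;
        }
    }
    return out;
}
```

One way to verify a factorisation is to compare `lu_reconstruct` of the packed factors with the input matrix, element by element, within a floating-point tolerance. A result produced with pivoting would first need its row permutation applied before such a comparison.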