Commit d26bc8dd authored by Nikos Nikoloutsakos

pddp2means: README

 Changes to be committed:
	new file:
parent 52b3d1d3
*cr University of Patras, Greece
*cr Copyright (c) 2015 University of Patras
*cr All rights reserved
*cr Developed by: HPClab
*cr Computer Engineering and Informatics Department
*cr University of Patras
#./pddp_2means <input_file> <output_file> <clusters>
./pddp_2means ../data/40k.csv pddp_2means.out 19
# 1. Code sample name
pddp2means
# 2. Description of the code sample package
Clustering is the task of grouping a set of objects in such a way
that objects assigned to the same group are more similar to one another
than to objects in other groups, according to some computable criterion of
similarity. Our approach is a parallel clustering algorithm optimized for the Intel Xeon Phi.
The implementation is based on the Principal Direction Divisive Partitioning (PDDP) 2-means algorithm.
The steps of the implementation include:
1. Create a binary tree.
2. Find the splittable leaf with the greatest scatter value and apply the PDDP 2-means algorithm to it.
3. Find an approximation of the dominant eigenvector v of the matrix leaf(data)–leaf(centroid) using the power iteration algorithm.
4. Use the values of vector v to initialize the 2-means algorithm by creating a first set of clusters.
5. Use the result of 2-means to split the cluster of the parent leaf into two new clusters, one for each child leaf.
6. Repeat steps 2 to 5 until the desired number of clusters is reached.
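
The steps above can be sketched in a few lines of serial Python. This is a minimal illustration only, not the Xeon Phi offload code shipped in src/. All function names are hypothetical. Two details are assumptions: the "dominant eigenvector" is taken to be that of A^T A, where A is the centered leaf data, and the first set of clusters is formed from the sign of each point's projection onto v.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    return [sum(col) / len(points) for col in zip(*points)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scatter(points):
    """Sum of squared distances from each point to the centroid."""
    c = centroid(points)
    return sum(sqdist(p, c) for p in points)


def principal_direction(points, iterations=100, seed=0):
    """Power iteration on A^T A, where A is the centered data (assumption)."""
    c = centroid(points)
    centered = [[x - m for x, m in zip(p, c)] for p in points]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in c]
    for _ in range(iterations):
        # w = A^T (A v), computed without forming A^T A explicitly.
        proj = [dot(row, v) for row in centered]
        w = [sum(p * row[j] for p, row in zip(proj, centered))
             for j in range(len(c))]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break  # all points coincide; no direction to find
        v = [x / norm for x in w]
    return v, centered


def two_means(points, labels, max_iter=50):
    """Lloyd's 2-means, started from the given 0/1 labels."""
    for _ in range(max_iter):
        groups = [[p for p, l in zip(points, labels) if l == k] for k in (0, 1)]
        if not groups[0] or not groups[1]:
            break
        c0, c1 = centroid(groups[0]), centroid(groups[1])
        new = [0 if sqdist(p, c0) <= sqdist(p, c1) else 1 for p in points]
        if new == labels:
            break
        labels = new
    return labels


def pddp_2means(points, clusters):
    """Return the leaves of the tree as lists of point indices."""
    leaves = [list(range(len(points)))]           # step 1: root holds all points
    while len(leaves) < clusters:                 # step 6: repeat
        splittable = [leaf for leaf in leaves if len(leaf) > 1]
        if not splittable:
            break
        # Step 2: pick the splittable leaf with the greatest scatter.
        leaf = max(splittable, key=lambda idx: scatter([points[i] for i in idx]))
        data = [points[i] for i in leaf]
        v, centered = principal_direction(data)   # step 3
        init = [0 if dot(row, v) <= 0 else 1 for row in centered]  # step 4
        labels = two_means(data, init)
        left = [i for i, l in zip(leaf, labels) if l == 0]
        right = [i for i, l in zip(leaf, labels) if l == 1]
        if not left or not right:
            break  # degenerate leaf (for example, identical points)
        leaves.remove(leaf)                       # step 5: replace leaf by children
        leaves += [left, right]
    return leaves
```

For example, `pddp_2means([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], 2)` separates the first three points from the last three. The tree is kept implicitly as a list of leaves, which is enough when only the final clusters are needed.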
This algorithm provided very stable and accurate solutions for the clustering problem (in terms of the Dunn Index metric).
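
For reference, one common definition of the Dunn index is the smallest distance between points in different clusters divided by the largest distance between points in the same cluster; higher values indicate compact, well-separated clusters. The sketch below is illustrative only: `dunn_index` is a hypothetical helper, it uses Euclidean distance, and the source does not say which variant of the index was used in the evaluation.

```python
import math


def dunn_index(clusters):
    """Dunn index for a list of clusters, each a list of points.

    Assumes at least two non-empty clusters and Euclidean distance.
    """
    # Largest diameter: maximum distance between two points of one cluster.
    max_diameter = max(math.dist(p, q) for c in clusters for p in c for q in c)
    # Smallest separation: minimum distance between points of different clusters.
    min_separation = min(
        math.dist(p, q)
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
        for p in a
        for q in b
    )
    if max_diameter == 0.0:
        return math.inf  # every cluster is a single point (or identical points)
    return min_separation / max_diameter
```

This brute-force version compares every pair of points, so it is only practical for small data sets.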
Additional pre-requisites:
* Intel Compiler Suite
* Intel Xeon Phi
# 3. Release date
25 January 2015
# 4. Version history
# 5. Contributor(s) / Maintainer(s)
Nikos Nikoloutsakos <>
# 6. Copyright / License of the code sample
Apache 2.0
# 7. Language(s)
# 8. Parallelisation Implementation(s)
Intel Xeon Phi - Offload Mode
# 9. Level of the code sample complexity
Sample data example demonstrating the use of PDDP 2-means clustering on a small data set.
# 10. Instructions on how to compile the code
Use the Makefile included in the src/ directory.
# 11. Instructions on how to run the code
./pddp_2means <input_file> <output_file> <clusters>
# 12. Sample input(s)
Input data is included in the data/ folder.
# 13. Sample output(s)
The output lists the cluster membership of each data point.