k-means-hadoop

k-means clustering with Apache Hadoop

Build

mvn clean install
hadoop jar target/k-means.jar \
            PATH_INPUT \
            PATH_OUTPUT \
            NUM_K \
            NUM_COORDINATES \
            THRESHOLD

Example

hadoop jar target/k-means.jar \
            input \
            output \
            7 \
            3 \
            0.0005

Usage

Generate the points (x;y, x;y;z, etc.) for the data_set with the Python script generatePoints.py. It takes three arguments: the number of coordinates per point, the number of points, and the range.

python generatePoints.py 3 5 50 # 5 3d points in -50..50 range

Output in input/data_set:

21.6041331095;8.46492932874;39.6766968839;
11.9148005581;47.8849166781;17.8483647205;
19.9629853313;2.6589522782;20.4473549181;
44.7893224782;39.1567505862;39.7058609459;
26.9080526686;24.6560481195;32.782580723;
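
The line format shown above (each coordinate followed by `;`, including the last) can be produced and read back with a few lines of Python. This is a minimal sketch, not the repository's generatePoints.py: the function names `generate_points` and `parse_point`, the `seed` parameter, and the use of a uniform distribution over -range..range are assumptions made for illustration.

```python
import random


def generate_points(num_coordinates, num_points, value_range, seed=None):
    """Return num_points lines, each holding num_coordinates values
    drawn uniformly from [-value_range, value_range] (assumed distribution)."""
    rng = random.Random(seed)
    lines = []
    for _ in range(num_points):
        coords = [rng.uniform(-value_range, value_range)
                  for _ in range(num_coordinates)]
        # Every value is followed by ';', including the last one,
        # matching the sample data_set lines.
        lines.append("".join(f"{c!r};" for c in coords))
    return lines


def parse_point(line):
    """Parse one 'x;y;z;' line into a list of floats.
    The empty field after the trailing ';' is skipped."""
    return [float(field) for field in line.strip().split(";") if field]


if __name__ == "__main__":
    for line in generate_points(3, 5, 50, seed=0):
        print(line)
```

`parse_point` is handy when loading a data_set line in a plotting or checking script, since a plain `split(";")` would leave an empty string after the trailing separator.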

Import the project as a Maven project. The main class is KMeans.java, and its arguments are:

input folder
output folder
number of clusters K
number of coordinates per point
threshold
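
To show how the number of clusters, the number of coordinates, and the threshold interact, here is a single-machine sketch of k-means (Lloyd's algorithm) in Python. It is not the repository's Hadoop job. The stopping rule (stop once no centroid moves farther than the threshold), the random choice of initial centroids, and the `seed` and `max_iterations` parameters are all assumptions made for illustration; the number of coordinates is implied by the length of each point.

```python
import math
import random


def closest(point, centroids):
    """Index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def k_means(points, k, threshold, seed=0, max_iterations=100):
    """Return k centroids; stops once no centroid moves by threshold or more
    (assumed stopping rule) or after max_iterations passes."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    for _ in range(max_iterations):
        # Assignment step (the role a mapper would play): nearest centroid per point.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[closest(p, centroids)].append(p)
        # Update step (the role a reducer would play): mean of each cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            [sum(axis) / len(cluster) for axis in zip(*cluster)] if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < threshold:
            break
    return centroids


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    print(k_means(pts, k=2, threshold=0.0005))
```

A smaller threshold generally means more iterations before stopping, which in a MapReduce setting would translate into more job rounds.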

Use plot2d.py or plot3d.py to plot the results.

Example with k = 7, 1000 points

python generatePoints.py 2 1000 300

Example with k = 5, 1000 points

python generatePoints.py 3 1000 100
