-
Notifications
You must be signed in to change notification settings - Fork 0
mdgoldberg/clustering
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This was my final project for CS51. I worked in a group with two friends in the course; I implemented all of the Markov clustering algorithm (found in markov.ml), parts of the matrix module (found in matrix.ml), and all of the main script (found in main.ml). We wrote the signatures.ml file together, which provides signatures for the modules we used. INSTALLATION: First things first: enter the code directory and type "make". This will compile our .ml files (this Makefile uses the corebuild tool to compile OCaml files). To run the program, just use ./main.native with all necessary options (see below). We have provided three simple, example csv files, called example1.csv, example2.csv, and example3.csv. The following commands will run our algorithms on these files: ./main.native ../data/example1.csv --has-labels int kruskal|markov ./main.native ../data/example2.csv --has-labels --matrix int kruskal|markov ./main.native ../data/example3.csv --has-labels int kruskal|markov where kruskal|markov means either type kruskal or type markov (NOT BOTH), depending on which you want to run. Any of the above commands can also be given additional options, such as --arg1, --arg2, and --verbose (see below). USAGE: ./main.native csvfile float|int kruskal|markov [options] VALID OPTIONS: --matrix This means that the csv file already represents a matrix of distances between nodes. If this flag isn't set, then we interpret the file as a list of points in R^n, which we then translate into a matrix of distances. --arg1=A If this flag is set, then the first argument to the clustering algorithm will be A. For Kruskal, this means it will cluster the data into A clusters. For Markov, this means the expansion parameter will be A. If the flag is absent, it will default to 2 for Markov and N/2 for Kruskal (where N is the number of nodes). --arg2=B If this flag is set, then the second argument to the clustering algorithm will be B. This is meaningless for Kruskal, which only takes one argument, so Kruskal's algorithm will ignore this flag. --has-labels If this flag is set, then the csv file has labels in its first column. We take this into account by skipping the first column during calculation, and then using these labels as names for each node when we output the results. NOTE: LABELS CANNOT HAVE THE DELIMITER CHARACTER IN THEM. --has-header If this flag is set, then the csv file has category headers in its first row. We take this into account by skipping the first row in our calculations. --verbose If this flag is set, then the Markov algorithm will print out the current matrix on each iteration. This flag has no implications for Kruskal's algorithm. --delim='C' If this flag is set, then the csv file is separated by C instead of by commas. For example, if a csv used semicolons to separate its data, then you would use the option --delim=';' so that we can take this into account.
About
Graph Clustering in OCaml - Markov- and Kruskal-based clustering algorithms for general graphs
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published