Skip to content

Latest commit

 

History

History
31 lines (19 loc) · 1.38 KB

README.md

File metadata and controls

31 lines (19 loc) · 1.38 KB

Clustering

Introduction

This repository contains numerous clustering methods, including hierarchical clustering, CURE (Clustering Using REpresentatives), k-means and DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Clustering is a useful technique in data mining and statistical data analysis used to group similar data together and identify patterns in distributions.

The clustering algorithms above have been separated into two scripts - k-means can be found in kmeans.q, while the other algorithms can be found in clust.q. Additionally, example notebooks have been provided to show how the algorithms perform on a variety of datasets.

A k-dimensional tree (k-d tree) is used by the single and centroid hierarchical algorithms, as well as for CURE which can use both q and C implementations of the k-d tree.

Requirements

  • embedPy

The python packages required to allow successful exectution of all functions within the machine learning toolkit can be installed via:

pip:

pip install -r requirements.txt

or via conda:

conda install --file requirements.txt

Running of the notebook examples will require the installation of JupyterQ however this is not a dependancy for the running of functions at an individual level.

Status

The clustering library is still in development, further improvements will be made to the library in the coming months.