Clustering

Introduction

This repository contains several clustering methods: hierarchical clustering, CURE (Clustering Using REpresentatives), k-means, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Clustering is a technique used in data mining and statistical data analysis to group similar data points together and to identify patterns in their distribution.
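To make the idea of grouping similar points concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is an illustration only, not the implementation in this repository: the function name `kmeans`, its parameters, the choice to seed centroids with the first k points, and the sample data are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=10):
    """Illustrative Lloyd's algorithm: alternate assignment and update steps."""
    # Assumption for this sketch: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    # Final labels: index of the nearest centroid for each point.
    labels = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
    return centroids, labels


# Hypothetical sample data: two well-separated groups of 2-D points.
data = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2)
```

Points that share a label belong to the same cluster. Hierarchical clustering, CURE, and DBSCAN use different rules to form groups, but they produce the same kind of output: a cluster assignment for each point.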

The clustering algorithms above are split across two scripts: k-means is in kmeans.q, and the other algorithms are in clust.q. Example notebooks are also provided to show how the algorithms perform on a variety of datasets.

A k-dimensional tree (k-d tree) is used by the single and centroid hierarchical algorithms, as well as by CURE, which can use either the q or the C implementation of the k-d tree.
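For readers unfamiliar with k-d trees, the sketch below shows why they help: a nearest-neighbour query can skip whole regions of the data instead of comparing against every point. It is written in plain Python for illustration and does not reflect the q or C implementations in this repository. The names `build_kdtree` and `nearest`, the dictionary-based node layout, and the sample points are assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree as nested dicts; returns None when empty."""
    if not points:
        return None
    axis = depth % len(points[0])            # cycle through the dimensions
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                      # the median point splits the space
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }


def nearest(node, target, best=None):
    """Return the stored point closest to target."""
    if node is None:
        return best
    if best is None or dist(node["point"], target) < dist(best, target):
        best = node["point"]
    diff = target[node["axis"]] - node["point"][node["axis"]]
    # Search the side of the split that contains the target first.
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < dist(best, target):
        best = nearest(far, target, best)
    return best


# Hypothetical sample data.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
closest = nearest(tree, (9, 2))  # (8, 1) for this sample data
```

Single-linkage and centroid-linkage clustering repeatedly ask which cluster is closest to a given one, so fast nearest-neighbour lookups of this kind are a natural fit.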

Requirements

embedPy

The Python packages required for successful execution of all functions within the machine learning toolkit can be installed via:

pip:

pip install -r requirements.txt

or via conda:

conda install --file requirements.txt

Running the notebook examples requires JupyterQ to be installed. However, JupyterQ is not a dependency for running individual functions.

Status

The clustering library is still in development. Further improvements will be made to the library in the coming months.
