J. Handl: data generators

Syntetic data generators for creating datasets with Gaussian distribution. The code was taken from official website and slightly modified for modern compilers.

Requirements

You'll need g++ compiler.

Debian: apt-get install build-essential should be enough

and run

$ make

mult_generator

Currently does not take any parameters, all settings is hard-coded in constants:

#define DIM 2        // dimensionality of the data
#define NUM 40      // number of clusters

#define MAXMU 10    // mean in each dimension is in range [0,MAXMU]
#define MINMU -10
#define MINSIGMA 0
#define MAXSIGMA 20*sqrt(DIM) // standard deviation (to be added on top
// of row sum in each dimension is in range [0,MAXSIGMA]
#define MAXSIZE 100  // size of each cluster is in range [MINSIZE,MAXSIZE]
#define MINSIZE 10
#define RUNS 10      // number of data sets to be generated

simply run:

$ ./mult_generator

elly

Ellipsoid generator

$ ./elly [-k <nclust>] [-d <dimension>] [-s <seed>]

where all parameters are optional and:

<nclust> is a positive int >= 2
<dimension> is a positive int >= 2
<seed> is a long int.

cure

CURE data sets generator. See Guha, Sudipto, Rajeev Rastogi, and Kyuseok Shim. "CURE: an efficient clustering algorithm for large databases." ACM SIGMOD Record. Vol. 27. No. 2. ACM, 1998. for more details.

The distribution of data points is just approximated

$ ./cure -n <npoints> [-d <dimension>] [-s <seed>] [-l <x/y min>] [-m <x/y max>] [-t type of data]

where:

-l minimal x/y value
-m maximal x/y value
-t type of dataset, currently supports values 0-2

disk

Disk in disk dataset are two clusters formed by a circle and an annulus around it.

Authors

Julia Handl
Joshua Knowles
John Burkardt
Tomas Barton

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
src		src
.gitignore		.gitignore
LICENSE.md		LICENSE.md
Makefile		Makefile
README.md		README.md
gen.sh		gen.sh
impos		impos
plot.gpt		plot.gpt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

J. Handl: data generators

Requirements

mult_generator

elly

cure

disk

Authors

About

Releases

Packages

Languages

License

deric/handl-data-generators

Folders and files

Latest commit

History

Repository files navigation

J. Handl: data generators

Requirements

mult_generator

elly

cure

disk

Authors

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages