-
Notifications
You must be signed in to change notification settings - Fork 5
canonizer/gpumafia
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
INTRO gpumafia implements MAFIA clustering algorithm, mostly as described in the paper "Parallel Algorithms for Clustering High-Dimensional Large-Scale Datasets" by H. Nagesh, S. Goil and A. Choudhary. Refer to this paper for the detailed description of the algorithm as well as the parameters involved. This implementation can run both on CPUs (single-core only) as well as GPUs (only a single GPU). The implementation mostly follows the paper, while introducing some additional algorithmic optimizations. This code has been developed at Juelich Supercomputing Center, Forschungszentrum Juelich, and is a property of Forschungszentrum Juelich. Users can modify and/or distribute code free of charge, provided that this notice is retained. The implementation expects a single file with data points, and writes output files, one or two per cluster, in the same directory where the input file is located. Indices of points belonging to the cluster, and optionally, the points themselves, are the output of the program. COMPILING The main project directory is cppmafia; all further commands are given with respect to that directory. ./makefile is the makefile for the project, and make definitions are in makefile.def. There are currently two definitions: the C++ compiler used (CXX), and where the program will be installed (PREFIX). By default, NVidia Compiler (nvcc) is used, though this can be changed to GCC (g++) or Intel Compiler (icc). GPU support is only enabled when nvcc is used. To compile the project, execute the following command in the project directory: make To install the cppmafia program: make install To uninstall the program: make uninstall USING To run the program, just execute the following command: cppmafia <point-file> The point file usually has .dat extension. Files containing cluster data will be created in the directory of <point-file>, one file with point indices (.idx) per cluster. The entire options of the program can be obtained with the following command: cppmafia -h or cppmafia --help Some of the more useful options are (GNU option syntax supported): -p,--output-points - write out point data in addition to point indices. However, this may considerably increase I/O time -d,--device - use GPUs; the program will fail if no CUDA GPUs are available -V,--verbose - output results of algorithm's intermediate steps Options for specifying algorithm parameters (GNU option syntax supported): -a,--alpha <alpha> - dense window threshold (alpha), default 1.5 -b,--beta <beta> - window merge threshold (beta), default 0.25 -n,--bins <nbins> - minimum number of bins per dimension, default 1000 -u,--min-wins <nwindows> - number of windows for uniform dimensions, default 5 -M,--max-wins <nwindows> - maximum number of windows per dimension, default 20 And though the case when number of bins is not divisible by the number of windows (either -u or -M) should work, it has not been tested. So it better be divisible. INPUT DATA If you want to use MAFIA for something serious, you probably have them already. However, if you want just to play with the utility, profile it or test it for correctness and/or performance, there is a data generator provided. See utils/clugen for the data generator. You can compile it by doing make (in that directory), and then generate different datasets with it. E.g., bin/clugen -n100000 -d10 -k8 -m2 ~/try/mafia/cluster-8.dat will generate a 100,000-point 10-dimensional dataset, with 2 8-dimensional clusters, roughly 45,000 points each. The details of each cluster would be printed to stdout of clugen. When that file is input into MAFIA, it should then find the same or similar clusters. Generally, MAFIA should find one cluster with default parameters robustly; with more clusters, robustness decreases. To generate clusters which do not intersect in any dimensions (which is somewhat unrealistic), use clugen's -N switch. CODE options.{h,cpp} contain code for defining and parsing options. timing.{h,cpp} contain helper functions for collecting timing information. ref.h is the implementation of a smart pointer to a reference-counted object. mafia-io.{h,cpp} contains functions for reading point files and writing out the resulting clusters to the file system. utils.{h,cpp} contain some utility code; this is currently limited to some macros, and bulk_{alloc,free} functions, which are used to allocate big data structures. bulk* functions work with CUDA pinned host memory when the device is used; otherwise, they just call malloc(). malloc() is still used directly or through STL for smaller memory allocations. main.cpp is some glue code which parses the command line, reads point data, runs the MAFIA algorithm, and writes out resulting clusters. The code of the MAFIA algorithm is mostly concentrated in mafia-solver.{h,cpp}, which contain host code, and mafia-solver-device.cu, which contains device code. (C)DUs and adaptive windows are moved to separate classes; they are contained in cdu.{h,cpp} and window.{h,cpp}, repsectively. MafiaSolver<T> is the class implementing MAFIA algorithm. It is parameterized with point coordinate type (float or double), double is used by default. The method that implements MAFIA algorithm is solve(). The algorithm starts with building the histograms, build_histos(), which can be done on device. build_windows() then builds windows, always on host. Building windows starts from uniform windows, generated by build_uniform_windows(). Depending on their point counts, uniform windows can be merged into adaptive windows. Finally, build_bitmaps() builds the bitmaps, for which the device can be used. Bitmaps are used for counting points, and this is the modification of the original algorithm. Each bitmap stores information on points belonging to a particular dense adaptive window, with only 1 bit used per point. This saves both memory bandwidth and arithmetic operations, compared to direct counting. After building histograms, windows and bitmaps, the algorithm enters the main loop, which continues until candidate dense units (CDUs) can no longer be produced. The algorithm starts with CDUs of dimension one, and increases CDU dimensionality at each iteration by 1. Each iteration consists of generating CDUs (find_cdus()), checking which of them are dense (find_dus()), and finally checking for any unassimilated DUs left (find_unjoined_dus()). Unassimilated DUs are added to the list of terminal DUs, which is used to construct the final clusters. To find CDUs (find_cdus()), the algorithm takes all pairs of DUs of dimensionality a-1 and checks if they can be merged into a CDU of dimensionality a. It is so if they contain a common subsequence of dimensionality a-2; the check is performed in Cdu::can_merge_with() function in cdu.cpp. Any successfully generated CDUs are added to a set, which also prevents generating duplicate CDUs several times. For dimension 1, all dense windows are considered CDUs. To check if newly found CDUs are dense (find_dus()), the number of points inside each CDU needs to be counted, and then compared with the thresholds of windows. To do this, and operation is performed on bitmaps of CDUs windows, and the number of bits is counted in the resulting bitmap, which gives the number of points. If the device (GPU) is used, this operation is performed on the device; this gives considerable (up to 50x) acceleration compared to doing this on host. Finally, unassimilated DUs from previous iteration need to be moved to the list of terimal DUs (find_unjoined_dus()). For this, each DU in previous dimension is checked against the newly found DUs to see whether the old DU has been assimilated. The assimilation check is performed in Cdu::is_assimilated_into() method. After there are no more CDUs, the algorithm proceeds to building the final clusters. To achieve this, all terminal DUs are added to a graph (build_du_graph()), where the DUs are the nodes, and a (bidirectional) edge exists between two DUs of the same dimensionality only if they share a common face. The connected components of the graph are then found, by performing breadth-first search repeatedly until no unvisited nodes are left (build_du_clusters()). Each connected component is then said to constitute a single cluster. Finally, for each cluster the array of indices of points belonging to it is built (build_clusters()).
About
Implementation of MAFIA Subspace Clustering on NVidia GPUs
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published