A Julia package for Support Vector Data Description.
This package implements one-class classifiers based on support vector data description. The package has been developed as part of a benchmark suite for active-learning strategies for one-class classification. For more information about this research project, see the OCAL project website and the companion paper.
Holger Trittenbach, Adrian Englhardt, Klemens Böhm, "An overview and a benchmark of active learning for outlier detection with one-class classifiers", DOI: 10.1016/j.eswa.2020.114372, Expert Systems with Applications, 2021.
This package works with Julia 1.0 or newer. It is not registered yet, so add it by URL with Julia's package manager:
using Pkg
Pkg.add(PackageSpec(url="https://github.com/englhardt/SVDD.jl.git"))
One-class classifiers learn to identify whether objects belong to a specific class and are often used for outlier detection. The package implements several one-class classifiers and strategies to initialize their parameters. We offer visualizations in our example notebooks; see Examples.
Currently, the classifiers have been implemented as optimization problems based on JuMP. The package includes:
- Vanilla Support Vector Data Description (VanillaSVDD) [1]
- SVDD with negative examples (SVDDNeg) [1]
- Semi-supervised Anomaly Detection (SSAD) [2]
- Subspace SVDD (SubSVDD) [3]
There are two types of parameters to estimate for the classifiers: cost parameters and a kernel function. The package includes the following strategies to initialize parameters.
- Gauss Kernel gamma
  - Rule of Scott [4]
  - Rule of Silverman [5]
  - Mean criterion [6]
  - Modified mean criterion [7]
  - Wang data shifting [8]
  - Fixed Gamma
- Cost parameters C
  - Rule of Tax [1]
  - Binary Search
  - Fixed C
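As a rough illustration of the bandwidth rules listed above, the following sketch computes the textbook Scott [4] and Silverman [5] bandwidths and converts them to a Gauss kernel gamma. It is not taken from the package: the function names `scott_bandwidth`, `silverman_bandwidth`, and `bandwidth_to_gamma` are hypothetical, the data is assumed to be standardized to unit variance per attribute, and the kernel is assumed to have the form exp(-gamma * ||x - y||^2). The package's own strategies may use different conventions.

```julia
# Sketch under stated assumptions; these helpers are NOT part of SVDD.jl.
# Rows are attributes, columns are observations.

# Textbook rule of Scott for standardized data: h = n^(-1/(d+4)).
function scott_bandwidth(X::AbstractMatrix)
    d, n = size(X)
    return n^(-1 / (d + 4))
end

# Textbook rule of Silverman for standardized data:
# h = (4/(d+2))^(1/(d+4)) * n^(-1/(d+4)).
function silverman_bandwidth(X::AbstractMatrix)
    d, n = size(X)
    return (4 / (d + 2))^(1 / (d + 4)) * n^(-1 / (d + 4))
end

# Assumed kernel form: k(x, y) = exp(-gamma * ||x - y||^2),
# so a bandwidth h corresponds to gamma = 1 / (2 h^2).
bandwidth_to_gamma(h::Real) = 1 / (2 * h^2)

X = [1.0 2.0 3.0 4.0; 5.0 6.0 7.0 8.0]   # 2 attributes, 4 observations
gamma_scott = bandwidth_to_gamma(scott_bandwidth(X))
gamma_silverman = bandwidth_to_gamma(silverman_bandwidth(X))
```

Both rules depend only on the number of observations and attributes once the data is standardized, which is why they are cheap defaults compared with the data-driven criteria in the list.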
- Classification scores: The classifiers return scores by the following convention:
  - score > 0 for outliers
  - score <= 0 for inliers
- Data Format: The data is expected to be in column-major order, i.e., the first array dimension is the attribute and the second is the observation. For example, [1 2 3 4; 5 6 7 8] is a 2x4 Array with 2 attributes and 4 observations.
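The two conventions can be sketched in plain Julia as follows. This is a minimal illustration that does not call the package: the `scores` vector is made-up example data standing in for classifier output, and the variable names are hypothetical.

```julia
# Data layout: rows are attributes, columns are observations.
X = [1 2 3 4; 5 6 7 8]
num_attributes, num_observations = size(X)

# A single observation is one column of the data matrix.
third_observation = X[:, 3]

# Data stored with one observation per row must be transposed first.
rows_as_observations = [1 5; 2 6; 3 7; 4 8]
X_from_rows = permutedims(rows_as_observations)

# Score convention: score > 0 means outlier, score <= 0 means inlier.
# These scores are made up for illustration, one per observation.
scores = [-0.8, 0.0, 0.3, 1.2]
labels = ifelse.(scores .> 0, :outlier, :inlier)
```

Note that a score of exactly 0 counts as an inlier under this convention.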
There are two notebooks that show how to train an SVDD (here) and how to use the parametrization methods (here). Execute the following commands to run the example notebooks:
git clone https://github.com/englhardt/SVDD.jl
cd SVDD.jl/examples
julia -e "using Pkg; Pkg.instantiate()"
julia -e "using IJulia; notebook()"
You can then access the Jupyter notebook server at http://localhost:8888/ and run the notebooks.
We welcome contributions and bug reports.
This package is developed and maintained by Holger Trittenbach and Adrian Englhardt.
[1] Tax, David MJ, and Robert PW Duin. "Support vector data description." Machine learning 54.1 (2004): 45-66.
[2] Görnitz, Nico, et al. "Toward supervised anomaly detection." Journal of Artificial Intelligence Research 46 (2013): 235-262.
[3] Trittenbach, Holger, and Klemens Böhm. "One-Class Active Learning for Outlier Detection with Multiple Subspaces." ACM International Conference on Information and Knowledge Management (CIKM), 2019.
[4] Scott, David W. Multivariate density estimation: theory, practice, and visualization. John Wiley & Sons, 2015.
[5] Silverman, Bernard W. Density estimation for statistics and data analysis. Routledge, 2018.
[6] Chaudhuri, Arin, et al. "The mean and median criteria for kernel bandwidth selection for support vector data description." 2017 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 2017.
[7] Liao, Yuwei, et al. "A new bandwidth selection criterion for using SVDD to analyze hyperspectral data." Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XXIV. Vol. 10644. International Society for Optics and Photonics, 2018.
[8] Wang, Siqi, et al. "Hyperparameter selection of one-class support vector machine by self-adaptive data shifting." Pattern Recognition 74 (2018): 198-211.