Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Add support filtering the data by one categorical variable #270

Merged
merged 3 commits into from
Oct 16, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 12 additions & 2 deletions build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ apply plugin: 'idea'
apply plugin: 'elasticsearch.esplugin'
apply plugin: 'base'
apply plugin: 'jacoco'
apply plugin: 'eclipse'

allprojects {
group = 'com.amazon.opendistroforelasticsearch'
Expand Down Expand Up @@ -256,7 +257,15 @@ List<String> jacocoExclusions = [
'com.amazon.opendistroforelasticsearch.ad.transport.SearchAnomalyDetectorTransportAction*',
'com.amazon.opendistroforelasticsearch.ad.transport.GetAnomalyDetectorTransportAction*',
'com.amazon.opendistroforelasticsearch.ad.transport.GetAnomalyDetectorResponse',
'com.amazon.opendistroforelasticsearch.ad.transport.IndexAnomalyDetectorRequest'
'com.amazon.opendistroforelasticsearch.ad.transport.IndexAnomalyDetectorRequest',
'com.amazon.opendistroforelasticsearch.ad.transport.SearchAnomalyResultTransportAction*',

// TODO: hc caused coverage to drop
//'com.amazon.opendistroforelasticsearch.ad.ml.ModelManager',
'com.amazon.opendistroforelasticsearch.ad.transport.AnomalyResultTransportAction',
'com.amazon.opendistroforelasticsearch.ad.transport.AnomalyResultTransportAction.EntityResultListener',
'com.amazon.opendistroforelasticsearch.ad.NodeStateManager',
'com.amazon.opendistroforelasticsearch.ad.transport.handler.MultiEntityResultHandler',
]

jacocoTestCoverageVerification {
Expand Down Expand Up @@ -301,7 +310,7 @@ dependencies {
compileOnly "com.amazon.opendistroforelasticsearch:opendistro-job-scheduler-spi:1.10.1.1"
// Will be moved to Maven Depedency when https://github.com/opendistro-for-elasticsearch/common-utils repo publishes a release
compile files('libs/common-utils-1.10.1.0.jar')
compile group: 'com.google.guava', name: 'guava', version:'15.0'
compile group: 'com.google.guava', name: 'guava', version:'29.0-jre'
compile group: 'org.apache.commons', name: 'commons-math3', version: '3.6.1'
compile group: 'com.google.code.gson', name: 'gson', version: '2.8.5'
compile group: 'com.yahoo.datasketches', name: 'sketches-core', version: '0.13.4'
Expand All @@ -311,6 +320,7 @@ dependencies {
compile 'software.amazon.randomcutforest:randomcutforest-serialization-json:1.0'
compile "org.elasticsearch.client:elasticsearch-rest-client:${es_version}"


compile "org.jacoco:org.jacoco.agent:0.8.5"
compile ("org.jacoco:org.jacoco.ant:0.8.5") {
exclude group: 'org.ow2.asm', module: 'asm-commons'
Expand Down
27 changes: 27 additions & 0 deletions docs/multi-entity-rfc.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# High Cardinaltiy support in Anomaly Detection RFC

The purpose of this request for comments (RFC) is to introduce our plan to enhance Anamaly Detection for OpenDistro by adding the support of high cardinality. This RFC is meant to cover the high level functionality of the high cardinality support and doesn’t go into implementation details and architecture.

## Problem Statement

Currently the Anomaly Detection for Elasticsearch for OpenDistro only support single entity use case. (e.g. average of cpu usage across all hosts, instead of cpu usage of individual hosts). For multi entity cases, currently users have to create individual detectors for each entity manually. It is very time consuming, and could simply become infeasible when the number of entities reach to hundreds or thousands (high cardinality).

## Proposed solution

We propose to create a new type of detector to support multi entity use case. With this feature, users only need to create one single detector to cover all entities that can be categorized by one or multiple fields. They will also be able to view the results of the anomaly detection in one unified report.

### Create Detector

Most of the detector creation workflow is similar to the single entity detectors, the only additional input is a categorical field, e.g. ip_address, which will be used to split data into multiple entities. We’ll start with supporting only one categorical fields. We’ll add support of multiple categorical fields in future releases.

### Anomaly Report

The output of multi entity detector will be categorized by entities. The entities with most anomalies detected will be presented in a heatmap plot. Users then have the option to click into each entity for more details about the anomalies.

### Entity capacity

Supporting high cardinality with multiple entities definitely takes more resource than single entity detectors. The total number of supported unique entities depends on the cluster configuration. We'll provide a table with the launch to show the recommended number of entities for certain cluster configurations. In general we are planning to support up to 10K entities in the initial release.

## Providing Feedback

If you have comments or feedback on our plans for Multi Entity support for Anomaly Detection, please comment on the [original GitHub issue](https://github.com/opendistro-for-elasticsearch/anomaly-detection/issues/xxx) in this project to discuss.
Original file line number Diff line number Diff line change
Expand Up @@ -423,6 +423,14 @@ private void indexAnomalyResult(
String detectorId = jobParameter.getName();
detectorEndRunExceptionCount.remove(detectorId);
try {
// skipping writing to the result index if not necessary
// For a single-entity detector, the result is not useful if error is null
// and rcf score (thus anomaly grade/confidence) is null.
// For a multi-entity detector, we don't need to save on the detector level.
// We always return 0 rcf score if there is no error.
if (response.getAnomalyScore() <= 0 && response.getError() == null) {
return;
}
IntervalTimeConfiguration windowDelay = (IntervalTimeConfiguration) ((AnomalyDetectorJob) jobParameter).getWindowDelay();
Instant dataStartTime = detectionStartTime.minus(windowDelay.getInterval(), windowDelay.getUnit());
Instant dataEndTime = executionStartTime.minus(windowDelay.getInterval(), windowDelay.getUnit());
Expand Down
Loading