Change Alert Levels to be independent instead of subsets. #298

yosiat · 2016-03-09T17:55:46Z

Hi,

I have the next tick script:

stream
    .from().measurement('window_data')
    .alert()
      .id('WindowAlert')
      .info(lambda: "value" < 10)
      .warn(lambda: "value" > 10)
      .crit(lambda: "value" > 200)
      .log('/Volumes/Data/influx/alerts.log')

I am injecting data using this script:

require 'influxdb'

database  = 'kapacitor_example'
name      = 'window_data'
precision = 's'
retention = 'default'

influxb = InfluxDB::Client.new database, host: "localhost", port: 9092


time = Time.now.to_i
value = 0

points = []
10.times do
  value = value + 5
  time = time + 10

  data = {
    series: name,
    values: { value: value },
    timestamp: time
  }

  puts value
  points << data
end

puts "Inserting #{points.length} points"
influxb.write_points(points, precision, retention)

And I am getting two alerts triggered OK and INFO, and I expected WARN to jump as well.
I debugged alert source code and found some issues with code (and I can submit pull request):

The levels are examined from OK to Critical, which in my opinion is wrong, it should be from Critical to OK - For example: If we inject the value 250 only warning (>10) will jump and not critical (>200)
I saw that once some levels is passed this is level isn't returned, for example we have value of 250 and critical passed it we won't return Critical we will return Info (because this the last level - see my previous point)

I modified the code to be like this:

func (a *AlertNode) determineLevel(now time.Time, fields models.Fields, tags map[string]string) (level AlertLevel) {
    for i := len(a.levels) - 1; i >= 0; i-- {
        l := i
        se := a.levels[i]
        if se == nil {
            continue
        }

        if pass, err := EvalPredicate(se, now, fields, tags); pass {
            return AlertLevel(l)
        } else if err != nil {
            a.logger.Println("E! error evaluating expression:", err)
            return
        }
    }


    return OKAlert
}

@nathanielc What do you think? do you want PR?

The text was updated successfully, but these errors were encountered:

nathanielc · 2016-03-09T18:06:39Z

@yosiat
You have a value that exists on a number line and you have two break points that define three regions of the number line. So to define the three regions you can use:

stream
    .from().measurement('window_data')
    .alert()
      .id('WindowAlert')
      .info(lambda: TRUE)
      .warn(lambda: "value" > 10)
      .crit(lambda: "value" > 200)
      .log('/Volumes/Data/influx/alerts.log')

And the highest matching level will be chosen. The advantage of assuming that each level is a subset of the previous level is that you can break out early and not evaluate every condition.

From the docs It is assumed that each successive level filters a subset of the previous level. As a result, the filter will only be applied if a data point passed the previous level.

While it is possible for the problem to be computed in the other direction since expressions can be stateful the current behavior made the most since as each higher level will only be evaluated on the data set the passed the previous level.

yosiat · 2016-03-09T19:22:51Z

@nathanielc
Ok, I understand but Kapacitor checks the alerts as a subset for each other (Info > Warn > Crit)
But if we are using separate conditions for each level than the only alert that triggered is info.

Let's take an example: we have stream of "number_of_exceptions" which increments when the application server has an alert while processing, and we want to check that the exceptions isn't above some threshold.

stream
    .from().measurement('number_of_exceptions')
    .alert()
      .id('NumberOfExceptions')
      .info(lambda: "value" < 1)
      .warn(lambda: "value" == 1)
      .crit(lambda: "value" > 1)
      .log('/Volumes/Data/influx/alerts.log')

For this case we will have info, in that case my alert levels aren't successive and I don't know how to create alert for this.

And of course there is the opposite case - I want to check if I have some data and I want to warn if it's 1 and send critical value is less than 1 ( => no data)

yosiat · 2016-03-11T10:06:58Z

@nathanielc I finally solved this by separating the alert to multiple alerts like this:

var data = stream
    .from().measurement('number_of_exceptions')

data.alert().id('NumberOfExceptions').info(lambda: "value" < 1).log('/Volumes/Data/influx/alerts.log')
data.alert().id('NumberOfExceptions').warn(lambda: "value" == 1).log('/Volumes/Data/influx/alerts.log')
data.alert().id('NumberOfExceptions').crit(lambda: "value" > 1).log('/Volumes/Data/influx/alerts.log')

What do you think?

nathanielc · 2016-03-11T15:56:54Z

Wouldn't this work too?

stream
    .from().measurement('number_of_exceptions')
    .alert()
      .id('NumberOfExceptions')
      .info(lambda: TRUE)
      .warn(lambda: "value" >= 1)
      .crit(lambda: "value" > 1)
      .log('/Volumes/Data/influx/alerts.log')

yosiat · 2016-03-11T16:13:50Z

@nathanielc This would work, but my tick script are auto generated (converter from legacy system), so this is the best solution I have.

nathanielc · 2016-03-11T16:44:49Z

Ok, I am thinking we might change this to be a mix of our implementations. Since it can be confusing that lower levels need to be a super set of higher levels what if we evaluate every level every time? That way its simple and you always know that each point will be evaluated for all levels. It will be slower but we are taking a max of four levels and simplicity is probably more important than a slight optimization. Thoughts?

yosiat · 2016-03-11T16:48:44Z

@nathanielc I agree, I think we need first measure this - to know how much does it hearts on performance and then decide upon this, but I prefer simplicity over performance.

nathanielc · 2016-06-02T20:33:17Z

For now we are going to leave the behavior as is, since there hasn't been a strong push to change it. We can revisit later if need be.

nathanielc · 2016-07-20T22:42:27Z

OK, I keep running into issues where the subset logic is annoying and requires that you work around it. I am going to reopen this issue.

yosiat changed the title ~~Alert is not triggered~~ Wrong level of alert is determined Mar 9, 2016

nathanielc modified the milestones: v0.14, v0.13 Apr 5, 2016

nathanielc changed the title ~~Wrong level of alert is determined~~ Change Alert Levels to be independent instead of subsets. Apr 5, 2016

yosiat mentioned this issue Apr 29, 2016

Compiled stateful expression #491

Merged

nathanielc modified the milestones: v0.14, v0.13 May 5, 2016

nathanielc closed this as completed Jun 2, 2016

nathanielc reopened this Jul 20, 2016

nathanielc mentioned this issue Jul 20, 2016

Change alert levels to be independent #732

Merged

3 tasks

nathanielc closed this as completed in #732 Jul 22, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change Alert Levels to be independent instead of subsets. #298

Change Alert Levels to be independent instead of subsets. #298

yosiat commented Mar 9, 2016

nathanielc commented Mar 9, 2016

yosiat commented Mar 9, 2016

yosiat commented Mar 11, 2016

nathanielc commented Mar 11, 2016

yosiat commented Mar 11, 2016

nathanielc commented Mar 11, 2016

yosiat commented Mar 11, 2016

nathanielc commented Jun 2, 2016

nathanielc commented Jul 20, 2016

Change Alert Levels to be independent instead of subsets. #298

Change Alert Levels to be independent instead of subsets. #298

Comments

yosiat commented Mar 9, 2016

nathanielc commented Mar 9, 2016

yosiat commented Mar 9, 2016

yosiat commented Mar 11, 2016

nathanielc commented Mar 11, 2016

yosiat commented Mar 11, 2016

nathanielc commented Mar 11, 2016

yosiat commented Mar 11, 2016

nathanielc commented Jun 2, 2016

nathanielc commented Jul 20, 2016