
Issue with join & multiple alerts #752

Closed
phemmer opened this issue Jul 25, 2016 · 6 comments · Fixed by #756

Comments

@phemmer

phemmer commented Jul 25, 2016

Creating from a discussion on the mailing list: https://groups.google.com/forum/#!topic/influxdb/0LdRf5QIm_I

I'm trying to add the ability to put hosts into a "maintenance mode" which will prevent kapacitor from sending out alerts. In order to do this, I've created a "maintlock" measurement, which tracks a counter on the host. When the "count" field in this measurement is 0, kapacitor is free to send an alert.
I have this mostly working, but only when the tick script stream has a single alert() method. As soon as a second alert() method is added, the "count" field appears to get zeroed out.
For example, this is my kapacitor tick script:

var maintlock = stream|from().measurement('maintlock').groupBy('host')
var data = stream
    |from()
        .measurement('disk').groupBy('host','path')
    |join(maintlock)
        .as('disk','maintlock')
        .on('host')
        .fill('null')
        .tolerance(24h)
    |log()
    |where(lambda: "maintlock.count" == 0)
data
    |alert()
        .crit(lambda: "disk.used_percent" >= 90)
data
    |alert()
        .warn(lambda: "disk.used_percent" >= 80 AND "disk.used_percent" < 90)

This is a few lines of the output from the log() method:

Jul 25 12:39:44 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 12:39:44 I!  {disk   host=fll2aixd01stg,path=/var/lib/influxdb, [host path] map[host:fll2aixd01stg path:/var/lib/influxdb] map[disk.inodes_total:13418496 disk.free:13152272384 disk.used:577781760 disk.used_percent:4.208153543607759 maintlock.count:0 disk.inodes_free:13418409 disk.inodes_used:87 disk.total:13730054144] 2016-07-26 00:00:00 +0000 UTC}
Jul 25 12:39:44 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 12:39:44 I!  {disk   host=fll2aixd01stg,path=/var/lib/influxdb, [host path] map[host:fll2aixd01stg path:/var/lib/influxdb] map[disk.free:13152272384 disk.inodes_free:13418409 disk.inodes_total:13418496 disk.total:13730054144 maintlock.count:1 disk.used_percent:4.208153543607759 disk.inodes_used:87 disk.used:577781760] 2016-07-26 00:00:00 +0000 UTC}
Jul 25 12:39:44 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 12:39:44 I!  {disk   host=fll2aixd01stg,path=/tmp/mnt, [host path] map[host:fll2aixd01stg path:/tmp/mnt] map[disk.inodes_total:743386 disk.inodes_used:2 disk.total:10485760 disk.used:9437184 disk.used_percent:90 maintlock.count:0 disk.free:1048576 disk.inodes_free:743384] 2016-07-26 00:00:00 +0000 UTC}
Jul 25 12:39:44 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 12:39:44 I!  {disk   host=fll2aixd01stg,path=/tmp/mnt, [host path] map[host:fll2aixd01stg path:/tmp/mnt] map[disk.inodes_used:2 disk.total:10485760 disk.inodes_free:743384 disk.used:9437184 maintlock.count:1 disk.inodes_total:743386 disk.used_percent:90 disk.free:1048576] 2016-07-26 00:00:00 +0000 UTC}

Notice how each entry is logged twice: once with maintlock.count:1 and once with maintlock.count:0. If I remove one of the alert() methods, it behaves fine, properly tracking the value of maintlock.count. It's only when I add the second alert() that maintlock.count:0 starts showing up.

The above was observed with kapacitor 1.0beta2. After upgrading to 1.0beta3, the log() output started showing 4 lines per data point:

Jul 25 13:03:04 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 13:03:04 I!  {disk   host=fll2aixd01stg,path=/tmp/mnt, [host path] map[host:fll2aixd01stg path:/tmp/mnt] map[disk.inodes_used:2 disk.total:10485760 disk.inodes_total:743386 disk.used:9437184 disk.inodes_free:743384 disk.free:1048576 disk.used_percent:90 maintlock.count:0] 2016-07-26 00:00:00 +0000 UTC}
Jul 25 13:03:04 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 13:03:04 I!  {disk   host=fll2aixd01stg,path=/tmp/mnt, [host path] map[host:fll2aixd01stg path:/tmp/mnt] map[maintlock.count:1 disk.inodes_total:743386 disk.inodes_used:2 disk.total:10485760 disk.used:9437184 disk.used_percent:90 disk.free:1048576 disk.inodes_free:743384] 2016-07-26 00:00:00 +0000 UTC}
Jul 25 13:03:04 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 13:03:04 I!  {disk   host=fll2aixd01stg,path=/tmp/mnt, [host path] map[host:fll2aixd01stg path:/tmp/mnt] map[disk.used_percent:90 disk.free:1048576 disk.total:10485760 disk.used:9437184 maintlock.count:0 disk.inodes_total:743386 disk.inodes_free:743384 disk.inodes_used:2] 2016-07-26 00:00:00 +0000 UTC}
Jul 25 13:03:04 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 13:03:04 I!  {disk   host=fll2aixd01stg,path=/tmp/mnt, [host path] map[host:fll2aixd01stg path:/tmp/mnt] map[disk.used_percent:90 disk.free:1048576 disk.inodes_free:743384 disk.inodes_used:2 disk.total:10485760 disk.inodes_total:743386 disk.used:9437184 maintlock.count:1] 2016-07-26 00:00:00 +0000 UTC}
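
(As an aside, a single alert() node can carry both thresholds, which would sidestep the second alert() entirely, though it doesn't explain the duplicate points. A minimal, untested sketch using the same data variable from the script above:)

data
    |alert()
        // a point at or above 90 matches crit first, so it is reported as CRITICAL rather than WARNING
        .warn(lambda: "disk.used_percent" >= 80)
        .crit(lambda: "disk.used_percent" >= 90)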
@phemmer
Author

phemmer commented Jul 26, 2016

Another issue that may be related is that fill('null') and fill(0) don't appear to work.
For example:

var foo = stream|from().measurement('foo').groupBy('host')
stream
  |from()
    .measurement('disk').groupBy('host','path')
  |join(foo)
    .as('disk','foo')
    .on('host')
    .fill('null')
    .tolerance(15s)
  |log()

The log() generates no output until data is seen in the foo measurement with the same host field as the data from disk, and within the 15s tolerance. But according to the docs, fill('null') and fill(0) are supposed to fill in data in this case.

@nathanielc
Contributor

@phemmer Yes, I can confirm that using fill + on for a join does not work well. Unfortunately the solution is rather involved. The main difficulty is knowing what fields to fill on which points.

For now I have written a test case that defines the expected behavior for the fill + on combination.

I am still looking into the original issue about duplicate points.

@phemmer
Author

phemmer commented Jul 26, 2016

I think I understand what you mean by "knowing what fields to fill".
I'm still new to kapacitor and don't have a solid grasp on everything yet, but shouldn't fill('null') be able to work without knowing the fields (essentially acting as if the join() weren't present at all for that data point)? Then you could do something like .fill('null') | default().field('foo.bar', 123).
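
Roughly what I have in mind, reusing the disk/foo example from above (just a sketch of the idea; I realize this doesn't work today):

var foo = stream|from().measurement('foo').groupBy('host')
stream
  |from()
    .measurement('disk').groupBy('host','path')
  |join(foo)
    .as('disk','foo')
    .on('host')
    // emit the point even when no matching foo point exists
    .fill('null')
    .tolerance(15s)
  |default()
    // hypothetical field from my example above; replace nulls with a default
    .field('foo.bar', 123)
  |log()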

@nathanielc
Contributor

@phemmer That would make sense, yes. But currently it doesn't work that way. Currently the default node only sets fields that do not exist at all, independent of their value. But having the default node set defaults on null fields makes sense, as null fields have no real use.

I'll create a separate issue to track that change.

@phemmer
Author

phemmer commented Jul 27, 2016

Assuming #755 is the issue you created to track the change, that isn't quite what I meant.
.fill('null')|default() waits for a data point to come in before it will start doing anything. From your description, it uses this data point so it knows what fields to fill in. I was instead looking for something that would cause the .join() not to block at all, and just immediately output the data point with only the fields from the disk measurement.

For example, if there is no data point from the foo measurement, the above script would behave as if it were the following:

stream
  |from()
    .measurement('disk').groupBy('host','path')
  |log()

...but with the fields being prefixed with disk. for consistency.

@avp24

avp24 commented Sep 16, 2019

Hi @phemmer @nathanielc
I am observing the same issue of 4 duplicate lines for the same data point.

My TICKscript and the alert log file it generates are shown below. For easier debugging I have added blank lines between the log entries.

I expect to get only one line in the log file per alert, but I get 4 lines instead.

Kapacitor version:
Kapacitor OSS 1.5.2 (git: HEAD 3086452)
Influx Version:
InfluxDB shell version: 1.7.7

Tickscript:
var critLowerRange = 53
var critUpperRange = 57
var msgUpperRange = 'ON'
var msgLowerRange = 'OFF'

stream
    // Select just the sensor_data measurement from our database.
    |from()
        .measurement('sensor_data')
        .where(lambda: ((hour("time") * 100 + minute("time")) >= 600) AND (hour("time") * 100 + minute("time") <= 1200))
    |alert()
        .crit(lambda: "sensorData" < critLowerRange)
        .message(msgLowerRange)
        .details('')
        // Whenever we get an alert write it to a file.
        .log('/tmp/sensor_data_alerts.log')
    |alert()
        .crit(lambda: "sensorData" > critUpperRange)
        .message(msgUpperRange)
        .details('')
        // Whenever we get an alert write it to a file.
        .log('/tmp/sensor_data_alerts.log')

Log File:

{"id":"sensor_data:nil","message":"OFF","details":"","time":"2019-09-16T11:38:55.709039Z","duration":12606938663000,"level":"CRITICAL","data":{"series":[{"name":"sensor_data","tags":{"apartmentId":"6","deviceType":"Temperature","floorId":"2","senesorGroupId":"100","smuMacId":"a2:b3:33:45"},"columns":["time","battery","rssi","sensorData","sensorMacid"],"values":[["2019-09-16T11:38:55.709039Z",72,-47,52.9,"22:45:45:11"]]}]},"previousLevel":"CRITICAL","recoverable":true}

{"id":"sensor_data:nil","message":"OFF","details":"","time":"2019-09-16T11:38:55.709039Z","duration":12606938663000,"level":"CRITICAL","data":{"series":[{"name":"sensor_data","tags":{"apartmentId":"6","deviceType":"Temperature","floorId":"2","senesorGroupId":"100","smuMacId":"a2:b3:33:45"},"columns":["time","battery","rssi","sensorData","sensorMacid"],"values":[["2019-09-16T11:38:55.709039Z",72,-47,52.9,"22:45:45:11"]]}]},"previousLevel":"CRITICAL","recoverable":true}

{"id":"sensor_data:nil","message":"OFF","details":"","time":"2019-09-16T11:38:55.709039Z","duration":12606938663000,"level":"CRITICAL","data":{"series":[{"name":"sensor_data","tags":{"apartmentId":"6","deviceType":"Temperature","floorId":"2","senesorGroupId":"100","smuMacId":"a2:b3:33:45"},"columns":["time","battery","rssi","sensorData","sensorMacid"],"values":[["2019-09-16T11:38:55.709039Z",72,-47,52.9,"22:45:45:11"]]}]},"previousLevel":"CRITICAL","recoverable":true}

{"id":"sensor_data:nil","message":"OFF","details":"","time":"2019-09-16T11:38:55.709039Z","duration":12606938663000,"level":"CRITICAL","data":{"series":[{"name":"sensor_data","tags":{"apartmentId":"6","deviceType":"Temperature","floorId":"2","senesorGroupId":"100","smuMacId":"a2:b3:33:45"},"columns":["time","battery","rssi","sensorData","sensorMacid"],"values":[["2019-09-16T11:38:55.709039Z",72,-47,52.9,"22:45:45:11"]]}]},"previousLevel":"CRITICAL","recoverable":true}
