
Issue with join & multiple alerts #752

Closed
phemmer opened this issue Jul 25, 2016 · 6 comments · Fixed by #756

Comments

@phemmer

phemmer commented Jul 25, 2016

Creating from a discussion on the mailing list: https://groups.google.com/forum/#!topic/influxdb/0LdRf5QIm_I

I'm trying to add the ability to put hosts into a "maintenance mode" which will prevent kapacitor from sending out alerts. In order to do this, I've created a "maintlock" measurement, which tracks a counter on the host. When the "count" field in this measurement is 0, kapacitor is free to send an alert.
I have this mostly working, but only when the tick script stream has a single alert() method. As soon as a second alert() method is added, the "count" field appears to get zeroed out.
For example, this is my kapacitor tick script:

var maintlock = stream|from().measurement('maintlock').groupBy('host')
var data = stream
    |from()
        .measurement('disk').groupBy('host','path')
    |join(maintlock)
        .as('disk','maintlock')
        .on('host')
        .fill('null')
        .tolerance(24h)
    |log()
    |where(lambda: "maintlock.count" == 0)
data
    |alert()
        .crit(lambda: "disk.used_percent" >= 90)
data
    |alert()
        .warn(lambda: "disk.used_percent" >= 80 AND "disk.used_percent" < 90)

This is a few lines of the output from the log() method:

Jul 25 12:39:44 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 12:39:44 I!  {disk   host=fll2aixd01stg,path=/var/lib/influxdb, [host path] map[host:fll2aixd01stg path:/var/lib/influxdb] map[disk.inodes_total:13418496 disk.free:13152272384 disk.used:577781760 disk.used_percent:4.208153543607759 maintlock.count:0 disk.inodes_free:13418409 disk.inodes_used:87 disk.total:13730054144] 2016-07-26 00:00:00 +0000 UTC}
Jul 25 12:39:44 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 12:39:44 I!  {disk   host=fll2aixd01stg,path=/var/lib/influxdb, [host path] map[host:fll2aixd01stg path:/var/lib/influxdb] map[disk.free:13152272384 disk.inodes_free:13418409 disk.inodes_total:13418496 disk.total:13730054144 maintlock.count:1 disk.used_percent:4.208153543607759 disk.inodes_used:87 disk.used:577781760] 2016-07-26 00:00:00 +0000 UTC}
Jul 25 12:39:44 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 12:39:44 I!  {disk   host=fll2aixd01stg,path=/tmp/mnt, [host path] map[host:fll2aixd01stg path:/tmp/mnt] map[disk.inodes_total:743386 disk.inodes_used:2 disk.total:10485760 disk.used:9437184 disk.used_percent:90 maintlock.count:0 disk.free:1048576 disk.inodes_free:743384] 2016-07-26 00:00:00 +0000 UTC}
Jul 25 12:39:44 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 12:39:44 I!  {disk   host=fll2aixd01stg,path=/tmp/mnt, [host path] map[host:fll2aixd01stg path:/tmp/mnt] map[disk.inodes_used:2 disk.total:10485760 disk.inodes_free:743384 disk.used:9437184 maintlock.count:1 disk.inodes_total:743386 disk.used_percent:90 disk.free:1048576] 2016-07-26 00:00:00 +0000 UTC}

Notice how each entry is logged twice: once with maintlock.count:1 and once with maintlock.count:0. If I remove one of the alert() methods, it behaves fine, properly tracking the value of maintlock.count. It's only when I add the second alert() that maintlock.count:0 starts showing up.

The above was observed with kapacitor 1.0beta2. After upgrading to 1.0beta3, the log() output started showing 4 lines per data point:

Jul 25 13:03:04 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 13:03:04 I!  {disk   host=fll2aixd01stg,path=/tmp/mnt, [host path] map[host:fll2aixd01stg path:/tmp/mnt] map[disk.inodes_used:2 disk.total:10485760 disk.inodes_total:743386 disk.used:9437184 disk.inodes_free:743384 disk.free:1048576 disk.used_percent:90 maintlock.count:0] 2016-07-26 00:00:00 +0000 UTC}
Jul 25 13:03:04 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 13:03:04 I!  {disk   host=fll2aixd01stg,path=/tmp/mnt, [host path] map[host:fll2aixd01stg path:/tmp/mnt] map[maintlock.count:1 disk.inodes_total:743386 disk.inodes_used:2 disk.total:10485760 disk.used:9437184 disk.used_percent:90 disk.free:1048576 disk.inodes_free:743384] 2016-07-26 00:00:00 +0000 UTC}
Jul 25 13:03:04 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 13:03:04 I!  {disk   host=fll2aixd01stg,path=/tmp/mnt, [host path] map[host:fll2aixd01stg path:/tmp/mnt] map[disk.used_percent:90 disk.free:1048576 disk.total:10485760 disk.used:9437184 maintlock.count:0 disk.inodes_total:743386 disk.inodes_free:743384 disk.inodes_used:2] 2016-07-26 00:00:00 +0000 UTC}
Jul 25 13:03:04 fll2aixd01stg kapacitord[6421]: [disk_free:log5] 2016/07/25 13:03:04 I!  {disk   host=fll2aixd01stg,path=/tmp/mnt, [host path] map[host:fll2aixd01stg path:/tmp/mnt] map[disk.used_percent:90 disk.free:1048576 disk.inodes_free:743384 disk.inodes_used:2 disk.total:10485760 disk.inodes_total:743386 disk.used:9437184 maintlock.count:1] 2016-07-26 00:00:00 +0000 UTC}
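
(As an aside, a single alert() node can carry both thresholds, which would sidestep the second alert() entirely, though it doesn't explain the duplicate points. A minimal, untested sketch using the same data variable from the script above:)

data
    |alert()
        // a point at or above 90 matches crit first, so it is reported as CRITICAL rather than WARNING
        .warn(lambda: "disk.used_percent" >= 80)
        .crit(lambda: "disk.used_percent" >= 90)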
@phemmer
Author

phemmer commented Jul 26, 2016

Another issue that may be related is that fill('null') and fill(0) don't appear to work.
For example:

var foo = stream|from().measurement('foo').groupBy('host')
stream
  |from()
    .measurement('disk').groupBy('host','path')
  |join(foo)
    .as('disk','foo')
    .on('host')
    .fill('null')
    .tolerance(15s)
  |log()

The log() generates no output until data is seen in the foo measurement with the same host field as the data from disk, and within the 15s tolerance. But according to the docs, fill('null') and fill(0) are supposed to fill in data in this case.

@nathanielc
Contributor

@phemmer Yes, I can confirm that using fill + on for a join does not work well. Unfortunately the solution is rather involved. The main difficulty is knowing what fields to fill on which points.

For now I have written a test case that defines the expected behavior for the fill + on combination.

I am still looking into the original issue about duplicate points.

@phemmer
Author

phemmer commented Jul 26, 2016

I think I understand what you mean by "knowing what fields to fill".
I'm still new to kapacitor and don't have a solid grasp on everything yet, but shouldn't fill('null') be able to work without knowing the fields (essentially acting as if the join() weren't present at all for that data point)? Then you could do something like .fill('null') | default().field('foo.bar', 123).
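
Roughly what I have in mind, reusing the disk/foo example from above (just a sketch of the idea; I realize this doesn't work today):

var foo = stream|from().measurement('foo').groupBy('host')
stream
  |from()
    .measurement('disk').groupBy('host','path')
  |join(foo)
    .as('disk','foo')
    .on('host')
    // emit the point even when no matching foo point exists
    .fill('null')
    .tolerance(15s)
  |default()
    // hypothetical field from my example above; replace nulls with a default
    .field('foo.bar', 123)
  |log()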

@nathanielc
Contributor

@phemmer That would make sense, yes. But currently it doesn't work that way. Currently the default node only sets fields that do not exist at all, independent of their value. But having the default node set defaults on null fields makes sense, as null fields have no real use.

I'll create a separate issue to track that change.

@phemmer
Author

phemmer commented Jul 27, 2016

Assuming #755 is the issue you created to track the change, that isn't quite what I meant.
.fill('null')|default() waits for a data point to come in before it will start doing anything. From your description, it uses this data point so it knows what fields to fill in. I was instead looking for something that would cause the .join() not to block at all, and just immediately output the data point with only the fields from the disk measurement.

For example, if there is no data point from the foo measurement, the above script would behave as if it were the following:

stream
  |from()
    .measurement('disk').groupBy('host','path')
  |log()

...but with the fields being prefixed with disk. for consistency.

@avp24

avp24 commented Sep 16, 2019

Hi @phemmer @nathanielc
I am observing the same issue of 4 duplicate lines for the same data point.

My TICKscript and the alert log file it generates are shown below. For easier debugging I have added blank lines between the log entries.

I expect to get only one line in the log file per alert, but I get 4 lines instead.

Kapacitor version:
Kapacitor OSS 1.5.2 (git: HEAD 3086452)
Influx Version:
InfluxDB shell version: 1.7.7

Tickscript:
var critLowerRange = 53
var critUpperRange = 57
var msgUpperRange = 'ON'
var msgLowerRange = 'OFF'

stream
    // Select just the sensor_data measurement from our database.
    |from()
        .measurement('sensor_data')
        .where(lambda: ((hour("time") * 100 + minute("time")) >= 600) AND (hour("time") * 100 + minute("time") <= 1200))
    |alert()
        .crit(lambda: "sensorData" < critLowerRange)
        .message(msgLowerRange)
        .details('')
        // Whenever we get an alert write it to a file.
        .log('/tmp/sensor_data_alerts.log')
    |alert()
        .crit(lambda: "sensorData" > critUpperRange)
        .message(msgUpperRange)
        .details('')
        // Whenever we get an alert write it to a file.
        .log('/tmp/sensor_data_alerts.log')

Log File:

{"id":"sensor_data:nil","message":"OFF","details":"","time":"2019-09-16T11:38:55.709039Z","duration":12606938663000,"level":"CRITICAL","data":{"series":[{"name":"sensor_data","tags":{"apartmentId":"6","deviceType":"Temperature","floorId":"2","senesorGroupId":"100","smuMacId":"a2:b3:33:45"},"columns":["time","battery","rssi","sensorData","sensorMacid"],"values":[["2019-09-16T11:38:55.709039Z",72,-47,52.9,"22:45:45:11"]]}]},"previousLevel":"CRITICAL","recoverable":true}

{"id":"sensor_data:nil","message":"OFF","details":"","time":"2019-09-16T11:38:55.709039Z","duration":12606938663000,"level":"CRITICAL","data":{"series":[{"name":"sensor_data","tags":{"apartmentId":"6","deviceType":"Temperature","floorId":"2","senesorGroupId":"100","smuMacId":"a2:b3:33:45"},"columns":["time","battery","rssi","sensorData","sensorMacid"],"values":[["2019-09-16T11:38:55.709039Z",72,-47,52.9,"22:45:45:11"]]}]},"previousLevel":"CRITICAL","recoverable":true}

{"id":"sensor_data:nil","message":"OFF","details":"","time":"2019-09-16T11:38:55.709039Z","duration":12606938663000,"level":"CRITICAL","data":{"series":[{"name":"sensor_data","tags":{"apartmentId":"6","deviceType":"Temperature","floorId":"2","senesorGroupId":"100","smuMacId":"a2:b3:33:45"},"columns":["time","battery","rssi","sensorData","sensorMacid"],"values":[["2019-09-16T11:38:55.709039Z",72,-47,52.9,"22:45:45:11"]]}]},"previousLevel":"CRITICAL","recoverable":true}

{"id":"sensor_data:nil","message":"OFF","details":"","time":"2019-09-16T11:38:55.709039Z","duration":12606938663000,"level":"CRITICAL","data":{"series":[{"name":"sensor_data","tags":{"apartmentId":"6","deviceType":"Temperature","floorId":"2","senesorGroupId":"100","smuMacId":"a2:b3:33:45"},"columns":["time","battery","rssi","sensorData","sensorMacid"],"values":[["2019-09-16T11:38:55.709039Z",72,-47,52.9,"22:45:45:11"]]}]},"previousLevel":"CRITICAL","recoverable":true}
