Make creating an alert based on event count easy #137
For option 2 the behavior will be to use the data flow to understand time, but to emit the window on a real-time interval. If data arrives too late it will need to be dropped, since time has already passed, but this is measurable and we can report the number of dropped points. There may be some challenges with mixing real time and replays, but we should be able to work something out.
Another option could be to expose the internal throughput numbers directly to the running task. Each node could have a stats() method that emits the internal stats of the node on a specified real-time interval. For example:
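A rough sketch of what that could look like (the syntax here is illustrative of the proposal, not an existing API; the measurement and the `emitted` field name are assumptions):

```
stream
    |from()
        .measurement('cpu')
    // proposed: emit this node's internal stats on a real-time interval
    |stats(10s)
    |alert()
        // 'emitted' is an assumed field name for points emitted by the node
        .crit(lambda: "emitted" == 0)
```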
This allows for more interesting applications, since it essentially lets you meta-program your tasks.
@pauldix Thoughts on some of the ideas for making a dead-man's switch easy to build?
I personally like the Would
I also like the
Yes, but in a different way that makes it harder to shoot yourself in the foot (i.e. no data points need be dropped). For example, given that you are sending a data point to Kapacitor once per second, here are the two options. Realtime:
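A sketch of the realtime option (the `.realtime()` property is the proposal under discussion, not an existing API; measurement name illustrative):

```
stream
    |from()
        .measurement('heartbeats')
    |window()
        .period(1s)
        .every(1s)
        // proposed: emit on the wall clock, even if the window is empty
        .realtime()
    |count('value')
    |alert()
        .crit(lambda: "count" == 0)
```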
With the realtime approach this alert is prone to many false positives if the data arrives slightly ahead of or behind schedule. If a data point arrives too late it must be dropped, since time has moved on without it. Here is the stats approach:
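A sketch of the same alert built on the proposed stats stream (the `stats` node and the `emitted` field name are part of the proposal, used here as assumptions):

```
stream
    |from()
        .measurement('heartbeats')
    // proposed: a separate stream reporting how many points the node emitted
    |stats(1s)
    |alert()
        .crit(lambda: "emitted" == 0)
```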
The stats alert in this case can still create lots of false positives if the data arrives too early or too late, but since this is now a separate data stream from the normal data, no data points need be dropped. This means it is possible for the times of the two streams to differ, but since you are using them separately this shouldn't be a problem. Now, if you try to join some of the stats data with data from the 'real' stream you could cause issues, but that is a different problem around timeout writes to streams. Obviously, in practice you would compare against a threshold, not an exact value, i.e.:
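For instance, alerting when throughput drops below a small threshold rather than exactly to zero (the threshold value and `emitted` field are illustrative):

```
|alert()
    .crit(lambda: "emitted" <= 2)
```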
+1 to the stats approach, although it feels super verbose for defining something as simple as a dead man's switch.
If all you want is a dead man's switch and you don't care about the data itself, it looks like this:
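A minimal sketch under the stats proposal (node name, field name, and alert handler all illustrative):

```
stream
    |from()
        .measurement('heartbeats')
    |stats(1s)
    |alert()
        .crit(lambda: "emitted" == 0)
        .email()
```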
We could add a special method for just the dead man's switch use case?
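Something along these lines, where the helper wraps the stats-and-alert boilerplate (the `deadman` name and its threshold/interval signature are illustrative of the idea):

```
stream
    |from()
        .measurement('heartbeats')
    // hypothetical shorthand: alert if throughput is <= 0 points,
    // checked every 10s
    |deadman(0.0, 10s)
```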
Then it could leverage the global alert config for how to handle the alert. This can always be added later if we think it adds enough value. For now I think I'll move forward with the
Currently creating an alert on event count seems like a simple task, for example:
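A sketch of the naive script in question (measurement, window sizes, and field name are illustrative):

```
stream
    |from()
        .measurement('events')
    |window()
        .period(10s)
        .every(10s)
    |count('value')
    |alert()
        .crit(lambda: "count" == 0)
```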
The above script will not work. The reason is that once the event stream stops, so does time, and therefore the rest of the pipeline is not executed. As a result the alert logic is never evaluated and an alert is never triggered.
Possible solutions:
Use a separate task and the internal Kapacitor stats about throughput to achieve this.
For example:
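A sketch of what the second task might look like, reading Kapacitor's self-reported stats (the internal database, measurement, and field names here are assumptions for illustration):

```
// Second task: watch Kapacitor's own throughput stats for the first task
stream
    |from()
        // hypothetical internal database/measurement exposing node throughput
        .database('_kapacitor')
        .measurement('node_throughput')
    |alert()
        // hypothetical field counting points received by the watched node
        .crit(lambda: "points_received" == 0)
```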
This will work, but it requires two tasks: one to receive the data and one to check the throughput.
Add the ability to use real clock time on a window node, so that when real time passes it will emit a batch even if it is empty, i.e.:
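Something like the following fragment, where `.realtime()` is the proposed addition:

```
|window()
    .period(10s)
    .every(10s)
    // proposed: emit on the wall clock even if the window is empty
    .realtime()
```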
Simple and optional. Using the realtime clock allows races to occur when processing the data, so it will generally not be recommended, but for the specific use case of counting it could work. To be clear, these are not Go data races but rather races in how Kapacitor processes the data: a data point arrives too late, so Kapacitor cannot process it and has to drop it, since the window was already emitted.