-
Notifications
You must be signed in to change notification settings - Fork 294
Snapd does not appropriately handle a plugin hanging on {collect, process, publish} #1142
Comments
This is the intention of the deadline setting for a task, so that a task (or jobs within the task) can be killed or not run if the task run time exceeds the configured deadline. Testing with The interval in a task is just how frequent a user wants the task to run, but this does not equate to how quickly a plugin or plugins could collect, process, or publish the metrics listed in the task manifest. |
I hadn't dug into it enough to try that case. The behavior that prompted this was a plugin that hangs indefinitely was "freezing" the workflow in that 1.) the task was never being failed, 2.) No other runs of this task were happening. The end result was task list showing some number of hits, 0 misses, 0 failures. I'm not sure what the value is of the deadline if it works how your saying. My expectation was that a deadline was meant to stop runaway workflows, but if snap explicitly waits for it to return before enforcing the deadline then any runaway step in the workflow pauses that task indefinitely. My expectation would be that deadline would be enforced aggressively, as in this is the absolute maximum time a step in the workflow has to execute, and if it doesn't finish in that time we stop it and mark it as a failure.
I was just suggesting if you had to pick a number to timeout a step in the workflow, some multiple of the interval seems like a good starting place. I also wasn't aware of some of the details you added above re: deadline. |
I agree, deadline should be honored throughout the workflow and not just during the transitions during the workflow from collector to processor or publisher and processor to publisher. |
Adds handling of the timeout to the native go client which enforces that each request for {collect,process,publish} finishes in this amount of time. Ups the default client timeout to 10 seconds.
Adds handling of the timeout to the native go client which enforces that each request for {collect, process, publish} finishes in this amount of time. Ups the default client timeout to 10 seconds.
If a plugin hangs on any step in the workflow, snapd does not appear to be stopping the plugin/failing the task appropriately. To test you can add this line to the CollectMetrics() function inside the mock2 collector:
Then load the mock plugins (from snap/build/bin run:
./snapd -t 0 -l 1 -a ../plugin
) and load theexample/tasks/mock-file.json
task file. The task will hang after 1 "hit". Snapd does not appear to be failing the task correctly.My expectation would be that a plugin hanging would result in a failed run after some specified amount of time (I think you could base it off the interval for a reasonable automatic guess).
The text was updated successfully, but these errors were encountered: