-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
agent: restart template runner on retry for unlimited retries #11775
Conversation
I think I'm okay with the idea of having a new key governing the unlimited-retry behaviour. I'm not sure I like where it is or the fact that the default would be limited-retries - though I get that agent has had limited retries since 1.5 and that going back and forth isn't great. Since traditionally Agent was meant to be a long-running process, exiting due to some errors talking to Vault is unusual/surprising behaviour, which is why I'm advocating for making indefinite retries the default. The Proposal 1: add a This new
Proposal 2:
|
The reason that I'm open to moving Thanks for the review and feedback! |
@ncabatoff chatted with Product and it seems that we're okay with releasing this for the next major release. I'm open to the idea of having a new I'm still unsure if we should change the default back to be unlimited though. Thoughts on this @jasonodonnell and @tvoran, especially around the k8s use case? |
Providing a public update on where we stand on the default behavior after a few rounds of internal discussions: We're going with Nick's proposal 1, with some slight modifications. We will be adding a new stanza template_config {
exit_on_retry_failure = <bool>
} As for the omission of It's worth noting that the default value for Our rationale for reverting back to unlimited template retries by default is that there's a valid use case brought to us by our users and customers for Agent to continue running when other subsystems such as caching are defined, regardless of whether the templates rendered successfully or not. We also understand that there are other use cases that want the exact opposite. For instance, in the kubernetes use case having unlimited retries can lead to containers and pods being forever stuck during deployments. However, this time around the unlimited template retries behavior can be turned off since this is not always desirable. Separately, our vault-helm chart will make this behavior configurable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some small suggestions.
Co-authored-by: Jim Kalafut <[email protected]> Co-authored-by: Theron Voran <[email protected]>
Co-authored-by: Brian Kassouf <[email protected]>
Co-authored-by: Tom Proctor <[email protected]>
This PR takes an alternate approach compared to the one from #11711 on re-introducing unlimited template retry behavior that existed on Vault 1.5.x and earlier. A separate flag controls whether agent restarts the templating server on error, which is how it used to work up until 1.5.x.
This approach maintains backward compatibility on how retry works with Agent 1.6.x and onward while optionally allowing users to revert to the old (1.5.x and earlier) behavior if desired.
I've chosen the value to reside atvault.retry.template_unlimited_retries = <bool>
since thetemplate
stanza is an actual CT config and scoped on a per-template basis, but I'm also open to suggestions where else it could go or what else it could be named.Update: See #11775 (comment) for the latest update on the config value and format.
Supersedes #11711
TODO: