Vault agent DoS's the vault server if template command fails #12566
Comments
Thanks for submitting the issue, @nvx! Just wanted to swing by and drop an update: while looking at a separate issue, I was able to reproduce this one, and I can confirm that the Agent does retry in a tight loop when a command from the template file fails.
@nvx What happens if you also set |
Looking at the configuration, this option was already set.
Hi there! I believe this will be fixed by #16970 - if you have caching in the Vault Agent config, currently, Agent will ignore the retry configuration and perform immediate infinite retries in a loop with no back-off. As a workaround, an empty The fix should be coming in 1.12. |
Hi there! I'm going to close this issue as I just merged #16970, which should fix this. It should be released in 1.12. Thanks for the bug report!
It isn't fixed. If there's an error, it spins like crazy:
2024-02-06T13:31:15.365Z [DEBUG] agent: (runner) checking template ea719e25596e1dd1659265c8ec95f87b
Then the same line again at 2024-02-06T13:31:15.366Z, .367Z, .368Z, and so on, once every millisecond. Crazy.
Hey there @celesteking! I apologize if we conflated two issues. The PR linked above definitely fixed an issue related to retries with the cache, but it's possible that we mixed up some issues that seemed related. To make sure I understand properly: the issue here is that when the Agent template server crashes, the server restarts and immediately retries, and since it crashed as opposed to errored, it doesn't know to retry. Does that sound right? The other issues linked to this one were about it erroring (but the server not crashing), and it looks like the two cases got confused. There's also a sub-issue here, in that the template server probably shouldn't crash in some of these cases. Do you have a configuration that reproduces the issue, or any more details you can share?
Pretty easy to reproduce with config |
Thank you! I appreciate the reproduction too. This is definitely a bug, and you're right that it wasn't addressed by the previous PR. I'll raise a ticket in our internal bug tracker for this and try to get someone's eyes on it soon. There are retry back-off timers for failures, but not for anything that crashes the template server entirely. I can't promise anything with regard to timelines, but I'm eager to get a fix in soon. I'll link the GitHub issue so you'll know when we're done.
That ain't a problem, no rush. I'm still learning about this thing.
In either case, I appreciate you letting us know this is still a problem. Thank you!
Describe the bug
Running Vault Agent with a template command that returns a non-zero exit code results in the template being restarted immediately, with no back-off or retry delay. This sends many requests to the Vault server in quick succession, which can cause a Vault outage due to resource exhaustion.
To Reproduce
Expected behavior
Other areas of Vault Agent, such as auto-auth, implement a back-off mechanism on failure to avoid excessive resource consumption (either locally or on the Vault server). A similar back-off when restarting the template would resolve this issue.
Environment:
Vault v1.8.2 (aca76f6)
Additional context
#9200 appears to have implemented the auto-restarting behaviour, so that might be a place where such a back-off could be added.
#9059 may or may not be related to this. I don't know enough about consul-template's internals to say whether setting that retry would fix this issue.
Setting this config option causes Vault Agent to quit on error rather than retry, which mitigates this issue but then relies on something else to restart Vault Agent (hopefully with an appropriate back-off time):
Vault Agent Logs