-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deadlock after a leader election. #11436
Comments
Hi @etherandrius, Thanks for reporting this. It's hard to say what happened without stack traces. If it happens again could you send a SIGUSR2 to the active node and find the stack traces in the logs? |
I will. As a follow up to this could we add functionality to send SIGUSR2 via |
There's no reason to assume |
Closing the issue due to staleness. Closing stale issues helps us keep the issue count down and the project healthy. Keeping the issue count under a manageable number helps us provide faster responses and better engagement with the community. If you feel that the issue is still relevant, or if it is wrongly closed, please leave a comment and we'd be happy to reopen it. |
@vishalnayak I'd like to state that we've had a direct replica of this error. |
To resolve we had to manually remove the instance with the increase of goroutines which then resolved the deadlock. |
@s3than we're definitely interested in hearing more. I suggest you open a new bug and provide us with whatever details you have. |
Describe the bug
post-election task failed on a newly elected leader. This resulted in a cluster wide outage, which did not self resolve. Faulty leader had to be terminated manually.
To Reproduce
I was not able to reproduce the issue
Expected behavior
Either post-election task self recovered or leadership to be taken over by another vault instance.
Environment:
Vault server configuration file(s):
Additional context
Logs from vault leader
Go routines were steadily increasing, until termination of the instance

I was not able to run
vault debug
received errorError during validation: unable to connect to server: context deadline exceeded
I was not able to curl
/v1/sys/pprof/goroutine
the connection hung for 20+min before I canceled it.The issue seems similar to #11276 and #10456. However, we are running 1.6.3 and the bug was supposed to be fixed in 1.6.1.
So far we've only observed this once
The text was updated successfully, but these errors were encountered: