Consul panics when committing changes to the state store #2724
Comments
Hi @cityofships thanks for the report - we will figure this out (it was new code for 0.7.3). /cc @dadgar |
We saw a similar looking panic on 0.7.3 using the consul_0.7.3_linux_amd64 binary https://gist.github.com/mpuncel/854425910b66cf697dfb7797b586f969 |
Thanks @mpuncel that helps show that this probably isn't related to |
@cityofships and @mpuncel are either of you able to reproduce this pretty readily? I've got some tests going now to try to trigger it but haven't had any luck. |
yes, I'm willing to re-test |
@cityofships thanks. Using your steps above (which I super appreciate) I was able to reproduce this and I've got a fix in the works. Should have something for you to try soon. |
This fixes #2724 by properly tracking leaf updates during very large delete transactions.
@cityofships the fix has been merged to master - if you have a chance to fuzz this again please let us know how it goes. Thank you! |
Are there any hints on how to recover an existing cluster from this once this happens? Is there a way to purge the queue of updates? |
@jtchoi the cleanest recovery would be to shut down all the servers and upgrade the Consul binary to 0.7.5, which will be able to apply the update without a crash. You could also shut down the servers and remove the |
@slackpad Thanks! We were considering both approaches and will try the former |
consul version
Client:
v0.7.4
Server:
v0.7.4
Operating system and Environment details
Consul running in Docker in HA mode on 3 nodes with RHEL 7.3 and Docker 1.13.0
Kernel Version: 3.10.0-514.6.1.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.3 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.5 GiB
Storage Driver: devicemapper
consul config:
Description of the Issue (and unexpected/desired result)
Consul panics during a "stress test" and is unable to process all KV delete requests.
Reproduction steps
node1:
node2:
node3:
Log Fragments
https://gist.github.com/cityofships/9ffbaf9badac8b0198f352ce76cc7239