-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(daemon): turn off net/http panic recovery #350
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think overall this is an improvement, as we will now see panics as they happen, but it may lead to some unexpected panics in production uses of pebble until we identify the cause of the deadlock that spawned this line of inquiry.
@hpidcock The thing is, those "unexpected panics in production uses" have a decent chance of deadlocking the entire Pebble process (if a lock is held with |
Given we now know panic-recovery isn't the cause of the issue at hand, I'm going to come back to this after reproducing and fixing #314, but I'll leave this PR open -- I think it's probably a good idea in any case. |
For reference, Gustavo was positive about this in a spec review meeting a couple of months ago (or more accurately, didn't love the default net/http behaviour and agreed it would be good to turn it off). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't like panic'ing in production, but I think if we don't, we are going to miss these fatal bugs.
By default, Go's
net/http
recovers panics in theServeHTTP
goroutine and keeps serving. However, if a lock is held when the panic happens, this will mean a complete deadlock and Pebble will go unresponsive. As far as I know, this hasn't happened yet, but we use snippets withoutdefer
like this fairly frequently if it's just a tiny amount of code and defer is awkward:If
st.Foo()
or whatever's between the lock and unlock panics,net/http
will recover the panic but the lock will still be held. And in the case of the state lock that basically means all requests are blocked -- bad!There's no direct way of opting out of this
net/http
behaviour, so you have torecover()
from panics yourself, print the goroutine dump, and exit.I've tested this locally, and it prints the goroutine dump and exits with code 1 on a panic as expected, instead of locking up. It also handles any panic in the handler or middleware, eg in
logit
.