
fix(daemon): turn off net/http panic recovery #350

Merged: benhoyt merged 3 commits into canonical:master from benhoyt:panic-handlers on Jun 18, 2024

Conversation

benhoyt (Contributor) commented on Feb 5, 2024

By default, Go's net/http recovers panics in the ServeHTTP goroutine and keeps serving. However, if a lock is held when the panic happens, the result is a complete deadlock and Pebble goes unresponsive. As far as I know, this hasn't happened yet, but we fairly frequently use snippets like this without defer when it's just a tiny amount of code and defer is awkward:

```go
st.Lock()
foo := st.Foo()
st.Unlock()
```

If st.Foo(), or anything else between the lock and unlock, panics, net/http will recover the panic, but the lock will still be held. And in the case of the state lock, that means essentially all requests are blocked -- bad!
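To make the failure mode concrete, here's a minimal, self-contained demonstration (hypothetical names, not Pebble code): the first request panics while the mutex is held, net/http recovers and keeps serving, and every later request then hangs on Lock():

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

var st sync.Mutex // stands in for the state lock

func doWork() {
	panic("boom") // simulate a bug between Lock and Unlock
}

func handler(w http.ResponseWriter, r *http.Request) {
	st.Lock()
	doWork() // panics, so st.Unlock() below never runs
	st.Unlock()
	fmt.Fprintln(w, "ok")
}

func main() {
	http.HandleFunc("/", handler)
	// net/http recovers the panic from the first request and the server
	// stays up, but st is still locked, so all later requests block forever.
	http.ListenAndServe(":8080", nil)
}
```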

There's no direct way of opting out of this net/http behaviour, so you have to recover() from panics yourself, print the goroutine dump, and exit.
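For illustration, such a wrapper looks something like this (a minimal sketch of the approach, not the exact code in this PR; exitOnPanic is an illustrative name):

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"runtime"
)

// exitOnPanic replaces net/http's built-in recovery: on panic it prints
// the panic value and a dump of all goroutines, then exits with code 1.
func exitOnPanic(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			v := recover()
			if v == nil {
				return
			}
			if v == http.ErrAbortHandler {
				// Handlers panic with this sentinel to deliberately abort
				// a response; re-panic so net/http handles it as usual.
				panic(v)
			}
			fmt.Fprintf(os.Stderr, "panic: %v\n\n", v)
			buf := make([]byte, 1<<20)
			n := runtime.Stack(buf, true) // true: dump all goroutines
			os.Stderr.Write(buf[:n])
			os.Exit(1)
		}()
		next.ServeHTTP(w, r)
	})
}
```

Wrapped around the outermost handler (outside any logging middleware), this catches a panic anywhere in the request path and takes the process down instead of silently leaving a lock held.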

I've tested this locally, and it prints the goroutine dump and exits with code 1 on a panic as expected, instead of locking up. It also handles any panic in the handler or middleware, e.g. in logit.

benhoyt requested a review from hpidcock on Feb 5, 2024
hpidcock (Member) left a comment

I think overall this is an improvement, as we will now see panics as they happen, but it may lead to some unexpected panics in production uses of Pebble until we identify the cause of the deadlock that spawned this line of inquiry.

troyanov mentioned this pull request on Feb 5, 2024
benhoyt (Contributor, Author) commented on Feb 7, 2024

> I think overall this is an improvement, as we will now see panics as they happen, but it may lead to some unexpected panics in production uses of Pebble until we identify the cause of the deadlock that spawned this line of inquiry.

@hpidcock The thing is, those "unexpected panics in production uses" have a decent chance of deadlocking the entire Pebble process (if a lock is held without a deferred Unlock()), which is much worse than exiting (in which case the calling process will likely restart it). In any case, we'll chat further this afternoon about the best way forward.

benhoyt (Contributor, Author) commented on Feb 12, 2024

Given we now know panic-recovery isn't the cause of the issue at hand, I'm going to come back to this after reproducing and fixing #314, but I'll leave this PR open -- I think it's probably a good idea in any case.

benhoyt added the 24.10 label on Mar 13, 2024
benhoyt changed the title from "fix(daemon): turn off net/http behaviour of recovering from panics" to "fix(daemon): turn off net/http panic recovery" on Jun 14, 2024
benhoyt (Contributor, Author) commented on Jun 14, 2024

For reference, Gustavo was positive about this in a spec review meeting a couple of months ago (or more accurately, didn't love the default net/http behaviour and agreed it would be good to turn it off).

hpidcock (Member) left a comment

I still don't like panicking in production, but I think if we don't, we're going to miss these fatal bugs.

benhoyt merged commit 25edfff into canonical:master on Jun 18, 2024 (16 checks passed)
benhoyt deleted the panic-handlers branch on June 18, 2024