Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature suggestion for trillium: Stock handler for catching panics and generating server errors #379

Closed
joshtriplett opened this issue Oct 13, 2023 · 7 comments · May be fixed by #380
Closed

Comments

@joshtriplett
Copy link
Collaborator

I'd love to have a stock handler that wraps another handler and catches any panics, turning them into server errors. I'm happy to write and submit one, if this sounds reasonable.

@jbr
Copy link
Contributor

jbr commented Oct 13, 2023

I tidied up an old branch that allows for unwinding, but I'm still unsure if it's a good idea to support this, as it feels like a misuse of panic unwinding. Because the Conn contains the Transport, unwinding a panic into a http error is accomplished with a Drop implementation on trillium::Conn that effectively takes all of the inner state and sends it to a FnOnce that gets attached to the Conn.

If you can find a way to do this without changes to trillium itself, I'd be keen to hear about that, but because the Transport is not necessarily Clone and is owned by the Conn, I think this requires changes to trillium. It looks like it's semver-patch, though

I'll PR this and possibly gate it behind a feature pending further experimentation, as "panic for flow control" seems like a big hack.

@jbr jbr mentioned this issue Oct 13, 2023
@joshtriplett
Copy link
Collaborator Author

joshtriplett commented Oct 14, 2023

@jbr I'm not looking to do "panic for flow control" here; this is to catch unexpected panics and avoid killing the whole server, and finish any tracing spans.

I think you could simplify this, by not preserving any of the state from the existing Conn, and just making a new Conn with no state to handle the internal server error, or even just sending a hardcoded HTTP response over the Transport. And you could require that the transport implements Clone in order to support this, since many transports will. (That's still not trivial, since Conn boxes the transport. :( ) Then clone the transport before running the original Handler, replace the transport in the original Conn with one that flags (via AtomicBool) if any bytes have been written yet, and iff no bytes have been written when the panic occurs, use the cloned Transport to send the internal server error.

@joshtriplett
Copy link
Collaborator Author

That said, perhaps trying to return a response from an unexpected panic is unnecessary, and it might suffice to catch panics and close the connection. I may try that for simplicity, if solutions that allow returning a response are too complex.

@joshtriplett
Copy link
Collaborator Author

Nevermind, I see the problem now: building this as a handler leaves nothing to return as the Conn after catching a panic.

@joshtriplett
Copy link
Collaborator Author

I would welcome a suggestion for how to make panics not bring down the server, and to be able to finish any open tracing spans (so that they can be sent off for telemetry). It doesn't matter to me if that's a Handler or something else. I'm much less concerned about being able to successfully send an HTTP 500, though that'd be nice.

This is something that was straightforward to do with Tide.

@jbr
Copy link
Contributor

jbr commented Oct 14, 2023

Which runtime are you using? My understanding was that all of the current runtime adapters do what you're suggesting as default behavior; hanging up on the http client but continuing to process requests. If that's not the case, I'd consider that a bug. Sending a 500 would be the new behavior, and I've been convincing myself the drop-channel thing isn't terribly gross since it's a best effort that falls back to the current behavior if drop never runs

@joshtriplett
Copy link
Collaborator Author

@jbr ...well, that's fascinating. I had been looking for that functionality in trillium, and I had no idea that async runtimes did that themselves.

Given that, it sounds like I've already got the robustness I need, and since the server doesn't exit it'll be able to continue tracing, so I just need to trace panics myself. I don't know if it's worth the complexity just to send a 500.

Thank you for the explanation; TIL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants