grpc, xds: recovery middleware to return and log error in case of panic #10895
Conversation
1) xds and grpc servers:
   1.1) use the recovery middleware with a callback that prints the stack trace to the log
   1.2) the callback turns the panic into a core.Internal error
2) added a unit test for the grpc server
@dnephin, I am not sure why
Hi @BigMikes,
I will spend some time checking what options are acceptable to solve this and will add them here, so you can adapt the PR accordingly if you are still interested in working on it. Thank you again, and sorry for the long delay.
Hi @dhiaayachi, thanks for letting me know. No worries; let me know once you have the new instructions and I'll fix my PR.
Thank you for working on this! I think this is looking good. I left a couple suggestions below for the test and sharing implementation with the two servers.
We also noticed that v2 of this middleware library is still a pre-release (rc2), so it may not be production ready yet. I think we should either use v1 or write our own interceptor. I think we'd be fine with either.
agent/xds/server.go (Outdated)
recoveryOpts := []recovery.Option{
	recovery.WithRecoveryHandlerContext(newPanicHandler(s.Logger)),
}

opts := []grpc.ServerOption{
	grpc.MaxConcurrentStreams(2048),
	middleware.WithUnaryServerChain(
		// Add middleware interceptors to recover in case of panics.
		recovery.UnaryServerInterceptor(recoveryOpts...),
	),
	middleware.WithStreamServerChain(
		// Add middleware interceptors to recover in case of panics.
		recovery.StreamServerInterceptor(recoveryOpts...),
	),
}
I guess we could also expose a function from agent/grpc that returns these two options, and use that here? The handler in agent/grpc could use the same PanicHandlerMiddleware function to add them to opts; that way it's clear that both of these gRPC servers are using the same middleware.

opts := []grpc.ServerOption{grpc.MaxConcurrentStreams(2048)}
opts = append(opts, agentgrpc.PanicHandlerMiddleware(s.Logger)...)

What do you think?
The problem is that agent/grpc/handler.go also requires another interceptor, which is not used in the xds server:

middleware.WithStreamServerChain(
	// Add middleware interceptors to recover in case of panics.
	recovery.StreamServerInterceptor(recoveryOpts...),
	(&activeStreamCounter{metrics: metrics}).Intercept,
),

And interceptors cannot be specified twice, otherwise you get the following panic:

panic: The stream server interceptor was already set and may not be reset. [recovered]
	panic: The stream server interceptor was already set and may not be reset.
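That single-slot constraint is exactly why chaining helpers like middleware.WithStreamServerChain exist: several interceptors are composed into the one interceptor the server accepts. A minimal sketch of such composition, assuming hypothetical `handlerFunc`, `interceptor`, and `chain` names and that the first interceptor should run outermost:

```go
package main

import "fmt"

// handlerFunc stands in for a gRPC handler in this sketch.
type handlerFunc func() (interface{}, error)

// interceptor wraps a handler with extra behavior.
type interceptor func(handlerFunc) handlerFunc

// chain composes several interceptors into one wrapper, so a server that
// accepts a single interceptor slot can still run many of them.
func chain(handler handlerFunc, interceptors ...interceptor) handlerFunc {
	// Apply in reverse so interceptors[0] ends up outermost.
	for i := len(interceptors) - 1; i >= 0; i-- {
		handler = interceptors[i](handler)
	}
	return handler
}

func main() {
	logging := func(next handlerFunc) handlerFunc {
		return func() (interface{}, error) {
			fmt.Println("before")
			defer fmt.Println("after")
			return next()
		}
	}
	counting := func(next handlerFunc) handlerFunc {
		return func() (interface{}, error) {
			fmt.Println("count")
			return next()
		}
	}
	h := chain(func() (interface{}, error) { return "done", nil }, logging, counting)
	v, _ := h()
	fmt.Println(v) // prints: before, count, after, done (one per line)
}
```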
Also, I think it might be good to keep the middleware as close as possible to where the server is defined, so that people can clearly see which interceptors are being used and can easily extend the server with new ones.
Hi @dhiaayachi, @dnephin. FYI, this PR is ready for review based on your latest comments.
I fixed the merge conflict. If CI is still happy I will merge.
Thanks again!
@@ -214,6 +214,7 @@
github.com/gophercloud/gophercloud v0.1.0 h1:P/nh25+rzXouhytV2pUHBb65fnds26Ghl8/
github.com/gophercloud/gophercloud v0.1.0/go.mod h1:vxM41WHh5uqHVBMZHzuwNOHh8XEoIEcSTewFxm1c5g8=
github.com/gorilla/websocket v1.4.0/go.mod h1:E7qHFY5m1UJ88s3WnNqhKjPHQ0heANvMoAMk2YaljkQ=
github.com/gregjones/httpcache v0.0.0-20180305231024-9cad4c3443a7/go.mod h1:FecbI9+v66THATjSRHfNgh1IVFe/9kFxbXtjV0ctIMA=
github.com/grpc-ecosystem/go-grpc-middleware v1.0.0 h1:Iju5GlWwrvL6UBg4zJJt3btmonfrMlCDdsejg4CZE7c=
I looked into using a newer 1.x version, but it requires updating gRPC, which I don't particularly want to do right now. I looked at the diff, and nothing important has changed since 1.0, so I think this is good.
🍒 If backport labels were added before merging, cherry-picking will start automatically. To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/520564.
Partially fixes the 2nd and 3rd tasks in the list: #10715.