
grpc, xds: recovery middleware to return and log error in case of panic #10895

Merged
12 commits merged into hashicorp:main on Dec 7, 2021

Conversation

gmichelo
Contributor

  1. xds and grpc servers:
    1.1. use recovery middleware with a callback that prints the stack trace to the log
    1.2. the callback turns the panic into a core.Internal error
  2. added a unit test for the grpc server

Partially fixes the 2nd and 3rd tasks in the list: #10715.
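For context, the recovery callback described in 1.1/1.2 can be sketched roughly as follows. This is a minimal sketch assuming the v1 grpc-ecosystem recovery package and hashicorp/go-hclog; the body is an illustration, not the exact code merged in this PR:

package agentgrpc

import (
	"context"
	"runtime/debug"

	recovery "github.com/grpc-ecosystem/go-grpc-middleware/recovery"
	"github.com/hashicorp/go-hclog"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// newPanicHandler returns the callback invoked by the recovery interceptor.
// It logs the panic value together with a stack trace, then converts the
// panic into a gRPC codes.Internal error so the server keeps serving other
// requests. (Sketch only; the merged body may differ.)
func newPanicHandler(logger hclog.Logger) recovery.RecoveryHandlerFuncContext {
	return func(ctx context.Context, p interface{}) error {
		logger.Error("panic serving grpc request",
			"panic", p,
			"stack", string(debug.Stack()),
		)
		return status.Error(codes.Internal, "grpc: panic serving request")
	}
}

The recovery interceptor calls this function with the recovered panic value; whatever error it returns is what the client sees.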
@gmichelo
Contributor Author

@dnephin, I am not sure why the check-go-mod check fails. I committed the new go.mod and go.sum. Is there an issue with that?

@dhiaayachi dhiaayachi self-assigned this Aug 24, 2021
@hashicorp-cla

hashicorp-cla commented Sep 9, 2021

CLA assistant check
All committers have signed the CLA.

@dhiaayachi
Collaborator

Hi @BigMikes,
Thank you for working on this PR. We had a discussion with the team about this and the following came out of it:

  • We agree on the principle of having a way to recover from a panic.
  • We are not sure that adding an extra middleware dependency to Consul is justified for this.

I will spend some time checking which options are acceptable to solve this and add them here, so you can adapt the PR to them if you are still interested in working on it.

Thank you again and sorry for the long delay.

@gmichelo
Contributor Author

gmichelo commented Sep 9, 2021

Hi @dhiaayachi,

thanks for letting me know. No worries, let me know once you have the new instructions and I'll fix my PR.

@dnephin dnephin assigned dnephin and unassigned dhiaayachi Oct 1, 2021
@dnephin dnephin added the type/enhancement Proposed improvement or new feature label Oct 1, 2021
Contributor

@dnephin dnephin left a comment


Thank you for working on this! I think this is looking good. I left a couple of suggestions below for the test and for sharing the implementation between the two servers.

We also noticed that v2 of this middleware library is still a pre-release (rc2), so it may not be production-ready yet. I think we should either use v1 or write our own interceptor. I think we'd be fine with either.
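For reference, the "write our own interceptor" option could look roughly like this hand-rolled unary variant (a sketch under assumed names, not code from this PR):

import (
	"context"
	"runtime/debug"

	"github.com/hashicorp/go-hclog"
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// recoverUnaryInterceptor recovers from panics in unary handlers, logs the
// stack trace, and returns codes.Internal to the client instead of letting
// the process crash. A stream variant would wrap grpc.StreamHandler the
// same way.
func recoverUnaryInterceptor(logger hclog.Logger) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (resp interface{}, err error) {
		defer func() {
			if p := recover(); p != nil {
				logger.Error("panic serving grpc request",
					"panic", p, "stack", string(debug.Stack()))
				err = status.Error(codes.Internal, "grpc: panic serving request")
			}
		}()
		return handler(ctx, req)
	}
}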

Resolved (outdated) review threads: agent/grpc/client_test.go, agent/grpc/handler.go, agent/xds/server.go
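The test thread above touched agent/grpc/client_test.go. A minimal sketch of the kind of assertion such a test makes, exercising the interceptor directly rather than a full client/server pair (assumed shape, not the PR's actual test):

package agentgrpc

import (
	"context"
	"testing"

	recovery "github.com/grpc-ecosystem/go-grpc-middleware/recovery"
	"github.com/hashicorp/go-hclog"
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// Invoke the unary recovery interceptor with a handler that panics and
// assert the panic is converted into a codes.Internal error.
func TestRecoveryInterceptor_TurnsPanicIntoInternalError(t *testing.T) {
	interceptor := recovery.UnaryServerInterceptor(
		recovery.WithRecoveryHandlerContext(newPanicHandler(hclog.NewNullLogger())),
	)
	panicky := func(ctx context.Context, req interface{}) (interface{}, error) {
		panic("boom")
	}
	_, err := interceptor(context.Background(), nil, &grpc.UnaryServerInfo{}, panicky)
	if status.Code(err) != codes.Internal {
		t.Fatalf("expected codes.Internal, got %v", err)
	}
}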
Comment on lines 566 to 579
recoveryOpts := []recovery.Option{
	recovery.WithRecoveryHandlerContext(newPanicHandler(s.Logger)),
}

opts := []grpc.ServerOption{
	grpc.MaxConcurrentStreams(2048),
	middleware.WithUnaryServerChain(
		// Add middleware interceptors to recover in case of panics.
		recovery.UnaryServerInterceptor(recoveryOpts...),
	),
	middleware.WithStreamServerChain(
		// Add middleware interceptors to recover in case of panics.
		recovery.StreamServerInterceptor(recoveryOpts...),
	),
Contributor


I guess we could also expose a function from agent/grpc that returns these two options, and use that here? The handler in agent/grpc could use the same PanicHandlerMiddleware function to add them to opts, so that it's clear that both of these gRPC servers are using the same middleware.

opts := []grpc.ServerOption{grpc.MaxConcurrentStreams(2048)}
opts = append(opts, agentgrpc.PanicHandlerMiddleware(s.Logger)...)

What do you think?
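A sketch of what that shared helper might look like, using the names from the suggestion above (the exact signature and body are assumptions):

// PanicHandlerMiddleware returns the gRPC server options that install the
// recovery interceptors on both the unary and stream paths, so that both
// servers share one definition. (Sketch; not the merged implementation.)
func PanicHandlerMiddleware(logger hclog.Logger) []grpc.ServerOption {
	recoveryOpts := []recovery.Option{
		recovery.WithRecoveryHandlerContext(newPanicHandler(logger)),
	}
	return []grpc.ServerOption{
		middleware.WithUnaryServerChain(recovery.UnaryServerInterceptor(recoveryOpts...)),
		middleware.WithStreamServerChain(recovery.StreamServerInterceptor(recoveryOpts...)),
	}
}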

Contributor Author


The problem is that agent/grpc/handler.go also requires another interceptor, which is not used in xds' server:

middleware.WithStreamServerChain(
	// Add middleware interceptors to recover in case of panics.
	recovery.StreamServerInterceptor(recoveryOpts...),
	(&activeStreamCounter{metrics: metrics}).Intercept,
),

And the interceptor chain cannot be set twice; otherwise you get the following panic:

panic: The stream server interceptor was already set and may not be reset. [recovered]
	panic: The stream server interceptor was already set and may not be reset.

Also, I think it is good to have the middleware as close as possible to where the server is defined, so that people can clearly see which interceptors are being used and can easily extend the server with new interceptors.

@dnephin dnephin added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Oct 5, 2021
@gmichelo
Contributor Author

gmichelo commented Nov 6, 2021

Hi @dhiaayachi, @dnephin. FYI, this PR is ready for review based on your latest comments.

@github-actions github-actions bot removed the waiting-reply Waiting on response from Original Poster or another individual in the thread label Nov 6, 2021
Collaborator

@dhiaayachi dhiaayachi left a comment


Hey @BigMikes, sorry for the delay. This is good to go from my perspective.
I will let @dnephin take another look before merging.

Contributor

@dnephin dnephin left a comment


I fixed the merge conflict. If CI is still happy I will merge.

Thanks again!

@@ -214,6 +214,7 @@ github.com/gophercloud/gophercloud v0.1.0 h1:P/nh25+rzXouhytV2pUHBb65fnds26Ghl8/
github.com/gophercloud/gophercloud v0.1.0/go.mod h1:vxM41WHh5uqHVBMZHzuwNOHh8XEoIEcSTewFxm1c5g8=
github.com/gorilla/websocket v1.4.0/go.mod h1:E7qHFY5m1UJ88s3WnNqhKjPHQ0heANvMoAMk2YaljkQ=
github.com/gregjones/httpcache v0.0.0-20180305231024-9cad4c3443a7/go.mod h1:FecbI9+v66THATjSRHfNgh1IVFe/9kFxbXtjV0ctIMA=
github.com/grpc-ecosystem/go-grpc-middleware v1.0.0 h1:Iju5GlWwrvL6UBg4zJJt3btmonfrMlCDdsejg4CZE7c=
Contributor


I looked into using a newer 1.x version, but it requires updating gRPC, which I don't particularly want to do right now. I looked at the diff, and nothing important has changed since 1.0, so I think this is good.

@dnephin dnephin merged commit d9dd694 into hashicorp:main Dec 7, 2021
@hc-github-team-consul-core
Collaborator

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/520564.
