
grpc, xds: recovery middleware to return and log error in case of panic #10895

Merged
12 commits merged into hashicorp:main on Dec 7, 2021

Conversation

gmichelo
Contributor

  1. xds and grpc servers:
    1.1. use recovery middleware with a callback that prints the stack trace to the log
    1.2. the callback turns the panic into a core.Internal error
  2. added a unit test for the grpc server

Partially fixes the 2nd and 3rd tasks in the list: #10715.
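For context, the recovery callback described in 1.1/1.2 can be sketched roughly as follows. This is a minimal sketch assuming the v1 grpc-ecosystem recovery package and hashicorp/go-hclog; the body is an illustration, not the exact code merged in this PR:

package agentgrpc

import (
	"context"
	"runtime/debug"

	recovery "github.com/grpc-ecosystem/go-grpc-middleware/recovery"
	"github.com/hashicorp/go-hclog"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// newPanicHandler returns the callback invoked by the recovery interceptor.
// It logs the panic value together with a stack trace, then converts the
// panic into a gRPC codes.Internal error so the server keeps serving other
// requests. (Sketch only; the merged body may differ.)
func newPanicHandler(logger hclog.Logger) recovery.RecoveryHandlerFuncContext {
	return func(ctx context.Context, p interface{}) error {
		logger.Error("panic serving grpc request",
			"panic", p,
			"stack", string(debug.Stack()),
		)
		return status.Error(codes.Internal, "grpc: panic serving request")
	}
}

The recovery interceptor calls this function with the recovered panic value; whatever error it returns is what the client sees.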
@gmichelo
Contributor Author

@dnephin, I am not sure why the check-go-mod check fails. I committed the new go.mod and go.sum. Is there an issue with that?

@dhiaayachi dhiaayachi self-assigned this Aug 24, 2021
@hashicorp-cla

hashicorp-cla commented Sep 9, 2021

CLA assistant check
All committers have signed the CLA.

@dhiaayachi
Collaborator

Hi @BigMikes,
Thank you for working on this PR. We had a discussion with the team about this and the following came out of it:

  • We agree on the principle of having a way to recover from a panic.
  • We are not sure that adding an extra middleware dependency to Consul is justified for this.

I will spend some time checking which options are acceptable to solve this and add them here, so you can adapt the PR to them if you are still interested in working on it.

Thank you again and sorry for the long delay.

@gmichelo
Contributor Author

gmichelo commented Sep 9, 2021

Hi @dhiaayachi,

thanks for letting me know. No worries, let me know once you have the new instructions and I'll fix my PR.

@dnephin dnephin assigned dnephin and unassigned dhiaayachi Oct 1, 2021
@dnephin dnephin added the type/enhancement Proposed improvement or new feature label Oct 1, 2021
Contributor

@dnephin dnephin left a comment


Thank you for working on this! I think this is looking good. I left a couple of suggestions below for the test and for sharing the implementation between the two servers.

We also noticed that v2 of this middleware library is still a pre-release (rc2), so it may not be production-ready yet. I think we should either use v1 or write our own interceptor. I think we'd be fine with either.
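For reference, the "write our own interceptor" option could look roughly like this hand-rolled unary variant (a sketch under assumed names, not code from this PR):

import (
	"context"
	"runtime/debug"

	"github.com/hashicorp/go-hclog"
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// recoverUnaryInterceptor recovers from panics in unary handlers, logs the
// stack trace, and returns codes.Internal to the client instead of letting
// the process crash. A stream variant would wrap grpc.StreamHandler the
// same way.
func recoverUnaryInterceptor(logger hclog.Logger) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (resp interface{}, err error) {
		defer func() {
			if p := recover(); p != nil {
				logger.Error("panic serving grpc request",
					"panic", p, "stack", string(debug.Stack()))
				err = status.Error(codes.Internal, "grpc: panic serving request")
			}
		}()
		return handler(ctx, req)
	}
}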

Resolved (outdated) review threads: agent/grpc/client_test.go, agent/grpc/handler.go, agent/xds/server.go
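The test thread above touched agent/grpc/client_test.go. A minimal sketch of the kind of assertion such a test makes, exercising the interceptor directly rather than a full client/server pair (assumed shape, not the PR's actual test):

package agentgrpc

import (
	"context"
	"testing"

	recovery "github.com/grpc-ecosystem/go-grpc-middleware/recovery"
	"github.com/hashicorp/go-hclog"
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// Invoke the unary recovery interceptor with a handler that panics and
// assert the panic is converted into a codes.Internal error.
func TestRecoveryInterceptor_TurnsPanicIntoInternalError(t *testing.T) {
	interceptor := recovery.UnaryServerInterceptor(
		recovery.WithRecoveryHandlerContext(newPanicHandler(hclog.NewNullLogger())),
	)
	panicky := func(ctx context.Context, req interface{}) (interface{}, error) {
		panic("boom")
	}
	_, err := interceptor(context.Background(), nil, &grpc.UnaryServerInfo{}, panicky)
	if status.Code(err) != codes.Internal {
		t.Fatalf("expected codes.Internal, got %v", err)
	}
}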
Comment on lines 566 to 579
recoveryOpts := []recovery.Option{
	recovery.WithRecoveryHandlerContext(newPanicHandler(s.Logger)),
}

opts := []grpc.ServerOption{
	grpc.MaxConcurrentStreams(2048),
	middleware.WithUnaryServerChain(
		// Add middleware interceptors to recover in case of panics.
		recovery.UnaryServerInterceptor(recoveryOpts...),
	),
	middleware.WithStreamServerChain(
		// Add middleware interceptors to recover in case of panics.
		recovery.StreamServerInterceptor(recoveryOpts...),
	),
Contributor


I guess we could also expose a function from agent/grpc that returns these two options, and use that here? The handler in agent/grpc could use the same PanicHandlerMiddleware function to add them to opts, so that it's clear that both of these gRPC servers are using the same middleware.

opts := []grpc.ServerOption{grpc.MaxConcurrentStreams(2048)}
opts = append(opts, agentgrpc.PanicHandlerMiddleware(s.Logger)...)

What do you think?
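A sketch of what that shared helper might look like, using the names from the suggestion above (the exact signature and body are assumptions):

// PanicHandlerMiddleware returns the gRPC server options that install the
// recovery interceptors on both the unary and stream paths, so that both
// servers share one definition. (Sketch; not the merged implementation.)
func PanicHandlerMiddleware(logger hclog.Logger) []grpc.ServerOption {
	recoveryOpts := []recovery.Option{
		recovery.WithRecoveryHandlerContext(newPanicHandler(logger)),
	}
	return []grpc.ServerOption{
		middleware.WithUnaryServerChain(recovery.UnaryServerInterceptor(recoveryOpts...)),
		middleware.WithStreamServerChain(recovery.StreamServerInterceptor(recoveryOpts...)),
	}
}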

Contributor Author


The problem is that agent/grpc/handler.go also requires another interceptor, which is not used in xds' server:

middleware.WithStreamServerChain(
	// Add middleware interceptors to recover in case of panics.
	recovery.StreamServerInterceptor(recoveryOpts...),
	(&activeStreamCounter{metrics: metrics}).Intercept,
),

And the interceptor chain cannot be set twice; otherwise you get the following panic:

panic: The stream server interceptor was already set and may not be reset. [recovered]
	panic: The stream server interceptor was already set and may not be reset.

Also, I think it is good to have the middleware as close as possible to where the server is defined, so that people can clearly see which interceptors are being used and can easily extend the server with new interceptors.

@dnephin dnephin added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Oct 5, 2021
@gmichelo
Contributor Author

gmichelo commented Nov 6, 2021

Hi @dhiaayachi, @dnephin. FYI, this PR is ready for review based on your latest comments.

@github-actions github-actions bot removed the waiting-reply Waiting on response from Original Poster or another individual in the thread label Nov 6, 2021
Collaborator

@dhiaayachi dhiaayachi left a comment


Hey @BigMikes, sorry for the delay. This is good to go from my perspective.
I will let @dnephin take another look before merging.

Contributor

@dnephin dnephin left a comment


I fixed the merge conflict. If CI is still happy I will merge.

Thanks again!

@@ -214,6 +214,7 @@ github.com/gophercloud/gophercloud v0.1.0 h1:P/nh25+rzXouhytV2pUHBb65fnds26Ghl8/
github.com/gophercloud/gophercloud v0.1.0/go.mod h1:vxM41WHh5uqHVBMZHzuwNOHh8XEoIEcSTewFxm1c5g8=
github.com/gorilla/websocket v1.4.0/go.mod h1:E7qHFY5m1UJ88s3WnNqhKjPHQ0heANvMoAMk2YaljkQ=
github.com/gregjones/httpcache v0.0.0-20180305231024-9cad4c3443a7/go.mod h1:FecbI9+v66THATjSRHfNgh1IVFe/9kFxbXtjV0ctIMA=
github.com/grpc-ecosystem/go-grpc-middleware v1.0.0 h1:Iju5GlWwrvL6UBg4zJJt3btmonfrMlCDdsejg4CZE7c=
Contributor


I looked into using a newer 1.x version, but it requires updating gRPC, which I don't particularly want to do right now. I looked at the diff, and nothing important has changed since 1.0, so I think this is good.

@dnephin dnephin merged commit d9dd694 into hashicorp:main Dec 7, 2021
@hc-github-team-consul-core
Collaborator

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/520564.
