VA receives PerformValidation RPCs very late #3641

Closed
jsha opened this issue Apr 11, 2018 · 0 comments

jsha commented Apr 11, 2018

During periods of peak traffic, the VA sometimes receives a PerformValidation RPC long after the corresponding POST to the challenge to request validation. After some analysis, it looks like this is caused by blocking when we hit a MaxConcurrentStreams limit in Go's HTTP/2 stack (which is used by gRPC). See grpc/grpc-go#1986 for some more details. The default value is 250, and during the spikes that cause these delayed validations, we see traffic of about 550 rps, which should be enough to cause delays.
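As a rough check on those numbers, Little's law (average concurrency = arrival rate × mean latency) gives the mean RPC latency at which 550 rps fills 250 stream slots. This is a back-of-the-envelope sketch, not something from the analysis above: it assumes all of the traffic shares a single limit, and the function names and sample latencies are made up for illustration.

```go
package main

import "fmt"

// inFlight estimates the average number of concurrent RPCs using
// Little's law: concurrency = arrival rate * mean latency.
func inFlight(rps, meanLatencySeconds float64) float64 {
	return rps * meanLatencySeconds
}

// saturationLatency returns the mean latency (in seconds) at which the
// average concurrency reaches the stream limit for a given request rate.
func saturationLatency(maxStreams, rps float64) float64 {
	return maxStreams / rps
}

func main() {
	const maxStreams = 250.0 // default limit cited in the issue
	const peakRPS = 550.0    // peak request rate cited in the issue

	fmt.Printf("limit reached at a mean latency of about %.3f s\n",
		saturationLatency(maxStreams, peakRPS))

	// Hypothetical mean latencies, chosen only to show both sides of the limit.
	for _, lat := range []float64{0.1, 0.5, 1.0} {
		fmt.Printf("mean latency %.1f s -> about %.0f RPCs in flight\n",
			lat, inFlight(peakRPS, lat))
	}
}
```

Under these assumptions, any mean latency above roughly 0.45 s puts more than 250 RPCs in flight, which is plausible for validations that have to reach out over the network.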

I believe this problem manifests in all of our components during peak load, but it is particularly noticeable in the VA, where many RPCs take a long time, and so use up a slot for longer.

This also explains why, in the past, we've seen slow DNS cause timeouts in the IsSafeDomain RPC, even though that RPC almost never hits the network, and when it does, it uses a different resolver than the PerformValidation RPC. I think what was happening in those cases was that the slow PerformValidation RPCs were using up all the slots for the VA service, so some fraction of all RPCs to the VA timed out.

@jsha jsha added this to the Sprint 2018-04-10 milestone Apr 11, 2018
@jsha jsha self-assigned this Apr 11, 2018
@cpu cpu closed this as completed in #3642 Apr 12, 2018
cpu pushed a commit that referenced this issue Apr 12, 2018
During periods of peak load, some RPCs are significantly delayed (on the order of seconds) by client-side blocking. HTTP/2 clients have to obey a "max concurrent streams" setting sent by the server. In Go's HTTP/2 implementation, this value [defaults to 250](https://github.com/golang/net/blob/master/http2/server.go#L56), so the gRPC default is also 250. So whenever there are more than 250 requests in progress at a time, additional requests will be delayed until there is a slot available.
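The queueing effect described above can be sketched with a small deterministic simulation. It is only an illustrative model, not Boulder or gRPC code: it assumes requests arrive at a fixed rate, each one holds a stream slot for a fixed time, and waiting requests are served in arrival order. `maxQueueDelay` and its parameters are hypothetical names.

```go
package main

import (
	"container/heap"
	"fmt"
)

// freeTimes is a min-heap of the times at which each stream slot becomes free.
type freeTimes []float64

func (h freeTimes) Len() int            { return len(h) }
func (h freeTimes) Less(i, j int) bool  { return h[i] < h[j] }
func (h freeTimes) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *freeTimes) Push(x interface{}) { *h = append(*h, x.(float64)) }
func (h *freeTimes) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// maxQueueDelay simulates n requests arriving at a fixed rate (rps), each
// holding one of `slots` stream slots for serviceSeconds. It returns the
// longest time any request waited before a slot became available.
func maxQueueDelay(slots, n int, rps, serviceSeconds float64) float64 {
	h := make(freeTimes, slots) // every slot is free at time 0
	heap.Init(&h)
	worst := 0.0
	for i := 0; i < n; i++ {
		arrival := float64(i) / rps
		free := heap.Pop(&h).(float64) // earliest slot to become free
		start := arrival
		if free > start {
			start = free // request blocks until the slot frees up
		}
		if d := start - arrival; d > worst {
			worst = d
		}
		heap.Push(&h, start+serviceSeconds)
	}
	return worst
}

func main() {
	// 250 slots and 550 rps come from the text; the service times and the
	// 10-second window (5500 requests) are assumptions for illustration.
	for _, svc := range []float64{0.1, 0.4, 1.0} {
		fmt.Printf("service time %.1f s -> worst wait %.2f s\n",
			svc, maxQueueDelay(250, 5500, 550, svc))
	}
}
```

In this model, requests never wait while the service time keeps fewer than 250 of them in flight, and the wait grows without bound once arrivals outpace the rate at which slots free up. Raising the slot count moves that threshold, which is the effect the new config field is meant to have.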

During this peak load, we aren't hitting limits on CPU or memory, so we should increase the max concurrent streams limit to take better advantage of our available resources. This PR adds a config field to do that.

Fixes #3641.