-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add metric for count of RPC requests #15515
Conversation
853626a
to
d9738e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
d9738e6
to
7a2872d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
7a2872d
to
d96bcdd
Compare
nomad/rpc_rate_metrics.go
Outdated
if rpcCtx == nil { | ||
return // we're the RPC caller and not the server | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some local testing I'm seeing this comment isn't quite accurate because it hides RPCs that are coming in from HTTP to the host that's taking the metric; I need to debug this a bit more to verify but we might end up dropping this conditional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intention to only count RPCs where they're actually served and skip counting them when forwarded? It definitely changes how we can use these metrics if as forwarding could really skew metrics (1-2x for writes, 1-2x for non-stale reads, completely arbitrary amount due to federation). Relative comparisons either between metrics or a single-metric-over-time could even be skewed by a leadership election causing a 2x bump as a write or non-stale read to a particular server now always gets forwarded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the intention from the RFC was to count whenever an RPC is served by a server or forwarded by a server (because that let's you detect problems like uneven distribution of RPCs from client nodes). If we only cared about metrics on the server that actually serves the request, we could dramatically simplify the problem because auth could happen after checking for forwarding. But I'm pretty sure we do want metrics pre-forwarding?
My intent with this conditional around rpcCtx == nil
was that we probably don't want to track "static endpoint" RPCs where ex. the deploymentwatcher running on the leader is using RPCs as normal function calls. Unfortunately that also means that if you send a HTTP request to a server, the server that serves the HTTP request won't count it because there's no rpcCtx
in that case (ex. if the HTTP request hits the leader).
That being said, I think the right approach here is to remove this conditional and just count those static endpoint calls. The identity will make it clear they're internal calls anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intent with this conditional around rpcCtx == nil was that we probably don't want to track "static endpoint" RPCs where ex. the deploymentwatcher running on the leader is using RPCs as normal function calls.
Ah that makes sense. I'd love to stop using those static endpoints anyway. I can't imagine there's not a better way to share that logic internally. Perhaps we should file an issue around this (eg Remove static endpoints to avoid counting them toward metrics
or something) and ship this as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the only consumers of the static endpoints now are deploymentwatcher and volumewatcher (following the work we did in #15451). But I bet we could extract some inner functions for the handful of methods we need and then inject those functions instead of the RPC handler. I'll open an issue for that.
Will need to rebase this on #15513 once that's been merged, as there's some changes that impact this one |
2afd240
to
81c7c1e
Compare
81c7c1e
to
107a766
Compare
107a766
to
34dd5c0
Compare
test failures include
|
|
Implement a metric for RPC requests with labels on the identity, so that administrators can monitor the source of requests within the cluster. This changeset demonstrates the change with the new `ACL.WhoAmI` RPC, and we'll wire up the remaining RPCs once we've threaded the new pre-forwarding authentication through the all. Note that metrics are measured after we forward but before we return any authentication error. This ensures that we only emit metrics on the server that actually serves the request. We'll perform rate limiting at the same place.
Had to rebase this on the merged #15513 and I've got a telemetry config test I need to fix, and that should wrap this up. |
b5379c7
to
a28b2e1
Compare
This changeset configures the RPC rate metrics that were added in #15515 to all the RPCs that support authenticated HTTP API requests. These endpoints already configured with pre-forwarding authentication in #15870, and a handful of others were done already as part of the proof-of-concept work. So this changeset is entirely copy-and-pasting one method call into a whole mess of handlers. Upcoming PRs will wire up pre-forwarding auth and rate metrics for the remaining set of RPCs that have no API consumers or aren't authenticated, in smaller chunks that can be more thoughtfully reviewed.
This changeset configures the RPC rate metrics that were added in #15515 to all the RPCs that support authenticated HTTP API requests. These endpoints already configured with pre-forwarding authentication in #15870, and a handful of others were done already as part of the proof-of-concept work. So this changeset is entirely copy-and-pasting one method call into a whole mess of handlers. Upcoming PRs will wire up pre-forwarding auth and rate metrics for the remaining set of RPCs that have no API consumers or aren't authenticated, in smaller chunks that can be more thoughtfully reviewed.
Implement a metric for RPC requests with labels on the identity, so that administrators can monitor the source of requests within the cluster. This changeset demonstrates the change with the new
ACL.WhoAmI
RPC, and we'll wire up the remaining RPCs once we've threaded the new pre-forwarding authentication through them all (#15513 plus follow-up work)Note that metrics are measured after we forward but before we return any authentication error. This ensures that we only emit metrics on the server that actually serves the request. We'll perform rate limiting at the same place.