-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add more labels to RequestRecorder #12727
Conversation
Co-authored-by: Daniel Nephin <[email protected]> Signed-off-by: FFMMM <[email protected]>
Signed-off-by: FFMMM <[email protected]>
Signed-off-by: FFMMM <[email protected]>
labelsArr := flattenLabels(labels) | ||
r.Logger.Trace(requestLogName, labelsArr...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i did my best to not add another helper func here -- let me know if there's a easier method to enumerate keys, values as []interface{}
in go 🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a few non-blocking suggestions.
@@ -1344,11 +1347,13 @@ func TestServer_RPC_MetricsIntercept(t *testing.T) { | |||
{Name: "errored", Value: "false"}, | |||
{Name: "request_type", Value: "read"}, | |||
{Name: "rpc_type", Value: "test"}, | |||
{Name: "server_role", Value: "unreported"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this get wired up with the test's (*Server).IsLeader
func and thus not be "unreported"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, that's a fair point; the way i was thinking is that this kind of test needs to happen at the middleware
layer 🤔
also agent/metrics_test.go
should take care of the server_role
/ e2e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this @FFMMM. I have few comments related to the use of IsLeader mostly.
@@ -425,6 +404,29 @@ func NewServer(config *Config, flat Deps, publicGRPCServer *grpc.Server) (*Serve | |||
fsm: newFSMFromConfig(flat.Logger, gc, config), | |||
} | |||
|
|||
var recorder *middleware.RequestRecorder | |||
if flat.NewRequestRecorderFunc != nil { | |||
recorder = flat.NewRequestRecorderFunc(serverLogger, s.IsLeader, s.config.Datacenter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what are the implication of passing IsLeader
to the request recorder in here. IsLeader go as deep as atomically reading the raft state to be able to tell and that could have sides effects that are hard to reason about (specially performance effects)
I would recommend to have a separate state for the recorder that we can update when we acquire or loose leadership.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend to have a separate state for the recorder that we can update when we acquire or loose leadership.
Would you mind chatting thru this some more with me? When I thought about this, I started to think that it can get too complex keeping tabs on the leadership state for a server but I may be wrong here 🤔
On another note, I see this "pattern" (of passing the func/ a smaller interface) being loosely used in gateway
code here:
consul/agent/consul/gateway_locator.go
Lines 272 to 277 in 09d61e6
type serverDelegate interface { | |
blockingQuery(queryOpts blockingQueryOptions, queryMeta blockingQueryResponseMeta, fn queryFn) error | |
IsLeader() bool | |
LeaderLastContact() time.Time | |
setDatacenterSupportsFederationStates() | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were chatting offline and believe that the underlying raft.getState()
per RPC call would not adversely impact performance.
We are still interested in coming up with a different way of the server telling the request recorder about its leadership state. I will backlog an issue for this.
} | ||
|
||
key := keyMakingFunc(middleware.OneTwelveRPCSummary[0].Name, expectedLabels) | ||
|
||
if _, ok := storage[key]; !ok { | ||
// the compound key will look like: "rpc+server+call+Status.Ping+false+read+test+unreported" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can simplify the content of the key and use assertion to verify the values in here. I think in the test you can most of the time infer the right entry without having all of this in the key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need the name of the rpc call (otherwise the key gets overwritten in the map).
The labels help distinguish the happy path for us so we don't "fail silently" on the calls and mark the test as passed.
agent/rpc/middleware/interceptors.go
Outdated
|
||
labels := []metrics.Label{ | ||
{Name: "method", Value: requestName}, | ||
{Name: "errored", Value: strconv.FormatBool(respErrored)}, | ||
{Name: "request_type", Value: reqType}, | ||
{Name: "rpc_type", Value: rpcType}, | ||
{Name: "server_role", Value: serverRole}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit/bikeshed:
{Name: "server_role", Value: serverRole}, | |
{Name: "leader", Value: strconv.FormatBool(isLeader)}, |
RPC metrics will only be emitted by servers so there is no need to prefix the metric name with server
. Secondly since there are only two possible values of leader and follower I think the metric might be more intuitive for someone exploring their grafana or prometheus to have the name be leader
and the value be a boolean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes a lot of sense! and that's what I am changing for a new commit ⌨️
I think it would be nice to keep unreported
, and subsequently keep this as a tri state
leader: {`true`, `false`, `unreported`}
for a defense in depth kind of approach. If the function is not passed to the RequestRecorder
, I wouldn't think it useful or accurate to report leader: false
Signed-off-by: FFMMM <[email protected]>
🍒 If backport labels were added before merging, cherry-picking will start automatically. To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/635231. |
🍒✅ Cherry pick of commit a46bbe8 onto |
Co-authored-by: Daniel Nephin <[email protected]> Signed-off-by: FFMMM <[email protected]>
overview
This PR adds more labels to the RequestRecorder. The following are mandatory labels:
leader
={true | false | unreported}
The optional labels, given duck typing, are:
blocking
={true | false}
target_datacenter
={string}
locality
={forwarded | local
}These labels allow further observability for the new rpc metric(s).
todos left: