-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only do telemetry computation if telemetry is enabled in the node #10245
Comments
ACK, yeah this totally makes sense and TBH was an oversight on my part. I didn't realize how much time these calls took. |
To improve the time.Now() call, we can make that a telemetry function. So replace |
Does anyone have any guidance on how we can get there to be a flag for whether or not telemetry is enabled? Some thoughts that come to mind for me:
I feel like putting a |
@ValarDragon the SDK context type? I think that might be our only option. Alternatively, we have a global variable in the telemetry package. |
yeah meant SDK context type. I feel like a global variable in telemetry will make the syntax for doing telemetry calls simpler at least, no idea if it would cause problems for integrators though. cc @jackzampolin e.g. API with global variable
API choices with it in context:
or
I don't see a reason why if your running N tendermint chains, you'd only want telemetry on a subset. Is that an important thing to support? |
Asked @jackzampolin over DM, he saw no client issue with adding a global variable to the telemetry package, thats set on telemetry initialization. So lets go with that approach! (It has the easier API) |
can we do what Tendermint does and pass a noop telemetry? or this is about also avoiding time calls? |
Also avoiding the |
Yes
Yes |
@marbar3778 @alexanderbez, can you provide a list of actionable items here? Happy to work on this
What do you mean by higher level? server? |
I'm not really sure what @marbar3778 is referring to, but you can't really avoid defer timing calls. The idea is that we have a config in the config struct and delegate all telemetry calls to a function that checks that config. Something like: // in the telemetry package
func Emit(ctx sdk.Context, metricCb func()) {
if ctx.TelemetryEnabled() {
metricCb()
}
}
// some function in keeper
func (k Keeper) Foo(ctx sdk.Contex, ...) {
defer telemetry.Emit(ctx, func() {
telemetry.MeasureSince(time.Now(), "foo", "bar")
})
} |
it means replacing the |
I think my proposal achieves that whilst also being flexible to other things besides time measurements (e.g. counters) |
MeasureSInce and associated functions should call time.Now() instead of leaving it to the keeper. This way the noOP metric gatherer can be set without time.now or with. |
What's the status of this? It sounds like an API change, right? If so, can we agree on the proposed API? |
this is still relevant, i dont have a particular api in mind, do you have any thoughts? |
This design seems compelling, but has the issue that the lazy
What mutex locks are you referring to? Note that
is racy. It's otherwise a reasonable design if |
Good point. We could augment the |
Why not check and load the metric at the same time? Say,
becomes
? It's an extra if-check that can side-step all extra work including |
is there a way to make so this is handled automatically? the if checks will become redundant and annoying to write quickly |
@elias-orijtech I don't think it should be the caller's responsibility -- poor UX IMO |
What's the alternative? Can you expand on #10245 (comment)? |
Well with my proposal, the APIs wouldn't be that much cleaner anyway. I think we might have to have e.g. // in the telemetry package
func EmitMeasureSince(ctx sdk.Context, t time.Time, args ...string) {
if ctx.TelemetryEnabled() {
telemetry.MeasureSince(t, args...)
}
}
// some function in keeper
func (k Keeper) Foo(ctx sdk.Contex, ...) {
t := time.Now()
defer telemetry.EmitMeasureSince(ctx, t)
} |
This design doesn't omit the call to I still think my suggestion is simpler overall, although I admit it is somewhat clumsier.
|
can time.Now() be put into EmitMeasureSince? |
EmitMeasureSince is deferred, so its |
|
Yes, that's the clumsiness of my proposal. However, the idiom is almost akin to
No. The arguments to a deferred function call are evaluated eagerly. |
Yeah I just don't see how that's a cleaner API and dev UX personally. |
Alright. So what's the final API? #10245 (comment) plus |
I think so? Unless you can think of a clean way w/o needing the caller to check the enablement? |
Hi guys, I've been thinking about this and I come with two different proposals. One using Using sdk.ContextAdvantages:
Disadvantages:
Using a Global VariableAdvantages:
Disadvantages:
From my point of view, the best approach is the global variable. I don't see a need for having flexibility in telemetry configuration, generally, it's something that is either activated or not at the start of the application until the end of its lifetime. For this, I've created two PRs (for demonstration purposes) showcasing the two approaches, so we can decide together which path to take |
This is a hard one. Globals are bad but sdk.context is something we want to remove. The global here doesn't seem so bad. |
I know it, and I don't see many alternatives. Given that it's a global configuration constant, it feels like the lesser of the evils. For now, I'm opting for the global variable approach. If you come up with any other ideas, please let me know. |
Summary
As detailed in other issues, the mutex locks and time.Now() syscalls' taken in telemetry are rather expensive for nodes. In general, nodes not using telemetry should not pay these costs.
Problem Definition
Nodes that aren't using telemetry should not pay the costs of telemetry.
Proposal
Somehow make the telemetry package know the result of its config option detailing whether or not its enabled. Then only do operations when its enabled. This is useful due to potential for mutex lock contention.
This doesn't solve the extraneous time.Now() call, but that should hopefully be solvable via other mechanisms. (e.g. not putting telemetry into hot loops / low level items) This can't be deadcode eliminated / constant folded, since telemetry enablement is a run-time flag, not a compile time flag
For Admin Use
The text was updated successfully, but these errors were encountered: