Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework SLO metrics #2840

Merged
merged 14 commits into from
Aug 25, 2023
Merged

Rework SLO metrics #2840

merged 14 commits into from
Aug 25, 2023

Conversation

joe-elliott
Copy link
Member

@joe-elliott joe-elliott commented Aug 24, 2023

What this PR does:
This PR reworks the way our SLO metrics are recorded with these goals in mind;

  • Push the calculations out to be as close the http.Server as possible
  • Clearly metric both total queries and SLO queries in one spot
  • Cleanup and remove unused metrics

This PR accomplishes the above with the following sharp corners:

  • In order to pass the throughput up to the code that calculates the SLOs a *float was added to the context. This works, but is brittle and difficult to detect if it breaks.
  • Specialized code needed to be added to the streaming endpoint

TODO

  • correct changelog
  • add tests

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
@joe-elliott joe-elliott changed the title Remove status from total_queries Rework SLO metrics Aug 24, 2023
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Copy link
Contributor

@zalegrala zalegrala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me. Just a question about tracking the other operations. Have you had a chance to test this out?

modules/frontend/slos.go Show resolved Hide resolved
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Copy link
Contributor

@mdisibio mdisibio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, accuracy of the metrics for SLO calculation is important and I think this is a good change. I think this is Tempo's first case of context smuggling so it gets a pass. 😄

Signed-off-by: Joe Elliott <[email protected]>
Copy link
Contributor

@zalegrala zalegrala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work Joe. Thanks for giving this area some ❤️ .

Signed-off-by: Joe Elliott <[email protected]>
@joe-elliott joe-elliott merged commit f5c8e78 into grafana:main Aug 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants