-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rules for cluster CPU-hours and Instance-hours #418
Add rules for cluster CPU-hours and Instance-hours #418
Conversation
These are intended to help simplify PromQL used by subscription watch, as well as make it more visible to others. Additionally, the encapsulation allows us to tweak the definition of CPU-hours or instance-hours if better underlying metrics are made available.
Hi @kahowell. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
jsonnet/telemeter/rules.libsonnet
Outdated
// max(...) by (_id) is used ensure a single datapoint per cluster ID | ||
record: 'cluster:usage:workload:capacity_physical_cpu_hours', | ||
expr: ||| | ||
max(sum_over_time(cluster:usage:workload:capacity_physical_cpu_cores:max:5m[1h:5m]) / count_over_time(vector(1)[1h:5m])) by (_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current form, the expression would return no data because the right-hand side has no _id
label to match with the left-hand side (vector() returns a scalar as a vector with no label).
count_over_time(vector(1)[1h:5m])
is always going to return 12 anyway. Hence it could be replaced by 12
(if it's really what you wanted).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was missing a scalar
call. I PoC'd this change and then forgot to include scalar
when I transcribed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As to the 12 thing, the reason I thought it might be wise to use scalar(count_over_time(vector(1)[1h:5m]))
instead was that I noticed different answers for scalar(count_over_time(vector(1)[1h:5m]))
depending on step and timestamp passed. For example:
I figured this was something to do with Prometheus doing sampling (actually, I'd love a more specific/accurate explanation, if you have one). Thus it seemed unsafe to simply use 13, since, if I understand correctly the recording rule doesn't necessarily run at the top of the hour. If you believe 13 is fine (or 12), though, I am more than happy to hardcode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonpasquier given the above, how should we proceed?
- hardcode
12
? - hardcode
13
? - use
scalar(count_over_time(vector(1)[1h:5m]))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scalar(count_over_time(vector(1)[1h:5m]))
is probably going to return the "correct" result but I'd be eager to hear from @bwplotka if it's something that he's aware of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bwplotka following up on the question that went your August 2. Does this look ok to you?
/assign @bwplotka per the comment earlier in the PR |
@barnabycourt: GitHub didn't allow me to assign the following users: in, PR, per, the, comment, earlier. Note that only openshift members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@anishasthana I think you have a lot of experience with the rules we're using like this in RHODS - can you take a look at this too and give your opinion on whether it could satisfy RHODS use cases from your perspective? |
After talking to Jeff and Kevin, I think these rules would satisfy RHODS use cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
LGTM |
@moadz: changing LGTM is restricted to collaborators In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Co-authored-by: Bartlomiej Plotka <[email protected]>
/ok-to-test |
/retest |
/test e2e-aws-upgrade |
@kahowell: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Looks like we still need a lgtm on the PR. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bwplotka, douglascamata, kahowell, moadz The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
These are intended to help simplify PromQL used by subscription watch,
as well as make it more visible to others.
Additionally, the encapsulation allows us to tweak the definition of
CPU-hours or instance-hours if better underlying metrics are made
available.
Note this is untested, but I'm happy to assist with testing, but I lack context/fixtures.