Add rules for cluster CPU-hours and Instance-hours #418

kahowell · 2022-07-28T20:48:08Z

These are intended to help simplify PromQL used by subscription watch,
as well as make it more visible to others.

Additionally, the encapsulation allows us to tweak the definition of
CPU-hours or instance-hours if better underlying metrics are made
available.

Note that this is untested. I'm happy to assist with testing, but I lack context/fixtures.

openshift-ci · 2022-07-28T20:48:19Z

Hi @kahowell. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

simonpasquier · 2022-07-29T08:13:18Z

jsonnet/telemeter/rules.libsonnet

+              // max(...) by (_id) is used ensure a single datapoint per cluster ID
+              record: 'cluster:usage:workload:capacity_physical_cpu_hours',
+              expr: |||
+                max(sum_over_time(cluster:usage:workload:capacity_physical_cpu_cores:max:5m[1h:5m]) / count_over_time(vector(1)[1h:5m])) by (_id)


In the current form, the expression would return no data because the right-hand side has no _id label to match with the left-hand side (vector() returns a scalar as a vector with no label).
count_over_time(vector(1)[1h:5m]) is always going to return 12 anyway. Hence it could be replaced by 12 (if it's really what you wanted).

kahowell · 2022-07-29

Ah, I was missing a scalar call. I PoC'd this change and then forgot to include scalar when I transcribed it.
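A minimal sketch of why the division returned nothing without scalar, assuming Prometheus's default one-to-one vector matching on identical label sets. The cluster IDs and sample values are made up for illustration, and `labels`, `divide_vectors`, and `divide_by_scalar` are hypothetical helpers, not Prometheus APIs.

```python
from typing import Dict, FrozenSet, Tuple

# A label set is modelled as a frozenset of (name, value) pairs so it can be a dict key.
LabelSet = FrozenSet[Tuple[str, str]]
Vector = Dict[LabelSet, float]


def labels(**pairs: str) -> LabelSet:
    """Build a hashable label set from keyword arguments."""
    return frozenset(pairs.items())


def divide_vectors(lhs: Vector, rhs: Vector) -> Vector:
    """Vector / vector: only samples whose label sets are identical on both sides survive."""
    return {ls: value / rhs[ls] for ls, value in lhs.items() if ls in rhs}


def divide_by_scalar(lhs: Vector, divisor: float) -> Vector:
    """Vector / scalar: the scalar applies to every sample, no label matching involved."""
    return {ls: value / divisor for ls, value in lhs.items()}


# Stand-in for sum_over_time(...[1h:5m]): one sample per cluster, labelled with _id.
core_samples_sum: Vector = {
    labels(_id="cluster-a"): 96.0,
    labels(_id="cluster-b"): 48.0,
}
# Stand-in for count_over_time(vector(1)[1h:5m]): a single sample with no labels.
unlabelled_count: Vector = {labels(): 12.0}

without_scalar = divide_vectors(core_samples_sum, unlabelled_count)
with_scalar = divide_by_scalar(core_samples_sum, 12.0)

print(without_scalar)                        # {}
print(with_scalar[labels(_id="cluster-a")])  # 8.0
```

Wrapping the divisor in scalar(...) corresponds to the second helper: the result keeps the `_id` label of each left-hand sample.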

kahowell · 2022-07-29

As for the 12: I thought it might be wise to use scalar(count_over_time(vector(1)[1h:5m])) instead because I noticed that it returns different values depending on the step and timestamp passed. For example:

[Screenshot: results of the query at different step and timestamp values]

In practice, we noticed that when using step=3600 and aligned to the top of the hour, we always get a value of `13`:

[Screenshot: result of 13 with step=3600 aligned to the top of the hour]

I figured this had something to do with how Prometheus samples (actually, I'd love a more specific/accurate explanation, if you have one). Thus it seemed unsafe to simply use 13 since, if I understand correctly, the recording rule doesn't necessarily run at the top of the hour. If you believe 13 is fine (or 12), though, I am more than happy to hardcode.
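One possible explanation for the 12 versus 13 difference, offered as an assumption rather than something confirmed in this thread: if subquery steps are aligned to absolute multiples of the 5m resolution and the 1h window includes both endpoints, an evaluation time that falls exactly on a 5m boundary sees one extra step. The sketch below only counts those steps; `subquery_sample_count` is a hypothetical helper and the timestamps are arbitrary.

```python
def subquery_sample_count(
    eval_ts: int,
    range_s: int = 3600,
    step_s: int = 300,
    left_closed: bool = True,
) -> int:
    """Count step-aligned timestamps inside the window ending at eval_ts.

    Assumption: steps sit on absolute multiples of step_s, independent of eval_ts.
    left_closed=True models a window [eval_ts - range_s, eval_ts];
    left_closed=False models (eval_ts - range_s, eval_ts].
    """
    window_start = eval_ts - range_s
    # Round window_start up to the next multiple of step_s (ceiling division).
    first = -(-window_start // step_s) * step_s
    if not left_closed and first == window_start:
        first += step_s
    if first > eval_ts:
        return 0
    return (eval_ts - first) // step_s + 1


aligned = subquery_sample_count(7200)        # evaluation on a 5m boundary
unaligned = subquery_sample_count(7230)      # evaluation 30s past a boundary
aligned_open = subquery_sample_count(7200, left_closed=False)

print(aligned, unaligned, aligned_open)  # 13 12 12
```

Under these assumptions a hardcoded divisor would be off by one sample for some evaluation times, which is the concern raised above; whether the deployed query path actually behaves this way is not established here.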

simonpasquier · 2022-07-29

Hmm, this looks like a Thanos artifact. With a vanilla Prometheus, I can't reproduce...

kahowell · 2022-08-02

@simonpasquier given the above, how should we proceed?

hardcode 12?

hardcode 13?

use scalar(count_over_time(vector(1)[1h:5m]))?

simonpasquier · 2022-08-02

scalar(count_over_time(vector(1)[1h:5m])) is probably going to return the "correct" result but I'd be eager to hear from @bwplotka if it's something that he's aware of.

barnabycourt · 2022-08-18

@bwplotka following up on the question that went your way on August 2. Does this look ok to you?

barnabycourt · 2022-08-05T12:36:44Z

/assign @bwplotka per the comment earlier in the PR

openshift-ci · 2022-08-05T12:36:47Z

@barnabycourt: GitHub didn't allow me to assign the following users: in, PR, per, the, comment, earlier.

Note that only openshift members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @bwplotka per the comment earlier in the PR

accorvin · 2022-08-18T15:29:40Z

@anishasthana I think you have a lot of experience with the rules we're using like this in RHODS - can you take a look at this too and give your opinion on whether it could satisfy RHODS use cases from your perspective?

anishasthana · 2022-08-19T14:49:09Z

After talking to Jeff and Kevin, I think these rules would satisfy RHODS use cases.

bwplotka · 2022-09-09

LGTM!

moadz · 2022-09-09T12:00:10Z

LGTM

openshift-ci · 2022-09-09T12:00:13Z

@moadz: changing LGTM is restricted to collaborators

In response to this:

/lgtm

simonpasquier · 2022-09-12T09:06:28Z

/ok-to-test

barnabycourt · 2022-09-14T14:41:16Z

/retest

simonpasquier · 2022-09-15T13:14:10Z

/test e2e-aws-upgrade

openshift-ci · 2022-09-15T15:25:35Z

@kahowell: all tests passed!

Full PR test history. Your PR dashboard.

anishasthana · 2022-09-27T15:49:01Z

Looks like we still need a lgtm on the PR.

douglascamata · 2022-09-28T08:54:42Z

/lgtm

openshift-ci · 2022-09-28T08:54:50Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bwplotka, douglascamata, kahowell, moadz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [bwplotka]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot requested review from bwplotka and sthaha July 28, 2022 20:48

openshift-ci bot added the needs-ok-to-test label Jul 28, 2022

simonpasquier reviewed Jul 29, 2022

kahowell added 2 commits July 29, 2022 10:10

Add scalar function in cluster-hours and instance-hours divisors (7c1bcca)

Simplify instance-hours rule per feedback (2a9e449)

simonpasquier reviewed Jul 29, 2022

Switch cpu-hours to use max by(_id) (3a5f0b3)

openshift-ci bot assigned bwplotka Aug 5, 2022

bwplotka approved these changes Sep 9, 2022

openshift-ci bot added the approved label Sep 9, 2022

moadz approved these changes Sep 9, 2022

Fix punctuation in CPU-hours, Instance-hours comments (9bb86a0)

Co-authored-by: Bartlomiej Plotka <[email protected]>

openshift-ci bot added the ok-to-test label and removed the needs-ok-to-test label Sep 12, 2022

openshift-ci bot assigned douglascamata Sep 28, 2022

openshift-ci bot added the lgtm label Sep 28, 2022

openshift-merge-robot merged commit 8e8125e into openshift:master Sep 28, 2022

douglascamata mentioned this pull request Sep 29, 2022

Update Telemeter rules rhobs/configuration#340

Merged
