Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add alert for rhel6 running vms #486

Merged
merged 1 commit into from
Feb 1, 2023

Conversation

fossedihelm
Copy link
Contributor

@fossedihelm fossedihelm commented Jan 20, 2023

What this PR does / why we need it:
RHEL6 guests will be not supported on RHEL9 hosts anymore [1].
And CNV 4.13 is planned to be on RHCOS 9, so RHEL6 cannot be supported in CNV 4.13 anymore.
Adding a vm controller that watch for rhel 6 vms.
Rhel6 template adds a label in the vm in the format: rhel6-<workload>-<flavor>.
This label is checked to determine whether a vm is rhel6 or not.

[1] https://access.redhat.com/articles/973163#rhelkvm
Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Release note:

RHEL6 guests will be not supported on RHEL9 hosts anymore. Add alert for rhel6 running vms

/cc @akrejcir

@kubevirt-bot kubevirt-bot requested a review from akrejcir January 20, 2023 09:24
@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/L labels Jan 20, 2023
@fossedihelm fossedihelm force-pushed the warning_rhel6 branch 2 times, most recently from 4bb299e to d9748c2 Compare January 20, 2023 10:42
Copy link
Collaborator

@akrejcir akrejcir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I have only small comments.

controllers/vm_controller.go Outdated Show resolved Hide resolved
controllers/vm_controller.go Show resolved Hide resolved
tests/monitoring_test.go Outdated Show resolved Hide resolved
tests/monitoring_test.go Outdated Show resolved Hide resolved
@fossedihelm
Copy link
Contributor Author

@akrejcir I saw that a metric_test is failing since there's still no runbook for the new alert.

@akrejcir
Copy link
Collaborator

@akrejcir I saw that a metric_test is failing since there's still no runbook for the new alert.

Can you add code to the failing test, that will skip this specific alert? Then later we can remove it, when the runbook exists.

@fossedihelm
Copy link
Contributor Author

Thanks @akrejcir for your review! I pushed the changes to address your comments. PTAL! Thanks

Copy link
Member

@0xFelix 0xFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need a new controller for this or could this be implemented as an operand like we already do for other parts of ssp?

@akrejcir
Copy link
Collaborator

Do we actually need a new controller for this or could this be implemented as an operand like we already do for other parts of ssp?

We discussed this before, and I suggested creating a controller. The reason is that the operands in ssp-controller are only active if the SSP object is present in the cluster, and they can only watch objects they own. In the case of this PR, the VMs are not owned by ssp, so operand would not work.

Also, it would have worse performance, because on every update of a VM, the whole ssp-controller logic would run.

Expr: intstr.FromString("sum by (namespace, name) (kubevirt_vm_rhel6) > 0"),
Annotations: map[string]string{
"summary": "VM {{ $labels.namespace }}/{{ $labels.name }} is based on RHEL6 template, and this will not be supported in the next release",
//"runbook_url": runbookURLBasePath + Rhel6AlertName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can the runbook / runbook_url be created?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The runbook will be created under kubevirt/monitoring project

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be created in advance? Or only after the metric is available? (Sorry, I don't know much about runbooks)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much neither :) In general, I think the two things can go in parallel. Personally, I think that documentation that says there is something, but in fact there is not, is worse than providing something without saying it. But I am open to other points of view. If you prefer to firstly update the documentation, no problem at all. WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fossedihelm every alerts must have a runbook in advance. Changing this test just to get the alert in the code is not good practice. The test was set in place for a reason.
Please revert the change to the test and add a runbook and runbook_url annotation to the alert asap. Thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sradco kubevirt/monitoring#177 for the runbook.
ssp changes ready as soon the monitoring pr is merged. Thanks and sorry

@lyarwood
Copy link
Member

I'm not against this approach tbh, however I'd like to see more justification in the PR/ReleaseNote/Commits about why we need this (RHEL 6 is ELS downstream but what does that mean here?) and some docs somewhere in the tree about the controller before we merge this if possible.

RHEL6 guests will be not supported on RHEL9 hosts anymore [1].
And CNV 4.13 is planned to be on RHCOS 9, so RHEL6 cannot be supported in CNV 4.13 anymore.

Adding a vm controller that watch for rhel 6 vms.
Rhel6 template put a label in the vm with the format:
rhel6-<workload>-<flavor>.
This label is checked to determine whether a vm is rhel6 or not.

[1] https://access.redhat.com/articles/973163#rhelkvm

Signed-off-by: fossedihelm <[email protected]>
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.6% 0.6% Duplication

@fossedihelm
Copy link
Contributor Author

@lyarwood Thanks for the offline discussion.
Can you have a look? @0xFelix @akrejcir Thanks

@fossedihelm
Copy link
Contributor Author

fossedihelm commented Jan 26, 2023

/hold
Evaluating an alternative that can avoid the new vm controller.

@kubevirt-bot kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 26, 2023
@fossedihelm
Copy link
Contributor Author

/unhold
The alternative will create a tight coupling between CMO and CNV, which is not good

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 30, 2023
@fossedihelm
Copy link
Contributor Author

/retest

@0xFelix
Copy link
Member

0xFelix commented Jan 30, 2023

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 30, 2023
@fossedihelm
Copy link
Contributor Author

/retest

@danilo-gemoli
Copy link

/test e2e-single-node-functests

@kubevirt-bot
Copy link
Contributor

@danilo-gemoli: No presubmit jobs available for kubevirt/ssp-operator@master

In response to this:

/test e2e-single-node-functests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@danilo-gemoli
Copy link

/retest

@fossedihelm
Copy link
Contributor Author

@akrejcir Can you have a look again, thank!

@akrejcir
Copy link
Collaborator

akrejcir commented Feb 1, 2023

/lgtm

@akrejcir
Copy link
Collaborator

akrejcir commented Feb 1, 2023

/approve

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akrejcir

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 1, 2023
@kubevirt-bot kubevirt-bot merged commit f879ff6 into kubevirt:master Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants