Disable VMI migration upon K8s unschedulable taint #823

danielBelenky
Contributor

@danielBelenky danielBelenky commented Sep 22, 2020

Currently, the default behaviour is to migrate the VMI when a node is
tainted with the node.kubernetes.io/unschedulable key. This behaviour is
wrong and can be unexpected from the user's perspective because the
unschedulable taint means "do not schedule new workloads on this node"
and not "evict existing workloads from this node".
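To make the distinction concrete, here is a minimal sketch in Go. It relies on Kubernetes defining three taint effects (NoSchedule, PreferNoSchedule, NoExecute), of which only NoExecute asks for already-running pods to be removed, and on cordoning a node adding the unschedulable taint with the NoSchedule effect. The `Taint` type and `EvictsRunningWorkloads` helper are hypothetical stand-ins written for illustration; they are not KubeVirt or HCO code.

```go
package main

// TaintEffect mirrors the three taint effects defined by Kubernetes.
type TaintEffect string

const (
	NoSchedule       TaintEffect = "NoSchedule"
	PreferNoSchedule TaintEffect = "PreferNoSchedule"
	NoExecute        TaintEffect = "NoExecute"
)

// UnschedulableKey is the taint key discussed in this PR.
const UnschedulableKey = "node.kubernetes.io/unschedulable"

// Taint is a simplified, local stand-in for a node taint (hypothetical type,
// not the upstream API struct).
type Taint struct {
	Key    string
	Effect TaintEffect
}

// EvictsRunningWorkloads reports whether a taint asks for workloads that are
// already running on the node to be removed. Only NoExecute has that meaning;
// NoSchedule and PreferNoSchedule only influence where new pods are placed.
func EvictsRunningWorkloads(t Taint) bool {
	return t.Effect == NoExecute
}
```

Under this reading, `EvictsRunningWorkloads(Taint{Key: UnschedulableKey, Effect: NoSchedule})` is false, which is why treating that taint as a migration trigger surprises users.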

kubevirt/kubevirt#4012 adds support for the
eviction API so we can properly handle node drains and specific
evictions on VMI pods.

This patch avoids setting (and explicitly removes during upgrades if
set in the past) the default node drain key so we won't migrate VMIs
when we're not expected to do so.

Fixes: https://bugzilla.redhat.com/1881676

Signed-off-by: Daniel Belenky [email protected]

Release note:

By default, VMIs will no longer migrate when a node is tainted with node.kubernetes.io/unschedulable. Users can now use the proper node drain API to evacuate multiple VMIs from a node.

@kubevirt-bot kubevirt-bot added release-note, dco-signoff: yes labels Sep 22, 2020
@danielBelenky
Contributor Author

/hold

Waiting for kubevirt/kubevirt#4012

@kubevirt-bot kubevirt-bot added do-not-merge/hold, size/XS labels Sep 22, 2020
@ovirt-infra

All tests passed

Member

@tiraboschi tiraboschi left a comment

Please note that virtconfig.MigrationsConfigKey is not reconciled, so this is enough for new deployments but not for upgrades.
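A rough sketch of what an upgrade-time cleanup could look like, in Go. Everything here is an assumption for illustration: that the migrations settings live as line-oriented `name: value` text under a `migrations` entry of a string map, that the drain field is called `nodeDrainTaintKey`, and that only the old default value should be dropped so a user-chosen key survives. The helper name `dropOldDrainDefault` is hypothetical and this is not the code that was merged.

```go
package main

import "strings"

const (
	// Assumed value of virtconfig.MigrationsConfigKey.
	migrationsKey = "migrations"
	// Assumed name of the drain field inside the migrations entry.
	drainField = "nodeDrainTaintKey"
	// The default that earlier versions are described as setting.
	oldDefaultTaint = "node.kubernetes.io/unschedulable"
)

// dropOldDrainDefault removes the drain-key line from an existing config only
// when it still carries the old default. It returns true when data was
// modified and therefore needs to be written back to the cluster.
// Note: this is a plain line scan, not a YAML parser, so quoted or nested
// values are left untouched.
func dropOldDrainDefault(data map[string]string) bool {
	current, ok := data[migrationsKey]
	if !ok {
		return false
	}

	kept := make([]string, 0)
	changed := false
	for _, line := range strings.Split(current, "\n") {
		name, value, found := strings.Cut(line, ":")
		if found &&
			strings.TrimSpace(name) == drainField &&
			strings.TrimSpace(value) == oldDefaultTaint {
			changed = true // skip the stale default
			continue
		}
		kept = append(kept, line)
	}

	if changed {
		data[migrationsKey] = strings.Join(kept, "\n")
	}
	return changed
}
```

The boolean return is there so a reconcile loop can skip the update call when nothing was removed, which keeps repeated runs idempotent.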

@openshift-ci-robot
Collaborator

@danielBelenky: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/hco-e2e-image-index-gcp | 6d3459be2e239e9e5a334b9dd182d93c3857a233 | link | /test hco-e2e-image-index-gcp |
| ci/prow/hco-e2e-image-index-azure | 6d3459be2e239e9e5a334b9dd182d93c3857a233 | link | /test hco-e2e-image-index-azure |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@hco-bot
Collaborator

hco-bot commented Sep 22, 2020

hco-e2e-image-index-aws lane succeeded.
/override ci/prow/hco-e2e-image-index-azure
hco-e2e-image-index-aws lane succeeded.
/override ci/prow/hco-e2e-image-index-gcp

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-image-index-azure, ci/prow/hco-e2e-image-index-gcp

In response to this:

hco-e2e-image-index-aws lane succeeded.
/override ci/prow/hco-e2e-image-index-azure
hco-e2e-image-index-aws lane succeeded.
/override ci/prow/hco-e2e-image-index-gcp

@kubevirt-bot
Contributor

@danielBelenky: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-hco-node-placement-k8s-1.17 | 6d3459be2e239e9e5a334b9dd182d93c3857a233 | link | /test pull-hco-node-placement-k8s-1.17 |

Member

@stu-gott stu-gott left a comment

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm label Oct 23, 2020
@stu-gott
Member

/hold cancel

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold label Oct 23, 2020
@tiraboschi tiraboschi force-pushed the disable-migration-when-unschedulable branch from 6d3459b to 9a3fc2b Compare October 27, 2020 15:40
@kubevirt-bot kubevirt-bot added size/M and removed lgtm, size/XS labels Oct 27, 2020
@ovirt-infra

All tests passed

@tiraboschi
Member

/cherry-pick release-1.2

@kubevirt-bot
Contributor

@tiraboschi: once the present PR merges, I will cherry-pick it on top of release-1.2 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.2

Currently, the default behaviour is to migrate the VMI when a node is
tainted with the node.kubernetes.io/unschedulable key. This behaviour is
wrong and can be unexpected from the user's perspective because the
unschedulable taint means "do not schedule new workloads on this node"
and not "evict existing workloads from this node".

kubevirt/kubevirt#4012 adds support for the
eviction API so we can properly handle node drains and specific
evictions on VMI pods.

This patch avoids setting (and explicitly removes during upgrades if
set in the past) the default node drain key so we won't migrate VMIs
when we're not expected to do so.

Fixes: https://bugzilla.redhat.com/1881676

Signed-off-by: Daniel Belenky <[email protected]>
Signed-off-by: Simone Tiraboschi <[email protected]>
@tiraboschi tiraboschi force-pushed the disable-migration-when-unschedulable branch from 9a3fc2b to 3c05b7f Compare October 28, 2020 09:27
@kubevirt-bot kubevirt-bot added the lgtm label Oct 28, 2020
Collaborator

@nunnatsa nunnatsa left a comment

/approve

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nunnatsa

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved label Oct 28, 2020
@openshift-merge-robot
Collaborator

@danielBelenky: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/hco-e2e-upgrade-prev-aws | 3c05b7f | link | /test hco-e2e-upgrade-prev-aws |

@hco-bot
Collaborator

hco-bot commented Oct 28, 2020

hco-e2e-upgrade-prev-azure lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-aws

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-prev-aws

In response to this:

hco-e2e-upgrade-prev-azure lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-aws

@kubevirt-bot kubevirt-bot merged commit 87fbc56 into kubevirt:master Oct 28, 2020
@kubevirt-bot
Contributor

@tiraboschi: #823 failed to apply on top of branch "release-1.2":

error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	pkg/controller/hyperconverged/hyperconverged_controller.go
M	pkg/controller/hyperconverged/hyperconverged_controller_components_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/controller/hyperconverged/hyperconverged_controller_components_test.go
Auto-merging pkg/controller/hyperconverged/hyperconverged_controller.go
CONFLICT (content): Merge conflict in pkg/controller/hyperconverged/hyperconverged_controller.go
Patch failed at 0001 Disable VMI migration upon K8s unschedulable taint

In response to this:

/cherry-pick release-1.2

kubevirt-bot pushed a commit that referenced this pull request Oct 28, 2020
Signed-off-by: Daniel Belenky <[email protected]>
Signed-off-by: Simone Tiraboschi <[email protected]>

Co-authored-by: Daniel Belenky <[email protected]>
Labels
approved, dco-signoff: yes, lgtm, release-note, size/M
9 participants