Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise pod handling in machine updates #202

Merged
merged 5 commits into from
Jan 25, 2019

Conversation

prashanth26
Copy link
Contributor

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #120

Special notes for your reviewer:

Release note:

Prefers scheduling of pods on newer machines during roll-outs

@prashanth26 prashanth26 added kind/enhancement Enhancement, improvement, extension priority/critical Needs to be resolved soon, because it impacts users negatively reviewed/do-not-merge Has no approval for merging as it may break things, be of poor quality or have (ext.) dependencies size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py) status/in-progress Issue is in progress/work needs/tests Needs (more) tests topology/seed Affects Seed clusters area/high-availability High availability related labels Dec 17, 2018
@prashanth26 prashanth26 requested review from ggaurav10 and a team as code owners December 17, 2018 07:31
@prashanth26 prashanth26 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Dec 17, 2018
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Dec 17, 2018
@prashanth26 prashanth26 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Dec 17, 2018
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Dec 17, 2018
@ggaurav10
Copy link
Contributor

Shouldn't we handle the rollback scenario? Basically, remove the taints from the machines, in case the update is paused and rolled back.

@prashanth26
Copy link
Contributor Author

  • Shouldn't we handle the rollback scenario? Basically, remove the taints from the machines, in case the update is paused and rolled back.

I thought that it would be okay to ignore this as even if it rolls back, these nodes would only have prefer-no-schedule, which still means pods will be scheduled here if its the last option. Also, I haven't seen usages for rollbacks much. I do agree that ideally, this should be there, but I am not sure if it's required here. I have no hard reservations though.

@prashanth26 prashanth26 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Dec 19, 2018
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Dec 19, 2018
@prashanth26 prashanth26 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jan 2, 2019
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jan 2, 2019
@prashanth26 prashanth26 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/tests Needs (more) tests labels Jan 14, 2019
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jan 14, 2019
@prashanth26 prashanth26 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jan 14, 2019
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jan 14, 2019
@prashanth26 prashanth26 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jan 14, 2019
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jan 14, 2019
@hardikdr
Copy link
Member

hardikdr commented Jan 19, 2019

@prashanth26

I thought that it would be okay to ignore this as even if it rolls back, these nodes would only have prefer-no-schedule, which still means pods will be scheduled here if its the last option.

So what is the conclusion here then, are we really not handling the rollback?
Ideally, if we support a rollback mechanism, we should make sure it's not broken/convoluted due to any of the new feature. ;)
Btw PreferNoSchedule can affect rollback's even in a deeper sense, what if after rollback is complete few machines have PNS taint and rest does not have. It basically means few get preferred for no good reason while pod-scheduling. :/

PS: Seems like rollback scene had been taken care, just that didn't reflect here in conversation.

Copy link
Member

@hardikdr hardikdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

halfway through the review.
It would be nice to brief the algorithm a little in PR-description or let's discuss it offline sometime in office.

pkg/controller/deployment_rollback.go Outdated Show resolved Hide resolved
pkg/controller/controller_utils.go Show resolved Hide resolved
// removeTaintNodesBackingMachineSets removes taints from all nodes backing the machineSets
func (dc *controller) removeTaintNodesBackingMachineSets(machineSet *v1alpha1.MachineSet) error {

if machineSet.Annotations[PreferNoSchedule] == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if, due to any reason, PNS is not there as a key in the map, just curious? Will that causes Runtime error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this issue. Refer this commit for all changes - 81cc655

pkg/controller/deployment_rollback.go Outdated Show resolved Hide resolved
@prashanth26
Copy link
Contributor Author

@prashanth26

I thought that it would be okay to ignore this as even if it rolls back, these nodes would only have prefer-no-schedule, which still means pods will be scheduled here if its the last option.

So what is the conclusion here then, are we really not handling the rollback?
Ideally, if we support a rollback mechanism, we should make sure it's not broken/convoluted due to any of the new feature. ;)
Btw PreferNoSchedule can affect rollback's even in a deeper sense, what if after rollback is complete few machines have PNS taint and rest does not have. It basically means few get preferred for no good reason while pod-scheduling. :/

PS: Seems like rollback scene had been taken care, just that didn't reflect here in conversation.

Yes, rollback has been taken care off.

1. Renamed PreferNoSchedule to PreferNoScheduleKey
2. Fixed map key check
@prashanth26 prashanth26 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jan 23, 2019
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jan 23, 2019
Copy link
Member

@hardikdr hardikdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/high-availability High availability related kind/enhancement Enhancement, improvement, extension needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) priority/critical Needs to be resolved soon, because it impacts users negatively size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py) status/in-progress Issue is in progress/work topology/seed Affects Seed clusters
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimise Pod Handling in Machine Roll-Out
4 participants