Too many machineDeployment generations #220

Closed
prashanth26 opened this issue Feb 7, 2019 · 9 comments · Fixed by #223 or #228

@prashanth26
Contributor

Issue

The machine deployment generations are constantly increasing with time. This causes issues with Gardener reconciliation.

Solution

We need to fix the excessive number of machine deployment updates so that the generation stops climbing.

cc: @DockToFuture @adracus

@prashanth26 prashanth26 added kind/bug Bug priority/critical Needs to be resolved soon, because it impacts users negatively component/machine-controller-manager platform/all area/quality Output qualification (tests, checks, scans, automation in general, etc.) related size/s Size of pull request is small (see gardener-robot robot/bots/size.py) topology/seed Affects Seed clusters status/under-investigation Issue is under investigation labels Feb 7, 2019
@prashanth26 prashanth26 self-assigned this Feb 7, 2019
@prashanth26
Contributor Author

These changes negatively affect shoots running as seeds when the seed runs Kubernetes version 1.13.x+.

Kindly refrain from migrating seeds to 1.13.x until this issue is resolved.

The bug can be traced back to kubernetes/kubernetes#69059.
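
A minimal sketch of the symptom, assuming a CRD without the status subresource, the pre-1.18 client-go signatures (no context argument), and placeholder kubeconfig path, namespace, object name, and status field: a status-only write still goes through the main resource endpoint and bumps metadata.generation, which is why frequent MCM status updates make the generation climb.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Kubeconfig path, namespace, object name, and status field are placeholders.
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/seed-kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	gvr := schema.GroupVersionResource{
		Group:    "machine.sapcloud.io",
		Version:  "v1alpha1",
		Resource: "machinedeployments",
	}
	mds := client.Resource(gvr).Namespace("shoot--foo--bar")

	md, err := mds.Get("example-machinedeployment", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	before := md.GetGeneration()

	// Touch only .status; .spec stays untouched.
	if err := unstructured.SetNestedField(md.Object, int64(1), "status", "observedGeneration"); err != nil {
		panic(err)
	}
	// Without the /status subresource this write goes through the main resource
	// endpoint, and (per the behavior discussed in kubernetes/kubernetes#69059)
	// the API server bumps metadata.generation for it.
	updated, err := mds.Update(md, metav1.UpdateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("generation before=%d after=%d\n", before, updated.GetGeneration())
}
```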

@adracus

adracus commented Feb 13, 2019

cc @rfranzke

@prashanth26
Contributor Author

  • From my initial set of tests, the migration from the status field to the /status subresource does not look like a direct migration.

  • If it is not a direct migration, we might do it along with the out-of-tree migration.

  • However, if the above is true, we still need to find a solution for seeds moving to Kubernetes 1.13 well before that.

@rfranzke
Member

From my initial set of tests, the migration from the status field to the /status subresource does not look like a direct migration.

Can you elaborate on this?

@prashanth26
Contributor Author

prashanth26 commented Feb 15, 2019

When a machine deployment is created with the old status handling (status as a plain field), and an MCM that understands the new status handling (subresource) then operates on it, updateStatus() returns an error saying the machine/status (sub)resource was not found.

My guess is that etcd initially stored the entire machine object as a single resource, and now it is suddenly split into two resources, so etcd does not know about the new status (sub)resource attached to the machine. I still have to verify this by looking at the Kubernetes docs to see why this change is not backward compatible.

I tried this experiment by migrating from an old MCM to a new MCM on a Gardener setup, and the MCM started throwing these errors.
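
A minimal sketch of the failing call path, using the dynamic client as a stand-in for MCM's generated updateStatus() and with placeholder GVR, namespace, object, and field names (pre-1.18 client-go signatures): UpdateStatus targets the .../status endpoint, which does not exist when the CRD lacks spec.subresources.status, so the API server answers with a NotFound error like the one quoted above.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/seed-kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	gvr := schema.GroupVersionResource{
		Group:    "machine.sapcloud.io",
		Version:  "v1alpha1",
		Resource: "machines",
	}
	machines := client.Resource(gvr).Namespace("shoot--foo--bar")

	m, err := machines.Get("example-machine", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	if err := unstructured.SetNestedField(m.Object, "Running", "status", "currentStatus", "phase"); err != nil {
		panic(err)
	}

	// UpdateStatus PUTs to .../machines/example-machine/status. If the CRD was
	// created without spec.subresources.status, the API server has no such
	// endpoint and returns a NotFound error for the machine/status subresource.
	if _, err := machines.UpdateStatus(m, metav1.UpdateOptions{}); err != nil {
		fmt.Printf("updateStatus failed: %v\n", err)
		return
	}
	fmt.Println("status updated via the /status subresource")
}
```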

@rfranzke
Member

My guess is that etcd initially stored the entire machine object as a single resource, and now it is suddenly split into two resources, so etcd does not know about the new status (sub)resource attached to the machine.

No, the /status subresource is only another REST API path. The object will not be split into two.

When a machine deployment is created with the old status handling (status as a plain field), and an MCM that understands the new status handling (subresource) then operates on it, updateStatus() returns an error saying the machine/status (sub)resource was not found.

Did you update the CRD, declaring that you want the status subresource, before starting the new MCM?

@prashanth26
Contributor Author

Did you update the CRD, declaring that you want the status subresource, before starting the new MCM?

No, I wasn't aware of it. Probably that is the reason then. Let me quickly try that out.
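
A minimal sketch of what that step would look like, assuming the apiextensions v1beta1 API that was current at the time, pre-1.18 clientset signatures, and placeholder kubeconfig path and CRD name: fetch the existing CRD, declare the status subresource, and update it before starting the new MCM.

```go
package main

import (
	"fmt"

	apiextensionsv1beta1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1"
	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/seed-kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := apiextensionsclient.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	crds := client.ApiextensionsV1beta1().CustomResourceDefinitions()
	crd, err := crds.Get("machines.machine.sapcloud.io", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// Declare the status subresource (the YAML equivalent is
	// spec.subresources.status: {}). Once this is in place, the new MCM's
	// updateStatus() calls have a /status endpoint to talk to, and status
	// writes no longer bump metadata.generation.
	crd.Spec.Subresources = &apiextensionsv1beta1.CustomResourceSubresources{
		Status: &apiextensionsv1beta1.CustomResourceSubresourceStatus{},
	}
	if _, err := crds.Update(crd); err != nil {
		panic(err)
	}
	fmt.Printf("status subresource enabled on %s\n", crd.Name)
}
```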

@rfranzke
Member

@prashanth26
Contributor Author

prashanth26 commented Feb 15, 2019

This looks like it's working. Let me double check once more to be sure.

Thanks for the inputs @rfranzke 👍

@prashanth26 prashanth26 added this to the v0.14.0 milestone Feb 26, 2019
@ghost ghost added the component/mcm Machine Controller Manager (including Node Problem Detector, Cluster Auto Scaler, etc.) label Mar 7, 2020
@gardener-robot gardener-robot added priority/2 Priority (lower number equals higher priority) and removed priority/critical Needs to be resolved soon, because it impacts users negatively labels Mar 8, 2021
@gardener-robot gardener-robot added effort/2d Effort for issue is around 2 days and removed size/s Size of pull request is small (see gardener-robot robot/bots/size.py) labels Mar 8, 2021