Too many machineDeployment generations #220

Closed
prashanth26 opened this issue Feb 7, 2019 · 9 comments · Fixed by #223 or #228

@prashanth26
Contributor

Issue

The machine deployment generations are constantly increasing with time. This causes issues with Gardener reconciliation.

Solution

We need to fix the excessive number of machine deployment updates so that the generation stops climbing.

cc: @DockToFuture @adracus

@prashanth26 prashanth26 added kind/bug Bug priority/critical Needs to be resolved soon, because it impacts users negatively component/machine-controller-manager platform/all area/quality Output qualification (tests, checks, scans, automation in general, etc.) related size/s Size of pull request is small (see gardener-robot robot/bots/size.py) topology/seed Affects Seed clusters status/under-investigation Issue is under investigation labels Feb 7, 2019
@prashanth26 prashanth26 self-assigned this Feb 7, 2019
@prashanth26
Contributor Author

These changes negatively affect shoots running as seeds when the seed runs Kubernetes version 1.13.x+.

Kindly refrain from migrating seeds to 1.13.x until this issue is resolved.

The bug can be traced back to kubernetes/kubernetes#69059.
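
A minimal sketch of the symptom, assuming a CRD without the status subresource, the pre-1.18 client-go signatures (no context argument), and placeholder kubeconfig path, namespace, object name, and status field: a status-only write still goes through the main resource endpoint and bumps metadata.generation, which is why frequent MCM status updates make the generation climb.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Kubeconfig path, namespace, object name, and status field are placeholders.
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/seed-kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	gvr := schema.GroupVersionResource{
		Group:    "machine.sapcloud.io",
		Version:  "v1alpha1",
		Resource: "machinedeployments",
	}
	mds := client.Resource(gvr).Namespace("shoot--foo--bar")

	md, err := mds.Get("example-machinedeployment", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	before := md.GetGeneration()

	// Touch only .status; .spec stays untouched.
	if err := unstructured.SetNestedField(md.Object, int64(1), "status", "observedGeneration"); err != nil {
		panic(err)
	}
	// Without the /status subresource this write goes through the main resource
	// endpoint, and (per the behavior discussed in kubernetes/kubernetes#69059)
	// the API server bumps metadata.generation for it.
	updated, err := mds.Update(md, metav1.UpdateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("generation before=%d after=%d\n", before, updated.GetGeneration())
}
```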

@adracus

adracus commented Feb 13, 2019

cc @rfranzke

@prashanth26
Contributor Author

  • From my initial set of tests, the migration from the status field to the /status subresource does not look like a direct migration.

  • If it is not a direct migration, we might do it along with the out-of-tree migration.

  • However, if the above is true, we still need to find a solution for seeds moving to Kubernetes 1.13 well before that.

@rfranzke
Member

From my initial set of tests, the migration from the status field to the /status subresource does not look like a direct migration.

Can you elaborate on this?

@prashanth26
Contributor Author

prashanth26 commented Feb 15, 2019

When a machine deployment is created with the old status handling (status as a plain field), and an MCM that understands the new status handling (subresource) then operates on it, updateStatus() returns an error saying the machine/status (sub)resource was not found.

My guess is that etcd initially stored the entire machine object as a single resource, and now it is suddenly split into two resources, so etcd does not know about the new status (sub)resource attached to the machine. I still have to verify this by looking at the Kubernetes docs to see why this change is not backward compatible.

I tried this experiment by migrating from an old MCM to a new MCM on a Gardener setup, and the MCM started throwing these errors.
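
A minimal sketch of the failing call path, using the dynamic client as a stand-in for MCM's generated updateStatus() and with placeholder GVR, namespace, object, and field names (pre-1.18 client-go signatures): UpdateStatus targets the .../status endpoint, which does not exist when the CRD lacks spec.subresources.status, so the API server answers with a NotFound error like the one quoted above.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/seed-kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	gvr := schema.GroupVersionResource{
		Group:    "machine.sapcloud.io",
		Version:  "v1alpha1",
		Resource: "machines",
	}
	machines := client.Resource(gvr).Namespace("shoot--foo--bar")

	m, err := machines.Get("example-machine", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	if err := unstructured.SetNestedField(m.Object, "Running", "status", "currentStatus", "phase"); err != nil {
		panic(err)
	}

	// UpdateStatus PUTs to .../machines/example-machine/status. If the CRD was
	// created without spec.subresources.status, the API server has no such
	// endpoint and returns a NotFound error for the machine/status subresource.
	if _, err := machines.UpdateStatus(m, metav1.UpdateOptions{}); err != nil {
		fmt.Printf("updateStatus failed: %v\n", err)
		return
	}
	fmt.Println("status updated via the /status subresource")
}
```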

@rfranzke
Member

My guess is that etcd initially stored the entire machine object as a single resource, and now it is suddenly split into two resources, so etcd does not know about the new status (sub)resource attached to the machine.

No, the /status subresource is only another REST API path. The object will not be split into two.

When a machine deployment is created with the old status handling (status as a plain field), and an MCM that understands the new status handling (subresource) then operates on it, updateStatus() returns an error saying the machine/status (sub)resource was not found.

Did you update the CRD, declaring that you want the status subresource, before starting the new MCM?

@prashanth26
Contributor Author

Did you update the CRD, declaring that you want the status subresource, before starting the new MCM?

No, I wasn't aware of it. Probably that is the reason then. Let me quickly try that out.
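
A minimal sketch of what that step would look like, assuming the apiextensions v1beta1 API that was current at the time, pre-1.18 clientset signatures, and placeholder kubeconfig path and CRD name: fetch the existing CRD, declare the status subresource, and update it before starting the new MCM.

```go
package main

import (
	"fmt"

	apiextensionsv1beta1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1"
	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/seed-kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := apiextensionsclient.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	crds := client.ApiextensionsV1beta1().CustomResourceDefinitions()
	crd, err := crds.Get("machines.machine.sapcloud.io", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// Declare the status subresource (the YAML equivalent is
	// spec.subresources.status: {}). Once this is in place, the new MCM's
	// updateStatus() calls have a /status endpoint to talk to, and status
	// writes no longer bump metadata.generation.
	crd.Spec.Subresources = &apiextensionsv1beta1.CustomResourceSubresources{
		Status: &apiextensionsv1beta1.CustomResourceSubresourceStatus{},
	}
	if _, err := crds.Update(crd); err != nil {
		panic(err)
	}
	fmt.Printf("status subresource enabled on %s\n", crd.Name)
}
```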

@rfranzke
Member

@prashanth26
Contributor Author

prashanth26 commented Feb 15, 2019

This looks like it's working. Let me double check once more to be sure.

Thanks for the inputs @rfranzke 👍

@prashanth26 prashanth26 added this to the v0.14.0 milestone Feb 26, 2019
@ghost ghost added the component/mcm Machine Controller Manager (including Node Problem Detector, Cluster Auto Scaler, etc.) label Mar 7, 2020
@gardener-robot gardener-robot added priority/2 Priority (lower number equals higher priority) and removed priority/critical Needs to be resolved soon, because it impacts users negatively labels Mar 8, 2021
@gardener-robot gardener-robot added effort/2d Effort for issue is around 2 days and removed size/s Size of pull request is small (see gardener-robot robot/bots/size.py) labels Mar 8, 2021