Force Drain and detachment for Volumes for Unhealthy Nodes which were NotReady for over 5min #781
Comments
Proposed solutions:
cc @dguendisch
Proposals after grooming

We have decided to do some testing, where we would first delete the node object and then delete the VM, or delete them both in parallel. This would mean the A/D controller detaching the volumes and MCM Azure (for now) also doing the detachment.
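For illustration, here is a minimal sketch of the first variant being tested: deleting the Node object via client-go before (or concurrently with) deleting the VM. The deleteVM helper and the client wiring are assumptions for the example, not MCM code; the parallel variant would simply run the VM deletion in a goroutine alongside the Node deletion.

```go
package forcedrain

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteVM stands in for the provider-specific VM deletion (e.g. the Azure
// driver in MCM); it is a hypothetical placeholder here.
func deleteVM(ctx context.Context, vmName string) error {
	fmt.Printf("deleting VM %s\n", vmName)
	return nil
}

// deleteNodeThenVM deletes the Node object first, so that KCM and the
// attach/detach controller can start reacting while the VM deletion is still
// in flight, and then deletes the backing VM.
func deleteNodeThenVM(ctx context.Context, client kubernetes.Interface, nodeName, vmName string) error {
	if err := client.CoreV1().Nodes().Delete(ctx, nodeName, metav1.DeleteOptions{}); err != nil {
		return fmt.Errorf("deleting node %s: %w", nodeName, err)
	}
	return deleteVM(ctx, vmName)
}
```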
From issue #2556, mentioned by @himanshu-kun:
The reconciler https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/attachdetach/reconciler/reconciler.go#L232 works on https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/attachdetach/cache/actual_state_of_world.go#L156, which is set by https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/attachdetach/cache/actual_state_of_world.go#L388, which in turn is called here https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/attachdetach/attach_detach_controller.go#L667, but that only works with a node resource. To me this sounds somewhat strange: does it really depend on a node that is already gone? Is it keeping that state in memory? Otherwise it wouldn't know anymore, if the source of truth is the (missing) node resource and it got restarted. I guess what I am trying to ask is: was I reading that wrongly, or are you saying KCM still waits 6m even if the node resource disappears?
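To make the 6-minute behaviour discussed above concrete, here is a simplified, self-contained illustration of the kind of decision the reconciler makes. It is not the upstream code; the struct and function names are invented for the example.

```go
package adcillustration

import "time"

// Hard-coded upstream in the attach/detach controller, not configurable.
const maxWaitForUnmountDuration = 6 * time.Minute

type attachedVolume struct {
	mounted         bool      // kubelet has not yet confirmed the unmount
	detachRequested time.Time // when the reconciler first asked for a detach
}

// shouldForceDetach mirrors the decision: detach immediately if the volume is
// already unmounted, otherwise only after the timeout has elapsed. With the
// node (and therefore the kubelet) gone, the unmount confirmation never
// arrives, so the full six minutes are always waited.
func shouldForceDetach(v attachedVolume, now time.Time) bool {
	if !v.mounted {
		return true
	}
	return now.Sub(v.detachRequested) > maxWaitForUnmountDuration
}
```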
Deleting the node resource before the machine termination was confirmed (or doing both in parallel) is risky, isn't it? To me, it doesn't sound safe/worth exploring, because you can never know whether the … With a webhook in place, we could "purge" the …
Hi Vedran, we discussed this and we agree with you. We propose the following enhancement:

High Level Proposal
Detailed Proposal

Overview of Current Deletion Flow
Proposed Enhancement

We enhance the …
To ensure we can appropriately timeout in …, the end of … would look like this:

```json
{
  "Description": "waitForNodeVolumesDetach",
  "State": "Processing",
  "Type": "Delete",
  "LastUpdateTime": "1:00 PM",
  "LastStateTransitionTime": "1:00 PM"
}
```
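As a rough sketch of how such a timeout could be evaluated from the state shown above (the field names follow the JSON; the constant and function names are assumptions, not part of the proposal):

```go
package proposal

import "time"

// lastOperation mirrors the fields shown in the JSON example above.
type lastOperation struct {
	Description             string
	State                   string
	Type                    string
	LastUpdateTime          time.Time
	LastStateTransitionTime time.Time
}

// volumeDetachTimeout is a hypothetical bound on how long machine deletion
// waits for the A/D controller to detach the node's volumes.
const volumeDetachTimeout = 6 * time.Minute

// volumeDetachTimedOut reports whether the waitForNodeVolumesDetach phase has
// been in Processing longer than the allowed timeout, based on
// LastStateTransitionTime.
func volumeDetachTimedOut(op lastOperation, now time.Time) bool {
	return op.Description == "waitForNodeVolumesDetach" &&
		op.State == "Processing" &&
		now.Sub(op.LastStateTransitionTime) > volumeDetachTimeout
}
```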
Another live issue: #3645
How to categorize this issue?
/area performance
/kind bug
/priority 2
What happened:
Currently we skip the drain for an unhealthy node that we want to remove if we see that it has been NotReady for more than 5 minutes.

We have now seen problems with this. Imagine a node with a lot of PVs attached. The node goes unhealthy, we skip the drain, and start deleting the VM.

On Azure, as part of deletion, we currently detach the volumes from the VM (PV disks and root disk) and then proceed with the VM deletion.

This means MCM directly detaches the backing disks of the pods' PVs and then triggers node deletion. Kubernetes only starts acting after the node deletion, because until then everything was happening on the infrastructure side.

KCM deletes the node, and then the orphan-pod force-deletion logic removes the pods. Only then does the attach/detach controller come into action.

The attach/detach controller relies on the kubelet to direct the CSI driver on the node to unmount the disk. Since there is no node, and therefore no kubelet, the A/D controller waits for maxWaitForUnmountDuration (6 minutes, non-configurable) before force detachment starts. So there are two detachments, one by MCM and then one by the A/D controller. This leads to downtimes for the customers.
What you expected to happen:
Pods with PVs should recover faster after unhealthy nodes are removed.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
During a normal drain, a wait for detachment and a wait for reattachment (i.e. a wait for a new VolumeAttachment to be created, in case VolumeAttachment support is enabled) are performed; this was introduced in #608.
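As an illustration of what such a wait can look like when VolumeAttachment support is enabled (this is a sketch under that assumption, not the code added in #608), one can poll until no VolumeAttachment for the drained node remains:

```go
package drainwait

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForVolumeDetach polls until no VolumeAttachment references the given
// node anymore, or until the timeout expires. Interval and timeout values are
// illustrative.
func waitForVolumeDetach(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	return wait.PollUntilContextTimeout(ctx, 5*time.Second, 6*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			vas, err := client.StorageV1().VolumeAttachments().List(ctx, metav1.ListOptions{})
			if err != nil {
				return false, err
			}
			for _, va := range vas.Items {
				if va.Spec.NodeName == nodeName {
					return false, nil // still attached, keep polling
				}
			}
			return true, nil
		})
}
```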
Live issue: #3645
Environment:
Kubernetes version (use kubectl version):