-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't track reattachment of provider-unrelated PVs #937
Conversation
@timebertt Thank you for your contribution. |
Thank you @timebertt for your contribution. Before I can start building your PR, a member of the organization must set the required label(s) {'reviewed/ok-to-test'}. Once started, you can check the build status in the PR checks section below. |
We probably need to add tests for this PR. However, we wanted to open it to get some feedback on whether this is the correct approach/place to solve the issue. |
@timebertt do you think, we should generalise this to also skip the check in case the CSIDriver is configured with |
Sounds good. I had the impression that the attachment tracking doesn't work at all if there are no Another PR could then enhance the check for CSI drivers where no |
@kon-angelo @gardener/mcm-maintainers can you provide feedback on the PR and the above discussion? |
I agree to @kon-angelo's observation that this can be generically handled. @timebertt since you have raised this PR we can also do this in 2 steps. We can skip NFS volumes (as you have done in this PR) and then later we can generalise this. Since generalisation would required additional lookup of the CSIDriver object and looking up its |
Aside from my previous point, I kind of understand @timebertt 's suggestion. The MCM code relies on volumeattachments which also to my knowledge is only created for CSI volumes. We could probably update the logic in this PR to skip anything other than CSI volumes. (but my experience is just going through the code and this can be verified from one of the actual mcm maintainers). Though one point is that I am not particularly fond that we "skip" reporting the existence of volumes, rather than not accounting them when draining - if it makes sense. Like this can be somewhat troubling to debug I find. But the actual function doing the work I just feel that at some point we will go down a debugging rabbit hole because |
Back from vacation, sorry for the delay.
Got your point. I will rework the PR to skip any non-CSI volumes instead of handling NFS as a special case, improve the code flow (where the filtering is done), and add unit tests. Another PR can be opened later to consult the |
While looking into the filtering again, I noticed a misunderstanding on my side. The actual problem is that
See machine-controller-manager/pkg/util/provider/drain/drain.go Lines 510 to 542 in 97cd752
So when doing the filtering in The problem is that the To solve the problem, I reworked the PR so that there is only a single volume list in The PR is still missing unit tests. |
I reproduced the issue described above on an OpenStack shoot using the following steps:
Click me to expand
apiVersion: apps/v1
kind: StatefulSet
metadata:
creationTimestamp: null
labels:
app: test
name: test
spec:
replicas: 1
selector:
matchLabels:
app: test
template:
metadata:
creationTimestamp: null
labels:
app: test
spec:
containers:
- name: volume-test
image: nginx:stable-alpine
imagePullPolicy: IfNotPresent
volumeMounts:
- name: test
mountPath: /data
ports:
- containerPort: 80
volumeClaimTemplates:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
name: test
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 128Mi
storageClassName: local-path
The logs show that The 90s for this unnecessary attachment tracking sums up quickly when rolling node pools having many provider-unrelated PVs. |
9031fe6
to
03d4fed
Compare
Now, I verified my fix with a manually built image of mcm-provider-openstack and the steps described above. With the change, both lists for detach and reattach tracking are empty and the drain operation finishes quickly without considering the non-cinder volume (as desired):
I also tested the change with a node that had 3 cinder volumes attached:
Now, that the fix is verified and I'm confident in the change of this PR, I only need to add unit tests. While verifying my change, I discovered another bug (unrelated to this change, but it breaks the reattach tracking): #945 |
Hey @gardener/mcm-maintainers @kon-angelo. I finished verifying this PR and adding tests. |
break | ||
} | ||
|
||
response, err := o.Driver.GetVolumeIDs(ctx, &driver.GetVolumeIDsRequest{PVSpecs: []*corev1.PersistentVolumeSpec{pvSpec}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: The new implementation can result in excessive calls to Driver.GetVolumeIDs
and is highly-dependent on the provider implementation making this cheap. Other can result in rate limit errors at provider. Earlier there was only a single call to Driver.GetVolumeIDs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is true. I knowingly changed this, because existing implementation that I looked into (aws and openstack) didn't call an API on GetVolumeIDs
.
I was under the impression that drivers are expected to extract the volume IDs purely from the PV spec without the help of an external API.
If this is not the case, I can restructure the code again to perform a single GetVolumeIDs
call and use the list of IDs from that instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elankath can you comment on how the PR should handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, the godoc for GetVolumeIDs
does mention this (though we should fix the grammar). It is fine - you don't need to restructure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is not the case, I can restructure the code again to perform a single
GetVolumeIDs
call and use the list of IDs from that instead.
While thinking about it again, this won't be possible.
The reason is, that the driver filters the list of input volumes depending on whether it recognizes the volume. However, we won't be able to correlate the input volumes and filtered output volumes, which would be required to achieve the needed filtering in the drain logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is fine - you don't need to restructure.
Ok, great. Then, let's go for merging this PR :)
@timebertt sorry for dumb question, but was there any additional code introduced for NFS filtering ? Since you raised that as as point, but we can't find the same. I am assuming that the standard filtering that is currently implemented in the provider implemention for |
Yes, the filtering in |
@elankath can you take another look please? |
@timebertt Just to confirm: was the force-drain logic also tested ? One can test with by using Thanks! |
I just tested the force deletion of a machine that had three cinder volumes and one local-path volume (see steps to reproduce: #937 (comment)).
I looked into the code and found that the deletion of |
Yes, exactly. |
Thanks for the clarification and test Tim. (The test is not related to your PR - just a basic sanity check just to check that the force-deletion flow was not impacted by drain) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm. Thanks for the contribution.
I think #937 (comment) is still open. Can you take a look at my thoughts? |
No, I believe there is no need to restructure. Ideally we should have have had a better response instead of just a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
I'll run the IT once with provider AWS and then merge this PR.
tests passed on AWS. I'll fix #945. Thanks for raising this issue. |
/kind bug
What this PR does / why we need it:
When draining nodes, MCM already ignores provider-unrelated PVs when waiting for the detachments.
However, it does wait for such PVs to reattach to other nodes until running into a timeout.
This PR ensures that the same filter of the driver is used for tracking reattachments of PVs as well.
For example, when draining nodes where NFS volumes are used, MCM would wait for the reattachment of these volumes.
There is no such thing as an attachment to a node for NFS volumes. NFS volumes only need to be mounted (by kubelet).
Accordingly, MCM always runs into the configured PV reattach timeout, if there are pods with NFS PVCs on the to-be-drained node.
This prolongs the drain operation of such nodes severely.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
cc @xoxys
Release note: