Cant delete PV #266
@philipp1992 you are hitting this issue https://vsphere-csi-driver.sigs.k8s.io/known_issues.html#issue_5
|
The issue is fixed in the external-provisioner. If this is a test environment, can you upgrade provisioner image to |
@divyenpatel I think I hit the same thing when testing the csi driver v2.0.0 with csi-provisioner v1.6.0. But I don't fully understand the fix. Looking deeper into the logs, it is true that When I examined vsphere-csi-controller logs, it seems like detach fails with this error:
|
@misterikkit this is caused by a bug in the backend vSphere API. When the Delete API is called, it first untags the volume as a container volume and then attempts to delete it. Because the volume is still attached to a VM, the delete fails, and the API does not re-tag the volume as a container volume. During detach, we first query vCenter to determine whether the volume is a block or file volume, and we skip detach for file volumes. Since the volume has already been untagged as a container volume, vSphere does not return it, so ControllerUnpublishVolume fails with "volume not found". This issue is fixed in the upcoming vSphere 7.0u1 release. For prior vSphere releases, we recommend customers to use the latest version of |
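To make the sequence above easier to follow, here is a toy model of it. This is only a sketch: `BuggyBackend`, `Volume`, and the exception classes are hypothetical names invented for illustration, not the driver's or vSphere's actual code, and the model assumes nothing beyond the order of events described in this comment.

```python
from dataclasses import dataclass


class VolumeNotFound(Exception):
    """Raised when a query for container volumes does not return the volume."""


class InvalidState(Exception):
    """Raised when a delete is attempted on a disk still attached to a VM."""


@dataclass
class Volume:
    volume_id: str
    attached: bool = True          # disk is still attached to the node VM
    container_tagged: bool = True  # tagged as a container volume


class BuggyBackend:
    """Hypothetical stand-in for the pre-7.0u1 behaviour described above."""

    def __init__(self, volume):
        self.volume = volume

    def delete_volume(self):
        # Step 1: the container-volume tag is removed first.
        self.volume.container_tagged = False
        # Step 2: the delete fails because the disk is still attached,
        # and the tag is not restored.
        if self.volume.attached:
            raise InvalidState("failed to delete disk: still attached")

    def query_container_volume(self):
        # Only volumes tagged as container volumes are returned.
        if not self.volume.container_tagged:
            raise VolumeNotFound(self.volume.volume_id)
        return self.volume

    def detach_volume(self):
        # Detach first queries the volume (to decide block vs. file).
        self.query_container_volume()
        self.volume.attached = False


def delete_then_detach(backend):
    """Replay the racy order (delete before detach) and record each outcome."""
    outcomes = []
    for step in (backend.delete_volume, backend.detach_volume):
        try:
            step()
            outcomes.append("ok")
        except (InvalidState, VolumeNotFound) as err:
            outcomes.append(type(err).__name__)
    return outcomes


if __name__ == "__main__":
    print(delete_then_detach(BuggyBackend(Volume("vol-1"))))
```

In this model the volume ends up both attached and untagged, so neither a later delete nor a later detach can succeed. Calling `detach_volume()` before `delete_volume()` on a fresh volume completes without error.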
I'm running with TKG 1.1.2 on 6.7U3, is there any fix for the 1.x version of the CSI driver? We are running csi-provisioner version 1.4.0 at the moment (whatever is default for TKG). |
The reason that csi-provisioner was moved to 2.0 is that RBAC changes are
required for the fix to work. For that reason, I doubt that the fix will be
backported.
… |
It seems like a pretty fatal flaw in the 6.7 implementation if it randomly fails to delete a PV.
We are also seeing a recurring attempt to delete something every 5 minutes in the vCenter logs, presumably related to this same issue.
If just basic creation and deletion of a PV don’t work properly it doesn’t seem like they should even enable it for 6.7U3.
… On Aug 19, 2020, at 5:09 PM, Jonathan Basseri ***@***.***> wrote:
The reason that csi-provisioner was moved to 2.0 is that RBAC changes are
required for the fix to work. For that reason, I doubt that the fix will be
backported.
|
@longwa this situation only happens when there's a race between trying to delete a PVC and Pod (for example when deleting a namespace). Can you run the steps @divyenpatel has mentioned in #266 (comment) and see if that helps?
Yes, Kubernetes attempts to delete the backend volume in a loop until it succeeds. |
@RaunakShah So the workaround steps describe how to delete a Pod without the volume getting into this state? I don't see any VolumeAttachments for the failed PV in this case, so I'm assuming the workaround isn't useful once you have already hit the error? Also, what is causing the retry and how can I stop it? From what I can tell it will never succeed, and it is spamming our Event logs with failures every 5 minutes. I'll have to see about using govc to delete the FCD. I don't have any visibility into the IVD volumes in the CNS UI or anywhere else, so I'm really pretty blind about what's going on here. |
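For anyone who wants to check whether a PV still has VolumeAttachment objects, a minimal sketch follows. It assumes the input is shaped like the output of `kubectl get volumeattachments -o json` (a list object with `items`, each carrying `spec.source.persistentVolumeName`). The helper name `attachments_for_pv` and the sample data are made up for illustration.

```python
def attachments_for_pv(volume_attachment_list, pv_name):
    """Return names of VolumeAttachment objects whose source is the given PV."""
    matches = []
    for item in volume_attachment_list.get("items", []):
        source = item.get("spec", {}).get("source", {})
        if source.get("persistentVolumeName") == pv_name:
            matches.append(item["metadata"]["name"])
    return matches


if __name__ == "__main__":
    # Hypothetical sample; real data would come from the kubectl JSON output.
    sample = {
        "items": [
            {
                "metadata": {"name": "csi-example-1"},
                "spec": {"source": {"persistentVolumeName": "pvc-example-a"}},
            },
            {
                "metadata": {"name": "csi-example-2"},
                "spec": {"source": {"persistentVolumeName": "pvc-example-b"}},
            },
        ]
    }
    print(attachments_for_pv(sample, "pvc-example-a"))
```

An empty result matches the situation described here: the PV has no VolumeAttachment left, so steps that rely on one would not apply.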
@longwa Kubernetes design is to retry operations it sees as temporary failures. The PV object is still present with a
|
/assign @divyenpatel |
We still hit this issue. csi-provisioner is at version 2.2.0, which should already include the fix. Could some other logic also cause the container volume to be untagged? The issue only happens on 6.7u3; as @divyenpatel mentioned, I think vCenter 7.0.1 fixed it. But since csi-provisioner checks whether the volume is still attached and should not issue a delete (I verified this in the logs), what causes this to happen? |
/reopen |
@jingxu97: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/lifecycle rotten |
/close |
@k8s-triage-robot: Closing this issue. In response to this:
Hi,
I'm using vanilla Kubernetes 1.18 with vSphere 7.0 and CSI driver 2.0.
Everything works fine, but when I delete a PVC or PV, the following happens:
kubectl describe pv
Warning VolumeFailedDelete 85s csi.vsphere.vmware.com_vsphere-csi-controller-5c4d7b6ffc-sxtxv_90ec2c56-c50c-4acf-9c03-b506726b5800 rpc error: code = Internal desc = failed to delete volume: "9c310d6c-dea0-488c-a3bd-c91d86fc00c2". Error: failed to delete volume: "9c310d6c-dea0-488c-a3bd-c91d86fc00c2", fault: "(*types.LocalizedMethodFault)(0xc00047ebe0)({\n DynamicData: (types.DynamicData) {\n },\n Fault: (types.CnsFault) {\n BaseMethodFault: (types.BaseMethodFault) ,\n Reason: (string) (len=63) "CNS: Failed to delete disk:Fault cause: vim.fault.InvalidState\n"\n },\n LocalizedMessage: (string) (len=79) "CnsFault error: CNS: Failed to delete disk:Fault cause: vim.fault.InvalidState\n"\n})\n", opID: "7ae7c7f7"
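The event message above is long, so a small helper can pull out the parts that matter when comparing failures. This is a rough sketch that assumes the message keeps the same wording as the one quoted here. The function name `summarize_delete_failure` is hypothetical, and the regular expressions are guesses based on this single sample.

```python
import re


def summarize_delete_failure(message):
    """Extract the volume ID, fault cause, and opID from a delete-failure message."""
    volume = re.search(r'failed to delete volume: "([0-9a-fA-F-]+)"', message)
    fault = re.search(r"Fault cause: (vim\.fault\.\w+)", message)
    op_id = re.search(r'opID: "(\w+)"', message)
    return {
        "volume_id": volume.group(1) if volume else None,
        "fault": fault.group(1) if fault else None,
        "op_id": op_id.group(1) if op_id else None,
    }


if __name__ == "__main__":
    # Shortened copy of the event text from this issue.
    event = (
        'rpc error: code = Internal desc = failed to delete volume: '
        '"9c310d6c-dea0-488c-a3bd-c91d86fc00c2". '
        'CNS: Failed to delete disk:Fault cause: vim.fault.InvalidState '
        'opID: "7ae7c7f7"'
    )
    print(summarize_delete_failure(event))
```

Any field that cannot be found is returned as `None`, so the helper does not raise on messages with a different shape.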