PV stays hanging in released state #217
Comments
I am having a similar issue after moving from version 1.0.0 of the CSI driver to version 2.0.0. I can create PVs, but most of the time I cannot delete them (it only works about 20% of the time); they stay in the Released state.

Logs:

csi-attacher:

I0506 12:16:26.400501 1 controller.go:175] Started VA processing "csi-7d8f5cbf2620398933db4179f14efa4bdbcd923ee15a1f41aae0e0f34bacc96e"

csi-controller:

{"level":"error","time":"2020-05-06T12:16:32.724480563Z","caller":"common/vsphereutil.go:351","msg":"failed to delete disk 276ae09e-96a0-4236-a053-7dbea3997318 with error failed to delete volume: "276ae09e-96a0-4236-a053-7dbea3997318", fault: "(*types.LocalizedMethodFault)(0xc000614a80)({\n DynamicData: (types.DynamicData) {\n },\n Fault: (types.CnsFault) {\n BaseMethodFault: (types.BaseMethodFault) ,\n Reason: (string) (len=63) \"CNS: Failed to delete disk:Fault cause: vim.fault.InvalidState\\n\"\n },\n LocalizedMessage: (string) (len=79) \"CnsFault error: CNS: Failed to delete disk:Fault cause: vim.fault.InvalidState\\n\"\n})\n", opID: "0655df75"","TraceId":"e559b2ef-d09f-4e42-a0da-075412f4233d","stacktrace":"sigs.k8s.io/vsphere-csi-driver/pkg/csi/service/common.DeleteVolumeUtil\n\t/build/pkg/csi/service/common/vsphereutil.go:351\nsigs.k8s.io/vsphere-csi-driver/pkg/csi/service/vanilla.(*controller).DeleteVolume\n\t/build/pkg/csi/service/vanilla/controller.go:449\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_DeleteVolume_Handler.func1\n\t/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:5164\ngithub.com/rexray/gocsi/middleware/serialvolume.(*interceptor).deleteVolume\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/serialvolume/serial_volume_locker.go:183\ngithub.com/rexray/gocsi/middleware/serialvolume.(*interceptor).handle\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/serialvolume/serial_volume_locker.go:92\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer.func1\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:178\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handle\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:218

{"level":"error","time":"2020-05-06T12:16:32.724652761Z","caller":"vanilla/controller.go:452","msg":"failed to delete volume: "276ae09e-96a0-4236-a053-7dbea3997318". Error: failed to delete volume: "276ae09e-96a0-4236-a053-7dbea3997318", fault: "(*types.LocalizedMethodFault)(0xc000614a80)({\n DynamicData: (types.DynamicData) {\n },\n Fault: (types.CnsFault) {\n BaseMethodFault: (types.BaseMethodFault) ,\n Reason: (string) (len=63) \"CNS: Failed to delete disk:Fault cause: vim.fault.InvalidState\\n\"\n },\n LocalizedMessage: (string) (len=79) \"CnsFault error: CNS: Failed to delete disk:Fault cause: vim.fault.InvalidState\\n\"\n})\n", opID: "0655df75"","TraceId":"e559b2ef-d09f-4e42-a0da-075412f4233d","stacktrace":"sigs.k8s.io/vsphere-csi-driver/pkg/csi/service/vanilla.(*controller).DeleteVolume\n\t/build/pkg/csi/service/vanilla/controller.go:452\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_DeleteVolume_Handler.func1\n\t/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:5164\ngithub.com/rexray/gocsi/middleware/serialvolume.(*interceptor).deleteVolume\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/serialvolume/serial_volume_locker.go:183\ngithub.com/rexray/gocsi/middleware/serialvolume.(*interceptor).handle\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/serialvolume/serial_volume_locker.go:92\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer.func1\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:178\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handle\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:218\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer\n\t/go/pkg/mod/github.com/rexray/gocsi@v1

In vCenter I get these two events repeating after I try to delete the volume: Delete container volume (Completed). (Even with version 1.0.2 of the driver I sometimes get the above message, but the PV is eventually released and the datastore is cleaned up.)

To rule out a permissions issue, I tried using credentials with global admin rights, but the same error occurs. After reverting to version 1.0.0 or 1.0.2 of the driver (with the proper restrictive permissions), I can add and remove volumes consistently.

Environment:
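For anyone trying to confirm the same symptom, a minimal sketch using standard kubectl commands (the PV name below is a placeholder, not taken from the logs above):

# Affected volumes show STATUS "Released" instead of being removed
kubectl get pv

# A stuck PV typically shows repeated VolumeFailedDelete warnings from the
# external-provisioner in its events, and its finalizers keep it around
kubectl describe pv pvc-<uid>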
I can confirm that we are experiencing the same issue. YAML to reproduce:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vsphere-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: vsphere-pvc
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: task-pv-storage

To trigger the problem:

kubectl delete pvc vsphere-pvc
kubectl delete pod pod

Environment:
csi deployment images:
node daemonset images:
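For reference, a small hedged sketch of how to watch the reproduction above play out; on affected versions the PV bound to vsphere-pvc moves to Released and stays there:

# Run the two deletes above in one terminal, then watch the PV status here
kubectl get pv -w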
Are you hitting issue 5 mentioned in the documentation?
It sounds like it. But why does it work with CSI driver 1.0.2?
I observed the following behaviour in vCenter.
Our cluster is also affected w/
The issue you are observing is fixed in vSphere 7.0u1. @RaunakShah is also helping to mitigate it by providing a fix for kubernetes/kubernetes#84226 in the external-provisioner.
Is it possible for the driver or vSphere to check whether the volume is still attached and fail the deletion? This is how other cloud providers behave.
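No such driver-side check is described in this thread, but a manual approximation from the Kubernetes side is to verify that no VolumeAttachment still references the volume before deleting its PVC; a minimal sketch with a placeholder PV name:

# A VolumeAttachment whose PV column matches the volume means it is still
# attached to a node; wait until none remain before deleting the PVC
kubectl get volumeattachment | grep pvc-<uid>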
Same issue here with v2.0.0... it is no fun having to manually detach volumes from 1 of 20 nodes and delete the volumes in FCD by hand.
@divyenpatel Regarding "This issue is fixed in vSphere 7.0u1": that update has not been released yet, has it?
Correct, it is not released yet, but @RaunakShah has already fixed the race by making a change in the external-provisioner: kubernetes-csi/external-provisioner#438
@AcidAngel21 The fix from the external-provisioner is expected to be part of the next release: https://github.com/kubernetes-csi/external-provisioner/commits/v2.0.0-rc2
@RaunakShah Can we use v2.0.0-rc2 to get rid of the above issue?
Will this fix be available on 6.7U3 with the 1.0.x version of the driver? We have no plans to upgrade to 7.0 in the near future, and not being able to delete PVs will be a problem.
I am on vSphere 7.0 and was able to test
@RaunakShah csi-provisioner has already released a new version of its image (v2.0.1).
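For clusters already running the 2.0.0 manifests, one way to pick up the newer sidecar is to bump the csi-provisioner image on the controller Deployment. This is only a hedged sketch: the namespace, Deployment name, container name, and image registry below are assumptions based on the commonly used driver manifests and may differ in your cluster; prefer the officially updated manifests once published.

# Point the csi-provisioner sidecar at the release containing the fix.
# Namespace, names and registry are assumptions; check your own manifest first.
kubectl -n kube-system set image deployment/vsphere-csi-controller \
  csi-provisioner=k8s.gcr.io/sig-storage/csi-provisioner:v2.0.1

# Confirm the rollout completed
kubectl -n kube-system rollout status deployment/vsphere-csi-controller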
@xander-sh We have validated the latest versions of the sidecars and updated the YAMLs in the
We are using the version of the CSI driver that installs by default with TKG on 6.7u3. I am not sure whether we can upgrade it on this platform, so I believe we are stuck with the bug. Hopefully TKG 1.2 will come out soon and move to the 2.x CSI driver on the 6.7u3 platform, but I am not holding my breath on that one.
Thanks, we are really looking forward to a fixed csi-provisioner for vSphere 6.7u3.
Hi, is there any update on the fix for vSphere 6.7u3?
The vSphere CSI v2.0.1 release is now available: https://github.com/kubernetes-sigs/vsphere-csi-driver/releases/tag/v2.0.1
You will find updated manifests for vSphere 6.7u3 and 7.0 here: https://github.com/kubernetes-sigs/vsphere-csi-driver/tree/master/manifests/v2.0.1
/close |
@RaunakShah: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
I deploy a StatefulSet with 3 replicas and 3 PVCs (via a StorageClass). When I delete the StatefulSet and immediately delete the PVCs, most of the PVs stay hanging in the Released status.
When I wait a few seconds before deleting the PVCs, the problem does not occur.
This problem also does not happen with csi-driver 1.0.2.
In vCenter I constantly see the error "The operation is not allowed in the current state". It seems that the driver tries to delete the storage object before it has been detached from the node.
A workaround to remove the hanging PVs is to remove the PV finalizers:

kubectl patch pv pvc-*** -p '{"metadata":{"finalizers":null}}'
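If several PVs are stuck at once, the same workaround can be looped over every Released PV. A minimal sketch; note that this only removes the Kubernetes objects, so the underlying FCDs may still need to be cleaned up in vCenter, as others in this thread have mentioned:

# Patch away the finalizers on every PV currently stuck in Released
for pv in $(kubectl get pv -o jsonpath='{range .items[?(@.status.phase=="Released")]}{.metadata.name}{"\n"}{end}'); do
  kubectl patch pv "$pv" -p '{"metadata":{"finalizers":null}}'
done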
What you expected to happen:
PVs do not hang in the Released status and are removed.
How to reproduce it (as minimally and precisely as possible):
Deploy a StatefulSet with 3 replicas and 3 PVCs (via a StorageClass). Delete the StatefulSet and immediately delete the PVCs.
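Until the fix is available, a hedged way to sidestep the race is to let the volumes detach before deleting the claims; the names below are placeholders for the StatefulSet and its PVCs:

# Delete the StatefulSet first so its pods terminate and the volumes detach
kubectl delete statefulset <statefulset-name>

# Wait until no VolumeAttachment references the StatefulSet's PVs any more
kubectl get volumeattachment

# Only then delete the claims
kubectl delete pvc <pvc-1> <pvc-2> <pvc-3>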
Anything else we need to know?:
csi-attacher logs
csi-controller logs
vsphere-syncer logs
csi-provisioner logs
Environment:
Kernel (e.g. uname -a): 4.14.85-rancher