[v1.26] disk: avoid concurrent operation on the same disk #1325

huww98 · 2025-02-18T16:25:37Z

What type of PR is this?

/kind bug

What this PR does / why we need it:

If the umount takes extremely long time and the NodeUnstageVolume RPC times out, kubelet may invoke NodeStageVolume when it retries and we may mistake an unmounting path as a good one. This lock only protects us in one process. But fortunately, when restarting CSI, Kubernetes will wait for the previous process to exit before starting the new one, even if it stucks in uninterruptible syscall (D state). So, we will not have multiple processes on the same node and the lock should be safe enough.

Note that there is still small chance that the previous attach/detach OpenAPI is invoked by the terminated CSI process, but the state returned by DescribeDisks is not updated yet. We should fix this by switch attach/detach to controller.

This is a backport commit. We backport this to workaround another edge case in QEMU/Linux. If we call DetachDisk while unmounting, the disk will stuck at Detaching state, then QEMU will issue multiple interrupts to retry detaching. After unmount finished, Linux kernel will eject the PCI slot multiple times in response to the interrupts. But that may eject the disk that was just inserted to the same slot. This commit will ensure we don't call DetachDisk while unmounting in UnstageVolume.

Note that we may still call DetachDisk in DeleteVolume, which may be triggered by old KCM if unmounting takes over 6min. And this is still possible to trigger the above edge case.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

backport of #1170 (partial)

Does this PR introduce a user-facing change?

Fix unintended disk detach or failed attach when the unmounting is extremely slow

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

If the umount takes extremely long time and the NodeUnstageVolume RPC times out, kubelet may invoke NodeStageVolume when it retries and we may mistake an unmounting path as a good one. This lock only protects us in one process. But fortunately, when restarting CSI, Kubernetes will wait for the previous process to exit before starting the new one, even if it stucks in uninterruptible syscall (D state). So, we will not have multiple processes on the same node and the lock should be safe enough. Note that there is still small chance that the previous attach/detach OpenAPI is invoked by the terminated CSI process, but the state returned by DescribeDisks is not updated yet. We should fix this by switch attach/detach to controller. This is a backport commit. We backport this to workaround another edge case in QEMU/Linux. If we call DetachDisk while unmounting, the disk will stuck at Detaching state, then QEMU will issue multiple interrupts to retry detaching. After unmount finished, Linux kernel will eject the PCI slot multiple times in response to the interrupts. But that may eject the disk that was just inserted to the same slot. This commit will ensure we don't call DetachDisk while unmounting in UnstageVolume. Note that we may still call DetachDisk in DeleteVolume, which may be triggered by old KCM if unmounting takes over 6min. And this is still possible to trigger the above edge case.

k8s-ci-robot · 2025-02-18T16:25:54Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: huww98
Once this PR has been reviewed and has the lgtm label, please assign mowangdk for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mowangdk · 2025-02-19T02:20:29Z

/lgtm

Please provide the customer with a testing version, I will merge it upon it has been verifed.

k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 18, 2025

k8s-ci-robot requested a review from mowangdk February 18, 2025 16:25

huww98 changed the title ~~disk: avoid concurrent operation on the same disk~~ [v1.26] disk: avoid concurrent operation on the same disk Feb 18, 2025

k8s-ci-robot assigned mowangdk Feb 19, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 19, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[v1.26] disk: avoid concurrent operation on the same disk #1325

[v1.26] disk: avoid concurrent operation on the same disk #1325

huww98 commented Feb 18, 2025

k8s-ci-robot commented Feb 18, 2025

mowangdk commented Feb 19, 2025

[v1.26] disk: avoid concurrent operation on the same disk #1325

Are you sure you want to change the base?

[v1.26] disk: avoid concurrent operation on the same disk #1325

Conversation

huww98 commented Feb 18, 2025

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot commented Feb 18, 2025

mowangdk commented Feb 19, 2025