Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.26] disk: avoid concurrent operation on the same disk #1325

Open
wants to merge 1 commit into
base: v1.26-compatible
Choose a base branch
from

Conversation

huww98
Copy link
Contributor

@huww98 huww98 commented Feb 18, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

If the umount takes extremely long time and the NodeUnstageVolume RPC times out, kubelet may invoke NodeStageVolume when it retries and we may mistake an unmounting path as a good one. This lock only protects us in one process. But fortunately, when restarting CSI, Kubernetes will wait for the previous process to exit before starting the new one, even if it stucks in uninterruptible syscall (D state). So, we will not have multiple processes on the same node and the lock should be safe enough.

Note that there is still small chance that the previous attach/detach OpenAPI is invoked by the terminated CSI process, but the state returned by DescribeDisks is not updated yet. We should fix this by switch attach/detach to controller.

This is a backport commit. We backport this to workaround another edge case in QEMU/Linux. If we call DetachDisk while unmounting, the disk will stuck at Detaching state, then QEMU will issue multiple interrupts to retry detaching. After unmount finished, Linux kernel will eject the PCI slot multiple times in response to the interrupts. But that may eject the disk that was just inserted to the same slot. This commit will ensure we don't call DetachDisk while unmounting in UnstageVolume.

Note that we may still call DetachDisk in DeleteVolume, which may be triggered by old KCM if unmounting takes over 6min. And this is still possible to trigger the above edge case.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

backport of #1170 (partial)

Does this PR introduce a user-facing change?

Fix unintended disk detach or failed attach when the unmounting is extremely slow

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


If the umount takes extremely long time and the NodeUnstageVolume RPC times out, kubelet may invoke NodeStageVolume when it retries and we may mistake an unmounting path as a good one. This lock only protects us in one process. But fortunately, when restarting CSI, Kubernetes will wait for the previous process to exit before starting the new one, even if it stucks in uninterruptible syscall (D state). So, we will not have multiple processes on the same node and the lock should be safe enough.

Note that there is still small chance that the previous attach/detach OpenAPI is invoked by the terminated CSI process, but the state returned by DescribeDisks is not updated yet. We should fix this by switch attach/detach to controller.

This is a backport commit. We backport this to workaround another edge case in
QEMU/Linux. If we call DetachDisk while unmounting, the disk will stuck at
Detaching state, then QEMU will issue multiple interrupts to retry detaching.
After unmount finished, Linux kernel will eject the PCI slot multiple times in
response to the interrupts. But that may eject the disk that was just inserted
to the same slot.  This commit will ensure we don't call DetachDisk while
unmounting in UnstageVolume.

Note that we may still call DetachDisk in DeleteVolume, which may be triggered
by old KCM if unmounting takes over 6min. And this is still possible to trigger
the above edge case.
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 18, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: huww98
Once this PR has been reviewed and has the lgtm label, please assign mowangdk for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@huww98 huww98 changed the title disk: avoid concurrent operation on the same disk [v1.26] disk: avoid concurrent operation on the same disk Feb 18, 2025
@mowangdk
Copy link
Contributor

/lgtm

Please provide the customer with a testing version, I will merge it upon it has been verifed.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants