The bug is probably caused by an incorrect ordering among Pod readiness, FUSE mount point readiness, and the CSI Plugin's recover() logic.
Currently, the CSI plugin recovers broken mount points only when it detects that a FUSE container has restarted and become ready. However, the execution can be ordered like this:
1. The Alluxio FUSE Pod restarts.
2. The Alluxio FUSE Pod becomes ready.
3. CSI detects the container restart and tries to recover broken mount points (at this time the Alluxio FUSE mount point is not ready yet, so nothing happens).
4. The Alluxio FUSE mount point becomes ready.
5. The CSI Plugin never retries Steps 1 to 4 because it detects no further container restarts (see the sketch below).
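To make the ordering concrete, here is a minimal sketch of the loop described above. The helper names (getFuseRestartCount, isMountPointReady, recoverMountPoint) and the polling structure are assumptions for illustration only, not the actual Fluid CSI plugin code:

```go
// Minimal sketch of the race described in Steps 1-5. All helper names are
// hypothetical; this is not the real Fluid CSI plugin implementation.
package csirecover

import "time"

func recoverLoop() {
	lastRestartCount := 0
	for range time.Tick(5 * time.Second) {
		current := getFuseRestartCount() // read from the FUSE pod's container status
		if current == lastRestartCount {
			// Step 5: no new restart observed, so recovery is never retried,
			// even if the mount point became ready after the previous check.
			continue
		}
		lastRestartCount = current
		if !isMountPointReady() {
			// Step 3: the FUSE container is ready but the mount point is not,
			// so nothing is recovered, and this restart is never revisited.
			continue
		}
		recoverMountPoint() // only reached if Step 4 happened before Step 3's check
	}
}

// Hypothetical stand-ins for the real restart/readiness checks and recovery.
func getFuseRestartCount() int { return 0 }
func isMountPointReady() bool  { return false }
func recoverMountPoint()       {}
```

Because the restart counter is consumed in the same iteration that sees the not-yet-ready mount point, the broken mount is left unrecovered.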
Can scanning /proc/self/mountinfo solve this issue?
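For reference, a rough sketch of what such a scan could look like, assuming a broken Alluxio FUSE mount shows up in /proc/self/mountinfo with a filesystem type containing "alluxio-fuse" and fails stat with ENOTCONN; these details are assumptions, not confirmed behavior:

```go
// Rough sketch of scanning /proc/self/mountinfo for broken Alluxio FUSE mounts.
// The "alluxio-fuse" fstype match and the ENOTCONN check are assumptions.
package csirecover

import (
	"bufio"
	"os"
	"strings"
	"syscall"
)

// brokenFuseMounts returns mount points whose filesystem type mentions
// alluxio-fuse and that can no longer be stat'ed, i.e. candidates for recovery.
func brokenFuseMounts() ([]string, error) {
	f, err := os.Open("/proc/self/mountinfo")
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var broken []string
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 7 {
			continue
		}
		// mountinfo(5): field 5 is the mount point; the filesystem type is the
		// first field after the "-" separator.
		mountPoint := fields[4]
		sep := -1
		for i, fld := range fields {
			if fld == "-" {
				sep = i
				break
			}
		}
		if sep < 0 || sep+1 >= len(fields) {
			continue
		}
		if !strings.Contains(fields[sep+1], "alluxio-fuse") {
			continue
		}
		var st syscall.Stat_t
		if err := syscall.Stat(mountPoint, &st); err == syscall.ENOTCONN {
			broken = append(broken, mountPoint)
		}
	}
	return broken, scanner.Err()
}
```

A scan like this would let the plugin find broken mount points directly, instead of relying only on observed container restarts.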
What is your environment (Kubernetes version, Fluid version, etc.)
Describe the bug
FUSE Recovery failed when using AlluxioRuntime.
What you expect to happen:
Alluxio FUSE should be successfully recovered after deleting the FUSE pod.
How to reproduce it
Running the e2e script in #2477 reproduces this bug.
Additional Information