Zpool import crashes the single CPU machines. #6283
Comments
I investigated further and see strange results. The crash location varies, but the stack backtrace always has zvol_create_minor_impl as a common ancestor. In the reported crash the instruction pointer was at:
but RDI contains the correct address of a struct kobject.
I had another crash dump where the instruction pointer was at:
so it should fail only if RSP points to a wrong address, but the address in RSP is correct:
The full backtrace in this case looks as follows:
The get_disk function calls try_module_get, which then returns to get_disk again.
Everything looks correct, but the machine crashed. Do you have any ideas? I'll bisect to find which change caused this issue, and I'll try to reproduce it on a different hypervisor.
This problem always occurs on a single-CPU machine, whether the environment is virtual or physical. The system has hung since commit 0778358.
What does dmesg say?
@ab-oe thanks for isolating the exact commit. That makes it clear you're hitting the You could try a different kernel and see if the issue is still reproducible. You could also try the current master code with your existing kernel includes to potentially helpful improvements. Getting any additional 97f8d79 Fix zvol_state_t->zv_open_count race
@tuxoko there is nothing in dmesg; the system hangs as if someone had cut the clock source off from the CPU. I have never seen a similar effect before (except with a hardware failure). I suspected the hypervisor, but I also reproduced this on VMware and on a physical single-core machine. I can only get core dumps from KVM, and they end with RIP in different places, mostly in the kobject_get function. @behlendorf I'll try different kernels today, and if one of them solves the issue I'll track down the commit that fixes it. I already tested a version with the commits you mentioned. They didn't help in this case.
Update: I tested the latest master with the longterm 3.10.107, 4.4.75 and 4.9.35 kernels; all behave the same. The system hangs just after executing
I added schedule before returning
The system now remains stable even with hundreds of zvols. With some debug code I noticed that on a virtual machine with one zpool with one zvol there are ~1000 retries of
Ahh, @ab-oe that makes sense and matches your symptoms. One CPU spinning is a pretty good way to cause a hang. On a single-CPU system there's not much preventing us from potentially spinning here unless we explicitly reschedule when the
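The spinning problem described above can be illustrated with a small userspace analogy. This is only a sketch, not the ZFS code and not the change made in the PR: `retry_with_yield`, `try_acquire_fn`, `struct fake_lock` and `fake_try_acquire` are hypothetical names, and POSIX `sched_yield()` stands in for the kernel's `schedule()`. The idea is that a retry loop gives up the CPU after every failed attempt, so on a single CPU the task holding the resource gets a chance to run and release it.

```c
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true once the resource was acquired. */
typedef bool (*try_acquire_fn)(void *ctx);

/*
 * Retry try_acquire() until it succeeds, yielding the CPU after each
 * failed attempt instead of spinning. Returns the number of failed
 * attempts, or -1 if max_retries failures were already recorded and
 * the next attempt failed as well.
 */
static long retry_with_yield(try_acquire_fn try_acquire, void *ctx,
                             long max_retries)
{
    long retries = 0;

    while (!try_acquire(ctx)) {
        if (retries >= max_retries)
            return -1;
        retries++;
        /* Userspace stand-in for an explicit reschedule in the kernel. */
        sched_yield();
    }
    return retries;
}

/* Hypothetical resource that stays busy for a fixed number of attempts. */
struct fake_lock {
    long busy_for;
};

static bool fake_try_acquire(void *ctx)
{
    struct fake_lock *lock = ctx;

    if (lock->busy_for > 0) {
        lock->busy_for--;
        return false;
    }
    return true;
}
```

Calling `retry_with_yield(fake_try_acquire, &lock, 10)` with `lock.busy_for = 3` performs three failed attempts, yielding after each one, and then succeeds. Without the yield, the same loop on one CPU could starve whichever task is supposed to release the resource.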
@behlendorf thank you. I created PR #6312. I moved schedule outside the
System information
Describe the problem you're observing
The system hangs just after zpool import pool_name is executed.
Describe how to reproduce the problem
Create a zpool with at least one zvol.
Export the zpool.
Import the zpool.
Currently I can reproduce this issue only on KVM virtual machines. It is not reproducible with version 0.6.5.
Include any warning/errors/backtraces from the system logs
Logs from crash:
I'll look at this and provide more details soon.