Zpool import crashes the single CPU machines. #6283

Closed
ab-oe opened this issue Jun 29, 2017 · 9 comments
Labels
Component: ZVOL ZFS Volumes

Comments

@ab-oe
Contributor

ab-oe commented Jun 29, 2017

System information

Type Version/Name
Distribution Name all
Distribution Version N/A
Linux Kernel 4.4.45
Architecture x86_64
ZFS Version 0.7.0-rc4 (master at 5b7bb98)
SPL Version 0.7.0-rc4 (master at 7a35f2b)

Describe the problem you're observing

The system hangs just after zpool import pool_name is executed.

Describe how to reproduce the problem

Create a zpool with at least one zvol.
Export the zpool.
Import the zpool.

Currently I can reproduce this issue only on KVM virtual machines. It is not reproducible with version 0.6.5.

Include any warning/errors/backtraces from the system logs

Logs from crash:

      KERNEL: vmlinux                           
    DUMPFILE: vmm3.core
        CPUS: 1
        DATE: Thu Jun 29 11:51:32 2017
      UPTIME: 00:11:59
LOAD AVERAGE: 21.58, 5.45, 1.94
       TASKS: 600
    NODENAME: node-8a9aa200
     RELEASE: 4.4.45
     VERSION: #3 SMP Thu Jun 29 09:01:05 CEST 2017
     MACHINE: x86_64  (2812 Mhz)
      MEMORY: 1.3 GB
       PANIC: ""
         PID: 5186
     COMMAND: "z_zvol"
        TASK: ffff88004f2cf080  [THREAD_INFO: ffff880043848000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 5186   TASK: ffff88004f2cf080  CPU: 0   COMMAND: "z_zvol"
    [exception RIP: kobject_get+9]
    RIP: ffffffff81332a49  RSP: ffff88004384bbb8  RFLAGS: 00000282
    RAX: 0000000000000001  RBX: ffff880042081080  RCX: 0000000000000006
    RDX: 0000000000000006  RSI: ffffffffa03e0228  RDI: ffff880042081080
    RBP: ffffffffa03dff80   R8: 0000000000000002   R9: 0000000000000000
    R10: 0000000000000000  R11: 000000000e600000  R12: 000000000e600000
    R13: ffff88004c783800  R14: ffff880042081000  R15: ffff880042229b80
    CS: 0010  SS: 0018
 #0 [ffff88004384bbc0] get_disk at ffffffff81322422
 #1 [ffff88004384bbd8] exact_lock at ffffffff81322448
 #2 [ffff88004384bbe0] kobj_lookup at ffffffff8144739c
 #3 [ffff88004384bc38] get_gendisk at ffffffff81322818
 #4 [ffff88004384bc58] __blkdev_get at ffffffff8116f2c8
 #5 [ffff88004384bca8] blkdev_get at ffffffff8116f5db
 #6 [ffff88004384bd20] add_disk at ffffffff81322ed3
 #7 [ffff88004384bd78] zvol_create_minor_impl at ffffffffa0345503 [zfs]
 #8 [ffff88004384bdc0] zvol_task_cb at ffffffffa0345c80 [zfs]
 #9 [ffff88004384be38] taskq_thread at ffffffffa00e719e [spl]
#10 [ffff88004384bed8] kthread at ffffffff81077b7a
#11 [ffff88004384bf50] ret_from_fork at ffffffff8171020f
crash> l *(kobject_get+0x09)
0xffffffff81332a49 is in kobject_get (lib/kobject.c:594).
589      * @kobj: object.
590      */
591     struct kobject *kobject_get(struct kobject *kobj)
592     {
593             if (kobj) {
594                     if (!kobj->state_initialized)
595                             WARN(1, KERN_WARNING "kobject: '%s' (%p): is not "
596                                    "initialized, yet kobject_get() is being "
597                                    "called.\n", kobject_name(kobj), kobj);
598                     kref_get(&kobj->kref);
crash> print *(struct kobject*)(0xffff880042081080)
$1 = {
  name = 0xffff88003ec69230 "zd0", 
  entry = {
    next = 0xffff88004c7521e0, 
    prev = 0xffff8800355de018
  }, 
  parent = 0xffff88003e59dea0, 
  kset = 0xffff88004c7521e0, 
  ktype = 0xffffffff81a93880 <device_ktype>, 
  sd = 0xffff88004212d2d0, 
  kref = {
    refcount = {
      counter = 5
    }
  }, 
  state_initialized = 1, 
  state_in_sysfs = 1, 
  state_add_uevent_sent = 0, 
  state_remove_uevent_sent = 0, 
  uevent_suppress = 1
}

I'll look at this and provide more details soon.

@behlendorf added the Component: ZVOL ZFS Volumes label on Jun 29, 2017
@ab-oe
Contributor Author

ab-oe commented Jun 30, 2017

I did further investigation and I'm seeing strange results. Sometimes it crashes in different places, but the stack backtrace always has zvol_create_minor_impl as a common ancestor. In the reported crash the instruction pointer was at:

0xffffffff81332a49 <kobject_get+9>:     test   BYTE PTR [rdi+0x3c],0x1 

but RDI contains a valid address of a struct kobject.
I can print the whole structure and I can also read the value at that address correctly.

crash> p/x *(0xffff880042081080+0x3C)
$25 = 0x13
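
As an aside on that value: the test instruction above checks the low bit of the word at offset 0x3c, which in this kernel's struct kobject is the bitfield word holding state_initialized and friends, and 0x13 decodes to exactly the flag values shown in the struct dump in the first comment. A tiny standalone decoder, assuming the 4.4 bitfield order (state_initialized, state_in_sysfs, state_add_uevent_sent, state_remove_uevent_sent, uevent_suppress):

#include <stdio.h>

/*
 * Decode the kobject state word read by "p/x *(0xffff880042081080+0x3C)".
 * The bit order assumes the 4.4 kernel's struct kobject bitfield layout;
 * this is a userspace helper for reading the dump, not kernel code.
 */
int main(void)
{
    unsigned int state = 0x13;

    printf("state_initialized        = %u\n", (state >> 0) & 1);
    printf("state_in_sysfs           = %u\n", (state >> 1) & 1);
    printf("state_add_uevent_sent    = %u\n", (state >> 2) & 1);
    printf("state_remove_uevent_sent = %u\n", (state >> 3) & 1);
    printf("uevent_suppress          = %u\n", (state >> 4) & 1);
    return 0;
}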

I had another crash dump where the instruction pointer was at:

0xffffffff810b974e <try_module_get+46>: mov    eax,0x1
0xffffffff810b9753 <try_module_get+51>: ret                   <-- here is the RIP
0xffffffff810b9754 <try_module_get+52>: mov    edx,eax

so it should fail only if RSP points to a wrong return address, but the address on the stack at RSP is correct:

crash> p/a *(0xffff880037f47bc8)
$2 = 0xffffffff81322412 <get_disk+34>

The full backtrace in this case looks as follows:

crash> bt -sx
PID: 22439  TASK: ffff880037ea2d00  CPU: 0   COMMAND: "z_zvol"
    [exception RIP: try_module_get+51]
    RIP: ffffffff810b9753  RSP: ffff880037f47bc8  RFLAGS: 00000246
    RAX: 0000000000000001  RBX: ffff880047cf9000  RCX: 0000000000000006
    RDX: 0000000000000006  RSI: ffffffffa03e0128  RDI: ffffffffa03dfe80
    RBP: ffffffffa03dfe80   R8: 0000000000000002   R9: 0000000000000000
    R10: 0000000000000000  R11: 000000000e600000  R12: 000000000e600000
    R13: ffff88004e783800  R14: ffff880047cf9000  R15: ffff880037fba340
    CS: 0010  SS: 0018
 #0 [ffff880037f47bc8] get_disk+0x22 at ffffffff81322412
 #1 [ffff880037f47be0] exact_lock+0x8 at ffffffff81322448
 #2 [ffff880037f47be8] kobj_lookup+0xdc at ffffffff8144739c
 #3 [ffff880037f47c40] get_gendisk+0x28 at ffffffff81322818
 #4 [ffff880037f47c60] __blkdev_get+0x148 at ffffffff8116f2c8
 #5 [ffff880037f47cb0] blkdev_get+0x3b at ffffffff8116f5db
 #6 [ffff880037f47d28] add_disk+0x3d3 at ffffffff81322ed3
 #7 [ffff880037f47d80] zvol_create_minor_impl+0x5e4 at ffffffffa03453f4 [zfs]
 #8 [ffff880037f47dc8] zvol_task_cb+0x31f at ffffffffa0345b3f [zfs]
 #9 [ffff880037f47e38] taskq_thread+0x24e at ffffffffa00ec19e [spl]
#10 [ffff880037f47ed8] kthread+0xca at ffffffff81077b7a
#11 [ffff880037f47f50] ret_from_fork+0x3f at ffffffff8171020f

The get_disk function calls try_module_get and then returns back into get_disk.

crash> dis get_disk
...
0xffffffff8132240d <get_disk+29>:       call   0xffffffff810b9720 <try_module_get>
0xffffffff81322412 <get_disk+34>:       test   al,al

Everything looks correct, yet the machine crashed. Do you have any ideas?

I'll bisect to find which change caused this issue, and I'll try to reproduce it on a different hypervisor.

@ab-oe changed the title from "Zpool import crashes the kvm virtual machine." to "Zpool import crashes the single CPU machines." on Jun 30, 2017
@ab-oe
Contributor Author

ab-oe commented Jun 30, 2017

This problem always occurs when we use a single CPU machine, no matter whether it is a virtual or a physical environment.

The system hangs starting with commit 0778358.
Reverting this commit makes the system stable again. However, in the VM core dumps I don't see any relation to this commit. Do you have any clues?

@tuxoko
Contributor

tuxoko commented Jun 30, 2017

What does dmesg say?

@behlendorf
Contributor

@ab-oe thanks for isolating the exact commit. That makes it clear you're hitting the ERESTARTSYS case in __blkdev_get, which is probably much more likely when running with a single CPU. It seems plausible that this is a kernel bug but it's the first report I've seen on it.
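
For readers following the thread: the interaction being described is that the zvol open path gives up with -ERESTARTSYS when it cannot take the spa_namespace_lock without risking a deadlock, and __blkdev_get reacts to that error by restarting the whole open. Below is a minimal self-contained userspace model of that loop, not the actual kernel or ZFS source; every name in it (fake_blkdev_get, fake_zvol_first_open, fake_spa_namespace_lock) is invented for illustration.

/*
 * Userspace model of the retry loop under discussion; NOT the kernel or
 * ZFS source.  The open path bails out with -ERESTARTSYS when the
 * namespace lock is busy, and the caller keeps restarting the open.
 */
#include <pthread.h>
#include <stdio.h>

#define ERESTARTSYS 512  /* kernel-internal errno, redefined for the model */

static pthread_mutex_t fake_spa_namespace_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models zvol_first_open(): give up if the namespace lock is busy. */
static int fake_zvol_first_open(void)
{
    if (pthread_mutex_trylock(&fake_spa_namespace_lock) != 0)
        return -ERESTARTSYS;
    /* ... minor creation work would happen here ... */
    pthread_mutex_unlock(&fake_spa_namespace_lock);
    return 0;
}

/* Models __blkdev_get(): on -ERESTARTSYS, jump back and try again. */
static int fake_blkdev_get(void)
{
    long restarts = 0;
    int ret;

restart:
    ret = fake_zvol_first_open();
    if (ret == -ERESTARTSYS) {
        restarts++;
        /*
         * Nothing in this loop yields the CPU.  On a one-CPU box the
         * task holding the lock may never get a chance to run, so the
         * loop can spin indefinitely.
         */
        goto restart;
    }
    printf("open completed after %ld restarts\n", restarts);
    return ret;
}

int main(void)
{
    return fake_blkdev_get() != 0;
}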

You could try a different kernel and see if the issue is still reproducible. You could also try the current master code with your existing kernel, since it includes some potentially helpful improvements (the commits below). Getting any additional dmesg output would also be helpful.

97f8d79 Fix zvol_state_t->zv_open_count race
58404a7 Refine use of zv_state_lock.

@ab-oe
Contributor Author

ab-oe commented Jul 3, 2017

@tuxoko there is nothing in dmesg; the system hangs as if someone cut off the clock source from the CPU. I have never seen a similar effect before (unless it was a H/W failure). I suspected the hypervisor, but I reproduced this also on VMware and on a physical single-core machine. I can only get core dumps from KVM, and they end with RIP in different places, mostly in the kobject_get function.

@behlendorf I'll try different kernels today, and if that solves the issue I'll track down the commit that fixes it. I already tested a version with the commits you mentioned; they didn't help in this case.

@ab-oe
Contributor Author

ab-oe commented Jul 3, 2017

Update: I tested the latest master with the longterm 3.10.107, 4.4.75, and 4.9.35 kernels; all behave the same. The system hangs just after executing the zpool import pool_name command. Unfortunately, with the latest kernels (4.11.8 and 4.12) I have different problems on the testing machine, but I suppose they'll also crash.

@ab-oe
Contributor Author

ab-oe commented Jul 4, 2017

I added a call to schedule() before returning ERESTARTSYS in zvol_first_open:

@@ -1265,8 +1265,10 @@ zvol_first_open(zvol_state_t *zv)
         */
        if (!mutex_owned(&spa_namespace_lock)) {
                locked = mutex_tryenter(&spa_namespace_lock);
-               if (!locked)
+               if (!locked) {
+                       schedule();
                        return (-SET_ERROR(ERESTARTSYS));
+               }
        }

The system now remains stable even with hundreds of zvols. With some debug code I noticed that on a virtual machine with one zpool containing one zvol there are ~1000 retries of zvol_first_open before it grabs the spa_namespace_lock.
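
For what it's worth, here is a rough userspace analogue of why that single reschedule matters so much on one CPU, with sched_yield() standing in for the kernel's schedule(); none of this is the ZFS code, and all names are invented for illustration. With YIELD_BETWEEN_RETRIES set to 0 the opener thread burns its entire timeslice on failed trylock attempts before the lock holder gets to run again; with it set to 1 the holder runs almost immediately and the retry count stays small.

/*
 * Rough userspace analogue of the single-CPU spin (not ZFS code).  The
 * main thread plays the pool-import role: it holds a lock and needs CPU
 * time to finish.  The opener thread keeps retrying a trylock, the way
 * the zvol open path retries.  sched_yield() stands in for schedule().
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define YIELD_BETWEEN_RETRIES 1

static pthread_mutex_t fake_spa_namespace_lock = PTHREAD_MUTEX_INITIALIZER;
static long failed_attempts;

static void pin_to_cpu0(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(0, &set);
    sched_setaffinity(0, sizeof (set), &set);   /* emulate a 1-CPU machine */
}

static void *opener(void *arg)
{
    (void) arg;
    pin_to_cpu0();
    while (pthread_mutex_trylock(&fake_spa_namespace_lock) != 0) {
        failed_attempts++;
#if YIELD_BETWEEN_RETRIES
        sched_yield();      /* give the lock holder a chance to run */
#endif
    }
    pthread_mutex_unlock(&fake_spa_namespace_lock);
    return NULL;
}

int main(void)
{
    pthread_t t;
    volatile long i;

    pin_to_cpu0();
    pthread_mutex_lock(&fake_spa_namespace_lock);   /* "import" takes the lock */
    pthread_create(&t, NULL, opener, NULL);
    for (i = 0; i < 200000000L; i++)
        ;                                           /* CPU-bound "import" work */
    pthread_mutex_unlock(&fake_spa_namespace_lock); /* "import" finishes */
    pthread_join(t, NULL);
    printf("opener needed %ld failed attempts to get the lock\n",
        failed_attempts);
    return 0;
}

(Build with cc -pthread; on a multi-core box the sched_setaffinity call keeps both threads on CPU 0 so the contention resembles the single-CPU case.)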

@behlendorf
Contributor

The system hangs just after zpool import pool_name is executed.

Ahh, @ab-oe that makes sense and matches your symptoms. One CPU spinning is a pretty good way to cause a hang. On a single CPU system there's not much preventing us from potentially spinning here unless we explicitly reschedule while the spa_namespace_lock is held. Could you open a PR with the proposed fix? And thanks for running this down.

@ab-oe
Contributor Author

ab-oe commented Jul 5, 2017

@behlendorf thank you. I created PR #6312. I moved the schedule() call outside the zvol_state_lock to avoid possible deadlocks. I'm considering whether there should be a check for error == -ERESTARTSYS so that we reschedule only on this specific error code. What do you think?
