arc_adapt : Soft Lockups on ZFS when exporting zvols over iSCSI #2517

edamato · 2014-07-21T12:06:43Z

Hi All,

I'm experiencing soft lockups on ZFS during normal iscsi I/O on the clients. The lockups only happen if I am using zvols exported over iSCSI, and happen more often if zvols are doing lots of I/O. These lockups are normally followed up by the server entering an unresponsive state and having to be rebooted. zpool and zfs commands seem to hang.

Environment:

RHEL6.4, 256GB of RAM, 2 x 4 core Xeon
ZFS version zfs-0.6.2 spl-0.6.2, against RHEL6 kernel 2.6.32-358.el6
zpool uses 4 mirrored vdevs totalling about 50TB
zfs_arc_max is 64G
about 2 zvols in use and exported via iSCSI to clients
They always seem to occur in arc_adapt when trying to shrink its space
after reducing zfs_arc_max to 8GB, the problem hasn't happened again.
Can someone pls confirm (or not) if this is the same as Consistent Kernel Panic: BUG: soft lockup - CPU#8 stuck for 22s! [kswapd1:146] #1579

Examples of the lockups:

Jun 28 05:39:52 nodeB kernel: BUG: soft lockup - CPU#5 stuck for 67s! [arc_adapt:38986]
Jun 28 05:41:16 nodeB kernel: BUG: soft lockup - CPU#5 stuck for 67s! [arc_adapt:38986]
Jun 28 05:41:52 nodeB kernel: INFO: task txg_sync:51415 blocked for more than 120 seconds.
Jun 28 05:42:40 nodeB kernel: BUG: soft lockup - CPU#5 stuck for 67s! [arc_adapt:38986]
Jun 28 05:43:52 nodeB kernel: INFO: task txg_sync:51415 blocked for more than 120 seconds.
Jun 28 05:43:52 nodeB kernel: INFO: task crond:35499 blocked for more than 120 seconds.
Jun 28 05:43:52 nodeB kernel: INFO: task crond:35500 blocked for more than 120 seconds.
Jun 28 05:44:04 nodeB kernel: BUG: soft lockup - CPU#5 stuck for 67s! [arc_adapt:38986]
Jun 28 05:45:28 nodeB kernel: BUG: soft lockup - CPU#5 stuck for 67s! [arc_adapt:38986]
Jun 28 05:45:52 nodeB kernel: INFO: task zvol/0:38989 blocked for more than 120 seconds.
Jun 28 05:45:52 nodeB kernel: INFO: task zvol/1:38990 blocked for more than 120 seconds.
Jun 28 05:45:52 nodeB kernel: INFO: task zvol/2:38991 blocked for more than 120 seconds.
Jun 28 05:45:52 nodeB kernel: INFO: task zvol/3:38992 blocked for more than 120 seconds.
Jun 28 05:45:52 nodeB kernel: INFO: task zvol/4:38993 blocked for more than 120 seconds.
Jun 28 05:45:52 nodeB kernel: INFO: task zvol/6:38995 blocked for more than 120 seconds.
Jun 28 05:46:52 nodeB kernel: BUG: soft lockup - CPU#5 stuck for 67s! [arc_adapt:38986]
Jun 28 05:48:16 nodeB kernel: BUG: soft lockup - CPU#5 stuck for 67s! [arc_adapt:38986]

Examples of the lockups in detail:

Jun 28 05:39:52 nodeB kernel: BUG: soft lockup - CPU#5 stuck for 67s! [arc_adapt:38986]
Jun 28 05:39:52 nodeB kernel: Modules linked in: zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate autofs4 nfs fscache netconsole configfs nfsd lockd nfs_acl auth_rpcgss exportfs ext3 jbd sunrpc bonding 8021q garp stp llc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio iscsi_tcp libiscsi_tcp ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 tcp_highspeed acpi_pad osst st mlx4_ib ib_sa ib_mad ib_core mlx4_core igb dca ptp pps_core ses enclosure sg be2net microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ext4 mbcache jbd2 raid1 sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt isci libsas scsi_transport_sas megaraid_sas ahci wmi dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: zlib_deflate]
Jun 28 05:39:52 nodeB kernel: CPU 5 
Jun 28 05:39:52 nodeB kernel: Modules linked in: zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate autofs4 nfs fscache netconsole configfs nfsd lockd nfs_acl auth_rpcgss exportfs ext3 jbd sunrpc bonding 8021q garp stp llc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio iscsi_tcp libiscsi_tcp ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 tcp_highspeed acpi_pad osst st mlx4_ib ib_sa ib_mad ib_core mlx4_core igb dca ptp pps_core ses enclosure sg be2net microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ext4 mbcache jbd2 raid1 sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt isci libsas scsi_transport_sas megaraid_sas ahci wmi dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: zlib_deflate]
Jun 28 05:39:52 nodeB kernel: 
Jun 28 05:39:52 nodeB kernel: Pid: 38986, comm: arc_adapt Tainted: P           ---------------    2.6.32-358.el6.x86_64 #1 Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+
Jun 28 05:39:52 nodeB kernel: RIP: 0010:[<ffffffff81150b56>]  [<ffffffff81150b56>] remove_vm_area+0x86/0xa0
Jun 28 05:39:52 nodeB kernel: RSP: 0018:ffff88261e0fbc60  EFLAGS: 00000283
Jun 28 05:39:52 nodeB kernel: RAX: ffff8826adc1a340 RBX: ffff88261e0fbc80 RCX: ffff881a7d2c6140
Jun 28 05:39:52 nodeB kernel: RDX: ffff881a7d2c6240 RSI: 0000000000000001 RDI: ffffffff81ac2450
Jun 28 05:39:52 nodeB kernel: RBP: ffffffff8100bb8e R08: ffffea008812d208 R09: 0000000000000000
Jun 28 05:39:52 nodeB kernel: R10: ffff882080440300 R11: 0000000000000000 R12: ffff88261e0fbc30
Jun 28 05:39:52 nodeB kernel: R13: 0000000000000297 R14: ffff882080010dc0 R15: ffffea008812d5d0
Jun 28 05:39:52 nodeB kernel: FS:  0000000000000000(0000) GS:ffff8820f0c80000(0000) knlGS:0000000000000000
Jun 28 05:39:52 nodeB kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jun 28 05:39:52 nodeB kernel: CR2: 000000000269c000 CR3: 0000000001a85000 CR4: 00000000001407e0
Jun 28 05:39:52 nodeB kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 28 05:39:52 nodeB kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 28 05:39:52 nodeB kernel: Process arc_adapt (pid: 38986, threadinfo ffff88261e0fa000, task ffff882642139500)
Jun 28 05:39:52 nodeB kernel: Stack:
Jun 28 05:39:52 nodeB kernel: ffff881e86a77480 ffff8826adc1a340 ffffc91d7a5ba800 ffffc91d727f8000
Jun 28 05:39:52 nodeB kernel: <d> ffff88261e0fbcb0 ffffffff81150c2e ffff881e13005f80 ffff884048eb0000
Jun 28 05:39:52 nodeB kernel: <d> ffffc91d72852000 0000000000000000 ffff88261e0fbcc0 ffffffff81150d9a
Jun 28 05:39:52 nodeB kernel: Call Trace:
Jun 28 05:39:52 nodeB kernel: [<ffffffff81150c2e>] ? __vunmap+0x2e/0x120
Jun 28 05:39:52 nodeB kernel: [<ffffffff81150d9a>] ? vfree+0x2a/0x40
Jun 28 05:39:52 nodeB kernel: [<ffffffffa03281a5>] ? kv_free+0x65/0x70 [spl]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa0328489>] ? spl_slab_reclaim+0x2d9/0x3e0 [spl]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa0897d11>] ? nfs_access_free_list+0x31/0x40 [nfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa09a82c0>] ? zpl_prune_sbs+0x0/0x60 [zfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa032869d>] ? spl_kmem_cache_reap_now+0x10d/0x230 [spl]
Jun 28 05:39:52 nodeB kernel: [<ffffffff8119d0bc>] ? dispose_list+0xfc/0x120
Jun 28 05:39:52 nodeB kernel: [<ffffffffa09a82c0>] ? zpl_prune_sbs+0x0/0x60 [zfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa032a92b>] ? __spl_kmem_cache_generic_shrinker+0x4b/0xd0 [spl]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa032aa07>] ? spl_kmem_reap+0x27/0x30 [spl]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa09a8318>] ? zpl_prune_sbs+0x58/0x60 [zfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa0900920>] ? arc_adjust_meta+0x120/0x1e0 [zfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa09009e0>] ? arc_adapt_thread+0x0/0x160 [zfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa09009e0>] ? arc_adapt_thread+0x0/0x160 [zfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa0900a47>] ? arc_adapt_thread+0x67/0x160 [zfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa032c478>] ? thread_generic_wrapper+0x68/0x80 [spl]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa032c410>] ? thread_generic_wrapper+0x0/0x80 [spl]
Jun 28 05:39:52 nodeB kernel: [<ffffffff81096916>] ? kthread+0x96/0xa0
Jun 28 05:39:52 nodeB kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
Jun 28 05:39:52 nodeB kernel: [<ffffffff81096880>] ? kthread+0x0/0xa0
Jun 28 05:39:52 nodeB kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Jun 28 05:39:52 nodeB kernel: Code: 50 24 ac 81 48 89 45 e8 e8 a8 f4 3b 00 48 8b 15 b9 6a e6 00 48 c7 c1 f8 75 fb 81 48 8b 45 e8 48 39 d3 74 0c 90 48 89 d1 48 8b 12 <48> 39 d3 75 f5 48 8b 13 48 89 11 f0 81 05 e4 18 97 00 00 00 00 
Jun 28 05:39:52 nodeB kernel: Code: 50 24 ac 81 48 89 45 e8 e8 a8 f4 3b 00 48 8b 15 b9 6a e6 00 48 c7 c1 f8 75 fb 81 48 8b 45 e8 48 39 d3 74 0c 90 48 89 d1 48 8b 12 <48> 39 d3 75 f5 48 8b 13 48 89 11 f0 81 05 e4 18 97 00 00 00 00 
Jun 28 05:39:52 nodeB kernel: Call Trace:
Jun 28 05:39:52 nodeB kernel: [<ffffffff81150c2e>] ? __vunmap+0x2e/0x120
Jun 28 05:39:52 nodeB kernel: [<ffffffff81150d9a>] ? vfree+0x2a/0x40
Jun 28 05:39:52 nodeB kernel: [<ffffffffa03281a5>] ? kv_free+0x65/0x70 [spl]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa0328489>] ? spl_slab_reclaim+0x2d9/0x3e0 [spl]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa0897d11>] ? nfs_access_free_list+0x31/0x40 [nfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa09a82c0>] ? zpl_prune_sbs+0x0/0x60 [zfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa032869d>] ? spl_kmem_cache_reap_now+0x10d/0x230 [spl]
Jun 28 05:39:52 nodeB kernel: [<ffffffff8119d0bc>] ? dispose_list+0xfc/0x120
Jun 28 05:39:52 nodeB kernel: [<ffffffffa09a82c0>] ? zpl_prune_sbs+0x0/0x60 [zfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa032a92b>] ? __spl_kmem_cache_generic_shrinker+0x4b/0xd0 [spl]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa032aa07>] ? spl_kmem_reap+0x27/0x30 [spl]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa09a8318>] ? zpl_prune_sbs+0x58/0x60 [zfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa0900920>] ? arc_adjust_meta+0x120/0x1e0 [zfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa09009e0>] ? arc_adapt_thread+0x0/0x160 [zfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa09009e0>] ? arc_adapt_thread+0x0/0x160 [zfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa0900a47>] ? arc_adapt_thread+0x67/0x160 [zfs]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa032c478>] ? thread_generic_wrapper+0x68/0x80 [spl]
Jun 28 05:39:52 nodeB kernel: [<ffffffffa032c410>] ? thread_generic_wrapper+0x0/0x80 [spl]
Jun 28 05:39:52 nodeB kernel: [<ffffffff81096916>] ? kthread+0x96/0xa0
Jun 28 05:39:52 nodeB kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
Jun 28 05:39:52 nodeB kernel: [<ffffffff81096880>] ? kthread+0x0/0xa0
Jun 28 05:39:52 nodeB kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20



Jun 28 05:45:52 nodeB kernel: INFO: task zvol/0:38989 blocked for more than 120 seconds.
Jun 28 05:45:52 nodeB kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 28 05:45:52 nodeB kernel: zvol/0        D 0000000000000000     0 38989      2 0x00000080
Jun 28 05:45:52 nodeB kernel: ffff8826075b5be0 0000000000000046 0000000000000000 0000000000000000
Jun 28 05:45:52 nodeB kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jun 28 05:45:52 nodeB kernel: ffff884051c225f8 ffff8826075b5fd8 000000000000fb88 ffff884051c225f8
Jun 28 05:45:52 nodeB kernel: Call Trace:
Jun 28 05:45:52 nodeB kernel: [<ffffffff81096e8e>] ? prepare_to_wait_exclusive+0x4e/0x80
Jun 28 05:45:52 nodeB kernel: [<ffffffffa0334475>] cv_wait_common+0x105/0x1c0 [spl]
Jun 28 05:45:52 nodeB kernel: [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
Jun 28 05:45:52 nodeB kernel: [<ffffffffa0334585>] __cv_wait+0x15/0x20 [spl]
Jun 28 05:45:52 nodeB kernel: [<ffffffffa090eb13>] dmu_buf_hold_array_by_dnode+0x2a3/0x560 [zfs]
Jun 28 05:45:52 nodeB kernel: [<ffffffffa090f055>] dmu_buf_hold_array+0x65/0x90 [zfs]
Jun 28 05:45:52 nodeB kernel: [<ffffffffa090f746>] dmu_read_req+0x56/0x180 [zfs]
Jun 28 05:45:52 nodeB kernel: [<ffffffffa09aa837>] zvol_read+0x67/0xc0 [zfs]
Jun 28 05:45:52 nodeB kernel: [<ffffffffa032d628>] taskq_thread+0x218/0x4b0 [spl]
Jun 28 05:45:52 nodeB kernel: [<ffffffff8150d6e0>] ? thread_return+0x4e/0x76e
Jun 28 05:45:52 nodeB kernel: [<ffffffff81063310>] ? default_wake_function+0x0/0x20
Jun 28 05:45:52 nodeB kernel: [<ffffffffa032d410>] ? taskq_thread+0x0/0x4b0 [spl]
Jun 28 05:45:52 nodeB kernel: [<ffffffff81096916>] kthread+0x96/0xa0
Jun 28 05:45:52 nodeB kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
Jun 28 05:45:52 nodeB kernel: [<ffffffff81096880>] ? kthread+0x0/0xa0
Jun 28 05:45:52 nodeB kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

The text was updated successfully, but these errors were encountered:

pneumatic · 2014-07-24T12:32:59Z

This happens to me on NFS so it isn't specific to zvols.

edamato · 2014-07-25T08:25:48Z

@pneumatic:

Thanks for this, do you have a full soft-lockup message from when it happens to you? What kernel are you using?

cheers

pneumatic · 2014-07-25T12:53:13Z

@edamato:

I'm using the following:

3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 x86_64 x86_64
x86_64 GNU/Linux

I do not have the logs but I remember the "blocked for more than 120
seconds" while the arc_adapt was dumping.

On Fri, Jul 25, 2014 at 4:26 AM, edamato [email protected] wrote:

@pneumatic https://github.com/pneumatic:

Thanks for this, do you have a full soft-lockup message from when it
happens to you? What kernel are you using?

cheers

—
Reply to this email directly or view it on GitHub
#2517 (comment).

olw2005 · 2014-10-02T14:45:20Z

We had a very similar lockup on our main storage system last night which forced a hard reboot. System / software configuration is similar to the above:
HP Proliant 380p Gen 8, 2 x E5-2690 cpu's, 192 GB RAM
RHEL 6.5 (kernel 2.6.32-431.20.3.el6.x86_64)
SCST iscsi target 3.0
drbd 8.4.5
3 zvol(s)
zfs 0.6.3-1 (latest release version)

zpool config:
3 x 10-disk raidz2 vdevs + 5 x 10GB ssd log devs + 5 x 500GB ssd cache devices

zfs modules options:
options zfs zfs_nocacheflush=1 zfs_arc_max=103079215104 zfs_arc_min=1073741824

ZPool / zvols have default values other than lz4 compression=on, 128kB blocksize, dedup / checksum / atime / xattr are all off.

Uptime was about 90 days, so memory usage / fragmentation was a little on the high side but had been stable for weeks with at least 5-10 GB free RAM and the system was operating normally. The lockup occurred at 18:28 EST. Prior to that the system had spent the bulk of the afternoon under normal (light) load and (this part is probably key) freeing (via scsi unmap commands from VMware ESXi 5.1) approximately 600 GB of data from a zvol with 23T logical referenced data. The space freeing (at least from VMware's perspective, not sure if this is a sync or async option in zfs?) had completed about a half hour before (~ 1800 hrs). Not long afterward the system stopped responding to I/O requests from VMware and we got this in /var/log/messages for an hour when I noticed the problem and hard reset the system (after it failed to shutdown cleanly):

Oct 1 18:27:11 dtc-san kernel: BUG: soft lockup - CPU#4 stuck for 67s! [zvol/30:2080]
Oct 1 18:27:11 dtc-san kernel: Modules linked in: xfs exportfs fuse iscsi_scst(U) scst_vdisk(U) scst(U) crc32c_intel drbd(U) mptctl mptbase bonding 8021q garp stp llc ipv6 uinput iTCO_wdt iTCO_ven
dor_support microcode hpilo zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate hpwdt qlcnic power_meter bnx2x libcrc32c mdio tg3 ptp pps_core serio_raw sg lpc_ich m
fd_core ioatdma dca shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom hpsa pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: drbd]
Oct 1 18:27:11 dtc-san kernel: CPU 4
Oct 1 18:27:11 dtc-san kernel: Modules linked in: xfs exportfs fuse iscsi_scst(U) scst_vdisk(U) scst(U) crc32c_intel drbd(U) mptctl mptbase bonding 8021q garp stp llc ipv6 uinput iTCO_wdt iTCO_ven
dor_support microcode hpilo zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate hpwdt qlcnic power_meter bnx2x libcrc32c mdio tg3 ptp pps_core serio_raw sg lpc_ich m
fd_core ioatdma dca shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom hpsa pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: drbd]
Oct 1 18:27:11 dtc-san kernel:
Oct 1 18:27:11 dtc-san kernel: Pid: 2080, comm: zvol/30 Tainted: P --------------- 2.6.32-431.20.3.el6.x86_64 #1 HP ProLiant DL380p Gen8
Oct 1 18:27:11 dtc-san kernel: RIP: 0010:[] [] _spin_lock+0x1e/0x30
Oct 1 18:27:11 dtc-san kernel: RSP: 0018:ffff881818b55ba0 EFLAGS: 00000202
Oct 1 18:27:11 dtc-san kernel: RAX: 0000000000000000 RBX: ffff881818b55ba0 RCX: 0000000000000000
Oct 1 18:27:11 dtc-san kernel: RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff88172f1c9984
Oct 1 18:27:11 dtc-san kernel: RBP: ffffffff8100bb8e R08: 0000000000000000 R09: ffffc91c45969000
Oct 1 18:27:11 dtc-san kernel: R10: ffff881818b55b00 R11: ffff8830189980b0 R12: 0000000000000000
Oct 1 18:27:11 dtc-san kernel: R13: ffff883000000040 R14: 0000000000000000 R15: ffff883018998048
Oct 1 18:27:11 dtc-san kernel: FS: 0000000000000000(0000) GS:ffff880094880000(0000) knlGS:0000000000000000
Oct 1 18:27:11 dtc-san kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Oct 1 18:27:11 dtc-san kernel: CR2: 000000000044fc20 CR3: 0000002fda5e4000 CR4: 00000000000407e0
Oct 1 18:27:11 dtc-san kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 1 18:27:11 dtc-san kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 1 18:27:11 dtc-san kernel: Process zvol/30 (pid: 2080, threadinfo ffff881818b54000, task ffff881818b4c080)
Oct 1 18:27:11 dtc-san kernel: Stack:
Oct 1 18:27:11 dtc-san kernel: ffff881818b55bc0 ffffffff81529b95 ffff88172f1c9928 0000000000000029
Oct 1 18:27:11 dtc-san kernel: ffff881818b55bd0 ffffffff81529b6b ffff881818b55c90 ffffffffa033ed51
Oct 1 18:27:11 dtc-san kernel: ffff8810f475cfa8 ffff8808dd707c38 ffff881818b55c10 ffffffff8100b9ce
Oct 1 18:27:11 dtc-san kernel: Call Trace:
Oct 1 18:27:11 dtc-san kernel: [] ? __mutex_unlock_slowpath+0x25/0x60
Oct 1 18:27:11 dtc-san kernel: [] ? mutex_unlock+0x1b/0x20
Oct 1 18:27:11 dtc-san kernel: [] ? dbuf_read+0x3c1/0x740 [zfs]
Oct 1 18:27:11 dtc-san kernel: [] ? common_interrupt+0xe/0x13
Oct 1 18:27:11 dtc-san kernel: [] ? dmu_buf_will_dirty+0xa4/0x100 [zfs]
Oct 1 18:27:11 dtc-san kernel: [] ? dmu_write_req+0xe0/0x1d0 [zfs]
Oct 1 18:27:11 dtc-san kernel: [] ? zvol_write+0xd5/0x480 [zfs]
Oct 1 18:27:11 dtc-san kernel: [] ? taskq_thread+0x218/0x4b0 [spl]
Oct 1 18:27:11 dtc-san kernel: [] ? thread_return+0x4e/0x760
Oct 1 18:27:11 dtc-san kernel: [] ? default_wake_function+0x0/0x20
Oct 1 18:27:11 dtc-san kernel: [] ? taskq_thread+0x0/0x4b0 [spl]
Oct 1 18:27:11 dtc-san kernel: [] ? kthread+0x96/0xa0
Oct 1 18:27:11 dtc-san kernel: [] ? child_rip+0xa/0x20
Oct 1 18:27:11 dtc-san kernel: [] ? kthread+0x0/0xa0
Oct 1 18:27:11 dtc-san kernel: [] ? child_rip+0x0/0x20
Oct 1 18:27:11 dtc-san kernel: Code: 00 00 00 01 74 05 e8 82 3f d6 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 <0f> b7 17 eb f5 83 3f 00 75 f
4 eb df c9 c3 0f 1f 40 00 55 48 89
Oct 1 18:27:11 dtc-san kernel: Call Trace:
Oct 1 18:27:11 dtc-san kernel: [] ? __mutex_unlock_slowpath+0x25/0x60
Oct 1 18:27:11 dtc-san kernel: [] ? mutex_unlock+0x1b/0x20
Oct 1 18:27:11 dtc-san kernel: [] ? dbuf_read+0x3c1/0x740 [zfs]
Oct 1 18:27:11 dtc-san kernel: [] ? common_interrupt+0xe/0x13
Oct 1 18:27:11 dtc-san kernel: [] ? dmu_buf_will_dirty+0xa4/0x100 [zfs]
Oct 1 18:27:11 dtc-san kernel: [] ? dmu_write_req+0xe0/0x1d0 [zfs]
Oct 1 18:27:11 dtc-san kernel: [] ? zvol_write+0xd5/0x480 [zfs]
Oct 1 18:27:11 dtc-san kernel: [] ? taskq_thread+0x218/0x4b0 [spl]
Oct 1 18:27:11 dtc-san kernel: [] ? thread_return+0x4e/0x760
Oct 1 18:27:11 dtc-san kernel: [] ? default_wake_function+0x0/0x20
Oct 1 18:27:11 dtc-san kernel: [] ? taskq_thread+0x0/0x4b0 [spl]
Oct 1 18:27:11 dtc-san kernel: [] ? kthread+0x96/0xa0
Oct 1 18:27:11 dtc-san kernel: [] ? child_rip+0xa/0x20
Oct 1 18:27:11 dtc-san kernel: [] ? kthread+0x0/0xa0
Oct 1 18:27:11 dtc-san kernel: [] ? child_rip+0x0/0x20
Oct 1 18:27:15 dtc-san kernel: block drbd0: Remote failed to finish a request within ko-count * timeout
Oct 1 18:27:15 dtc-san kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
Oct 1 18:27:15 dtc-san kernel: block drbd0: new current UUID 7386D999EC6DCB17:CB1E44CBB2E01213:2DD1E5BEE0BB86E7:2DD0E5BEE0BB86E7
Oct 1 18:27:15 dtc-san kernel: drbd detroitzvol: asender terminated
Oct 1 18:27:15 dtc-san kernel: drbd detroitzvol: Terminating drbd_a_detroitz
Oct 1 18:27:16 dtc-san kernel: block drbd1: Remote failed to finish a request within ko-count * timeout
Oct 1 18:27:16 dtc-san kernel: block drbd1: peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
Oct 1 18:27:16 dtc-san kernel: block drbd1: new current UUID BBF9EA1D0C1DEFA9:E45FE9B316E0540B:37A301A27DB61415:37A201A27DB61415
Oct 1 18:27:16 dtc-san kernel: drbd archivezvol: asender terminated
Oct 1 18:27:16 dtc-san kernel: drbd archivezvol: Terminating drbd_a_archivez
Oct 1 18:27:17 dtc-san kernel: block drbd2: Remote failed to finish a request within ko-count * timeout
Oct 1 18:27:17 dtc-san kernel: block drbd2: peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
Oct 1 18:27:17 dtc-san kernel: block drbd2: new current UUID F0CC77D376A5E793:5F7D225F6336A9B7:ED8CD494DC1BB187:ED8BD494DC1BB187
Oct 1 18:27:17 dtc-san kernel: drbd vdistorezvol: asender terminated
Oct 1 18:27:17 dtc-san kernel: drbd vdistorezvol: Terminating drbd_a_vdistore
Oct 1 18:28:35 dtc-san kernel: BUG: soft lockup - CPU#4 stuck for 68s! [zvol/30:2080]
Oct 1 18:28:35 dtc-san kernel: Modules linked in: bridge xfs exportfs fuse iscsi_scst(U) scst_vdisk(U) scst(U) crc32c_intel drbd(U) mptctl mptbase bonding 8021q garp stp llc ipv6 uinput iTCO_wdt iTCO_vendor_support microcode hpilo zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate hpwdt qlcnic power_meter bnx2x libcrc32c mdio tg3 ptp pps_core serio_raw sg lpc_ich mfd_core ioatdma dca shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom hpsa pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: drbd]
Oct 1 18:28:35 dtc-san kernel: CPU 4
Oct 1 18:28:35 dtc-san kernel: Modules linked in: bridge xfs exportfs fuse iscsi_scst(U) scst_vdisk(U) scst(U) crc32c_intel drbd(U) mptctl mptbase bonding 8021q garp stp llc ipv6 uinput iTCO_wdt iTCO_vendor_support microcode hpilo zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate hpwdt qlcnic power_meter bnx2x libcrc32c mdio tg3 ptp pps_core serio_raw sg lpc_ich mfd_core ioatdma dca shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom hpsa pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: drbd]
Oct 1 18:28:35 dtc-san kernel:
Oct 1 18:28:35 dtc-san kernel: Pid: 2080, comm: zvol/30 Tainted: P --------------- 2.6.32-431.20.3.el6.x86_64 #1 HP ProLiant DL380p Gen8
Oct 1 18:28:35 dtc-san kernel: RIP: 0010:[] [] _spin_lock+0x1e/0x30
Oct 1 18:28:35 dtc-san kernel: RSP: 0018:ffff881818b55ba0 EFLAGS: 00000202
Oct 1 18:28:35 dtc-san kernel: RAX: 0000000000000000 RBX: ffff881818b55ba0 RCX: 0000000000000000
Oct 1 18:28:35 dtc-san kernel: RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff88172f1c9984
Oct 1 18:28:35 dtc-san kernel: RBP: ffffffff8100bb8e R08: 0000000000000000 R09: ffffc91c45969000
Oct 1 18:28:35 dtc-san kernel: R10: ffff881818b55b00 R11: ffff8830189980b0 R12: 0000000000000000
Oct 1 18:28:35 dtc-san kernel: R13: ffff883000000040 R14: 0000000000000000 R15: ffff883018998048
Oct 1 18:28:35 dtc-san kernel: FS: 0000000000000000(0000) GS:ffff880094880000(0000) knlGS:0000000000000000
Oct 1 18:28:35 dtc-san kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Oct 1 18:28:35 dtc-san kernel: CR2: 000000000044fc20 CR3: 0000002fda5e4000 CR4: 00000000000407e0
Oct 1 18:28:35 dtc-san kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 1 18:28:35 dtc-san kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 1 18:28:35 dtc-san kernel: Process zvol/30 (pid: 2080, threadinfo ffff881818b54000, task ffff881818b4c080)
Oct 1 18:28:35 dtc-san kernel: Stack:
Oct 1 18:28:35 dtc-san kernel: ffff881818b55bc0 ffffffff81529b95 ffff88172f1c9928 0000000000000029
Oct 1 18:28:35 dtc-san kernel: ffff881818b55bd0 ffffffff81529b6b ffff881818b55c90 ffffffffa033ed51
Oct 1 18:28:35 dtc-san kernel: ffff8810f475cfa8 ffff8808dd707c38 ffff881818b55c10 ffffffff8100b9ce
Oct 1 18:28:35 dtc-san kernel: Call Trace:
Oct 1 18:28:35 dtc-san kernel: [] ? __mutex_unlock_slowpath+0x25/0x60
Oct 1 18:28:35 dtc-san kernel: [] ? mutex_unlock+0x1b/0x20
Oct 1 18:28:35 dtc-san kernel: [] ? dbuf_read+0x3c1/0x740 [zfs]
Oct 1 18:28:35 dtc-san kernel: [] ? common_interrupt+0xe/0x13
Oct 1 18:28:35 dtc-san kernel: [] ? dmu_buf_will_dirty+0xa4/0x100 [zfs]
Oct 1 18:28:35 dtc-san kernel: [] ? dmu_write_req+0xe0/0x1d0 [zfs]
Oct 1 18:28:35 dtc-san kernel: [] ? zvol_write+0xd5/0x480 [zfs]
Oct 1 18:28:35 dtc-san kernel: [] ? taskq_thread+0x218/0x4b0 [spl]
Oct 1 18:28:35 dtc-san kernel: [] ? thread_return+0x4e/0x760
Oct 1 18:28:35 dtc-san kernel: [] ? default_wake_function+0x0/0x20
Oct 1 18:28:35 dtc-san kernel: [] ? taskq_thread+0x0/0x4b0 [spl]
Oct 1 18:28:35 dtc-san kernel: [] ? kthread+0x96/0xa0
Oct 1 18:28:35 dtc-san kernel: [] ? child_rip+0xa/0x20
Oct 1 18:28:35 dtc-san kernel: [] ? kthread+0x0/0xa0
Oct 1 18:28:35 dtc-san kernel: [] ? child_rip+0x0/0x20
Oct 1 18:28:35 dtc-san kernel: Code: 00 00 00 01 74 05 e8 82 3f d6 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 <0f> b7 17 eb f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89
Oct 1 18:28:35 dtc-san kernel: Call Trace:
Oct 1 18:28:35 dtc-san kernel: [] ? __mutex_unlock_slowpath+0x25/0x60
Oct 1 18:28:35 dtc-san kernel: [] ? mutex_unlock+0x1b/0x20
Oct 1 18:28:35 dtc-san kernel: [] ? dbuf_read+0x3c1/0x740 [zfs]
Oct 1 18:28:35 dtc-san kernel: [] ? common_interrupt+0xe/0x13
Oct 1 18:28:35 dtc-san kernel: [] ? dmu_buf_will_dirty+0xa4/0x100 [zfs]
Oct 1 18:28:35 dtc-san kernel: [] ? dmu_write_req+0xe0/0x1d0 [zfs]
Oct 1 18:28:35 dtc-san kernel: [] ? zvol_write+0xd5/0x480 [zfs]
Oct 1 18:28:35 dtc-san kernel: [] ? taskq_thread+0x218/0x4b0 [spl]
Oct 1 18:28:35 dtc-san kernel: [] ? thread_return+0x4e/0x760
Oct 1 18:28:35 dtc-san kernel: [] ? default_wake_function+0x0/0x20
Oct 1 18:28:35 dtc-san kernel: [] ? taskq_thread+0x0/0x4b0 [spl]
Oct 1 18:28:35 dtc-san kernel: [] ? kthread+0x96/0xa0
Oct 1 18:28:35 dtc-san kernel: [] ? child_rip+0xa/0x20
Oct 1 18:28:35 dtc-san kernel: [] ? kthread+0x0/0xa0
Oct 1 18:28:35 dtc-san kernel: [] ? child_rip+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: INFO: task arc_adapt:2015 blocked for more than 120 seconds.
Oct 1 18:29:25 dtc-san kernel: Tainted: P --------------- 2.6.32-431.20.3.el6.x86_64 #1
Oct 1 18:29:25 dtc-san kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 1 18:29:25 dtc-san kernel: arc_adapt D 0000000000000002 0 2015 2 0x00000000
Oct 1 18:29:25 dtc-san kernel: ffff881818cf9d30 0000000000000046 ffff881818cf9cf8 ffff881818cf9cf4
Oct 1 18:29:25 dtc-san kernel: 0000000118cf9cd0 ffff88183fe70c00 ffff880094896840 0000000000000081
Oct 1 18:29:25 dtc-san kernel: ffff881819e485f8 ffff881818cf9fd8 000000000000fbc8 ffff881819e485f8
Oct 1 18:29:25 dtc-san kernel: Call Trace:
Oct 1 18:29:25 dtc-san kernel: [] __mutex_lock_slowpath+0x13e/0x180
Oct 1 18:29:25 dtc-san kernel: [] ? mutex_lock+0x1e/0x50
Oct 1 18:29:25 dtc-san kernel: [] mutex_lock+0x2b/0x50
Oct 1 18:29:25 dtc-san kernel: [] dbuf_destroy+0xc5/0x150 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] dbuf_evict+0x1e/0x30 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] dbuf_do_evict+0x50/0x80 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] arc_do_user_evicts+0x50/0xd0 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] arc_adapt_thread+0x4c/0x3c0 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? kmem_free_debug+0x4b/0x150 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? set_user_nice+0xc9/0x130
Oct 1 18:29:25 dtc-san kernel: [] ? arc_adapt_thread+0x0/0x3c0 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] thread_generic_wrapper+0x68/0x80 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl]
Oct 1 18:29:25 dtc-san kernel: [] kthread+0x96/0xa0
Oct 1 18:29:25 dtc-san kernel: [] child_rip+0xa/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? kthread+0x0/0xa0
Oct 1 18:29:25 dtc-san kernel: [] ? child_rip+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: INFO: task zvol/0:2050 blocked for more than 120 seconds.
Oct 1 18:29:25 dtc-san kernel: Tainted: P --------------- 2.6.32-431.20.3.el6.x86_64 #1
Oct 1 18:29:25 dtc-san kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 1 18:29:25 dtc-san kernel: zvol/0 D 0000000000000014 0 2050 2 0x00000000
Oct 1 18:29:25 dtc-san kernel: ffff8818189a9bd0 0000000000000046 ffff8818189a9b98 ffff8818189a9b94
Oct 1 18:29:25 dtc-san kernel: ffff88173865a068 ffff88183fe73000 ffff880094896840 0000000000000207
Oct 1 18:29:25 dtc-san kernel: ffff88181899e638 ffff8818189a9fd8 000000000000fbc8 ffff88181899e638
Oct 1 18:29:25 dtc-san kernel: Call Trace:
Oct 1 18:29:25 dtc-san kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? autoremove_wake_function+0x0/0x40
Oct 1 18:29:25 dtc-san kernel: [] __cv_wait+0x15/0x20 [spl]
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_wait+0x174/0x2b0 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? mutex_lock+0x1e/0x50
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_assign+0x91/0x490 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? dsl_dataset_block_freeable+0x43/0x60 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] zvol_write+0xb7/0x480 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] taskq_thread+0x218/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? thread_return+0x4e/0x760
Oct 1 18:29:25 dtc-san kernel: [] ? default_wake_function+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? taskq_thread+0x0/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] kthread+0x96/0xa0
Oct 1 18:29:25 dtc-san kernel: [] child_rip+0xa/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? kthread+0x0/0xa0
Oct 1 18:29:25 dtc-san kernel: [] ? child_rip+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: INFO: task zvol/1:2051 blocked for more than 120 seconds.
Oct 1 18:29:25 dtc-san kernel: Tainted: P --------------- 2.6.32-431.20.3.el6.x86_64 #1
Oct 1 18:29:25 dtc-san kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 1 18:29:25 dtc-san kernel: zvol/1 D 0000000000000016 0 2051 2 0x00000000
Oct 1 18:29:25 dtc-san kernel: ffff8818189adbd0 0000000000000046 ffff8818189adb98 ffff8818189adb94
Oct 1 18:29:25 dtc-san kernel: ffff8818189adb70 ffff88183fe73400 ffff880094896840 0000000000000080
Oct 1 18:29:25 dtc-san kernel: ffff8818189abab8 ffff8818189adfd8 000000000000fbc8 ffff8818189abab8
Oct 1 18:29:25 dtc-san kernel: Call Trace:
Oct 1 18:29:25 dtc-san kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? autoremove_wake_function+0x0/0x40
Oct 1 18:29:25 dtc-san kernel: [] __cv_wait+0x15/0x20 [spl]
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_wait+0x174/0x2b0 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? mutex_lock+0x1e/0x50
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_assign+0x91/0x490 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? dsl_dataset_block_freeable+0x43/0x60 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] zvol_write+0xb7/0x480 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] taskq_thread+0x218/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? thread_return+0x4e/0x760
Oct 1 18:29:25 dtc-san kernel: [] ? default_wake_function+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? taskq_thread+0x0/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] kthread+0x96/0xa0
Oct 1 18:29:25 dtc-san kernel: [] child_rip+0xa/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? kthread+0x0/0xa0
Oct 1 18:29:25 dtc-san kernel: [] ? child_rip+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: INFO: task zvol/2:2052 blocked for more than 120 seconds.
Oct 1 18:29:25 dtc-san kernel: Tainted: P --------------- 2.6.32-431.20.3.el6.x86_64 #1
Oct 1 18:29:25 dtc-san kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 1 18:29:25 dtc-san kernel: zvol/2 D 0000000000000010 0 2052 2 0x00000000
Oct 1 18:29:25 dtc-san kernel: ffff8818189b1bd0 0000000000000046 ffff8818189b1b98 ffff8818189b1b94
Oct 1 18:29:25 dtc-san kernel: ffff8808abf47a40 ffff88183fe72800 ffff880094896840 0000000000000101
Oct 1 18:29:25 dtc-san kernel: ffff8818189ab058 ffff8818189b1fd8 000000000000fbc8 ffff8818189ab058
Oct 1 18:29:25 dtc-san kernel: Call Trace:
Oct 1 18:29:25 dtc-san kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? autoremove_wake_function+0x0/0x40
Oct 1 18:29:25 dtc-san kernel: [] __cv_wait+0x15/0x20 [spl]
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_wait+0x174/0x2b0 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? mutex_lock+0x1e/0x50
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_assign+0x91/0x490 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? dsl_dataset_block_freeable+0x43/0x60 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] zvol_write+0xb7/0x480 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] taskq_thread+0x218/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? thread_return+0x4e/0x760
Oct 1 18:29:25 dtc-san kernel: [] ? default_wake_function+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? taskq_thread+0x0/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] kthread+0x96/0xa0
Oct 1 18:29:25 dtc-san kernel: [] child_rip+0xa/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? kthread+0x0/0xa0
Oct 1 18:29:25 dtc-san kernel: [] ? child_rip+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: INFO: task zvol/3:2053 blocked for more than 120 seconds.
Oct 1 18:29:25 dtc-san kernel: Tainted: P --------------- 2.6.32-431.20.3.el6.x86_64 #1
Oct 1 18:29:25 dtc-san kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 1 18:29:25 dtc-san kernel: zvol/3 D 0000000000000000 0 2053 2 0x00000000
Oct 1 18:29:25 dtc-san kernel: ffff8818189b3bd0 0000000000000046 ffff8818189b3b98 ffff8818189b3b94
Oct 1 18:29:25 dtc-san kernel: ffff88173865a068 ffff88183fe70800 ffff880094896840 0000000000000101
Oct 1 18:29:25 dtc-san kernel: ffff8818189aa5f8 ffff8818189b3fd8 000000000000fbc8 ffff8818189aa5f8
Oct 1 18:29:25 dtc-san kernel: Call Trace:
Oct 1 18:29:25 dtc-san kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? autoremove_wake_function+0x0/0x40
Oct 1 18:29:25 dtc-san kernel: [] __cv_wait+0x15/0x20 [spl]
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_wait+0x174/0x2b0 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? mutex_lock+0x1e/0x50
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_assign+0x91/0x490 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? dsl_dataset_block_freeable+0x43/0x60 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] zvol_write+0xb7/0x480 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] taskq_thread+0x218/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? thread_return+0x4e/0x760
Oct 1 18:29:25 dtc-san kernel: [] ? default_wake_function+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? taskq_thread+0x0/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] kthread+0x96/0xa0
Oct 1 18:29:25 dtc-san kernel: [] child_rip+0xa/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? kthread+0x0/0xa0
Oct 1 18:29:25 dtc-san kernel: [] ? child_rip+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: INFO: task zvol/4:2054 blocked for more than 120 seconds.
Oct 1 18:29:25 dtc-san kernel: Tainted: P --------------- 2.6.32-431.20.3.el6.x86_64 #1
Oct 1 18:29:25 dtc-san kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 1 18:29:25 dtc-san kernel: zvol/4 D 0000000000000014 0 2054 2 0x00000000
Oct 1 18:29:25 dtc-san kernel: ffff8818189b7bd0 0000000000000046 ffff8818189b7b98 ffff8818189b7b94
Oct 1 18:29:25 dtc-san kernel: ffff8818189b7b70 ffff88183fe73000 ffff880094896840 0000000000000207
Oct 1 18:29:25 dtc-san kernel: ffff8818189b5af8 ffff8818189b7fd8 000000000000fbc8 ffff8818189b5af8
Oct 1 18:29:25 dtc-san kernel: Call Trace:
Oct 1 18:29:25 dtc-san kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? autoremove_wake_function+0x0/0x40
Oct 1 18:29:25 dtc-san kernel: [] __cv_wait+0x15/0x20 [spl]
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_wait+0x174/0x2b0 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? mutex_lock+0x1e/0x50
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_assign+0x91/0x490 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? dsl_dataset_block_freeable+0x43/0x60 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] zvol_write+0xb7/0x480 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] taskq_thread+0x218/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? thread_return+0x4e/0x760
Oct 1 18:29:25 dtc-san kernel: [] ? default_wake_function+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? taskq_thread+0x0/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] kthread+0x96/0xa0
Oct 1 18:29:25 dtc-san kernel: [] child_rip+0xa/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? kthread+0x0/0xa0
Oct 1 18:29:25 dtc-san kernel: [] ? child_rip+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: INFO: task zvol/5:2055 blocked for more than 120 seconds.
Oct 1 18:29:25 dtc-san kernel: Tainted: P --------------- 2.6.32-431.20.3.el6.x86_64 #1
Oct 1 18:29:25 dtc-san kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 1 18:29:25 dtc-san kernel: zvol/5 D 0000000000000015 0 2055 2 0x00000000
Oct 1 18:29:25 dtc-san kernel: ffff8818189bbbd0 0000000000000046 ffff8818189bbb98 ffff8818189bbb94
Oct 1 18:29:25 dtc-san kernel: ffff881750223928 ffff88183fe73200 ffff880094896840 0000000000000101
Oct 1 18:29:25 dtc-san kernel: ffff8818189b5098 ffff8818189bbfd8 000000000000fbc8 ffff8818189b5098
Oct 1 18:29:25 dtc-san kernel: Call Trace:
Oct 1 18:29:25 dtc-san kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? autoremove_wake_function+0x0/0x40
Oct 1 18:29:25 dtc-san kernel: [] __cv_wait+0x15/0x20 [spl]
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_wait+0x174/0x2b0 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? mutex_lock+0x1e/0x50
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_assign+0x91/0x490 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? dsl_dataset_block_freeable+0x43/0x60 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] zvol_write+0xb7/0x480 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] taskq_thread+0x218/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? thread_return+0x4e/0x760
Oct 1 18:29:25 dtc-san kernel: [] ? default_wake_function+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? taskq_thread+0x0/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] kthread+0x96/0xa0
Oct 1 18:29:25 dtc-san kernel: [] child_rip+0xa/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? kthread+0x0/0xa0
Oct 1 18:29:25 dtc-san kernel: [] ? child_rip+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: INFO: task zvol/6:2056 blocked for more than 120 seconds.
Oct 1 18:29:25 dtc-san kernel: Tainted: P --------------- 2.6.32-431.20.3.el6.x86_64 #1
Oct 1 18:29:25 dtc-san kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 1 18:29:25 dtc-san kernel: zvol/6 D 0000000000000013 0 2056 2 0x00000000
Oct 1 18:29:25 dtc-san kernel: ffff8818189bdbd0 0000000000000046 ffff8818189bdb98 ffff8818189bdb94
Oct 1 18:29:25 dtc-san kernel: ffff8818189bdb70 ffff88183fe72e00 ffff880094896840 0000000000000080
Oct 1 18:29:25 dtc-san kernel: ffff8818189b4638 ffff8818189bdfd8 000000000000fbc8 ffff8818189b4638
Oct 1 18:29:25 dtc-san kernel: Call Trace:
Oct 1 18:29:25 dtc-san kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? autoremove_wake_function+0x0/0x40
Oct 1 18:29:25 dtc-san kernel: [] __cv_wait+0x15/0x20 [spl]
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_wait+0x174/0x2b0 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? mutex_lock+0x1e/0x50
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_assign+0x91/0x490 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? dsl_dataset_block_freeable+0x43/0x60 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] zvol_write+0xb7/0x480 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] taskq_thread+0x218/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? thread_return+0x4e/0x760
Oct 1 18:29:25 dtc-san kernel: [] ? default_wake_function+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? taskq_thread+0x0/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] kthread+0x96/0xa0
Oct 1 18:29:25 dtc-san kernel: [] child_rip+0xa/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? kthread+0x0/0xa0
Oct 1 18:29:25 dtc-san kernel: [] ? child_rip+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: INFO: task zvol/7:2057 blocked for more than 120 seconds.
Oct 1 18:29:25 dtc-san kernel: Tainted: P --------------- 2.6.32-431.20.3.el6.x86_64 #1
Oct 1 18:29:25 dtc-san kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 1 18:29:25 dtc-san kernel: zvol/7 D 0000000000000012 0 2057 2 0x00000000
Oct 1 18:29:25 dtc-san kernel: ffff881813dc3bd0 0000000000000046 ffff881813dc3b98 ffff881813dc3b94
Oct 1 18:29:25 dtc-san kernel: ffff88174a352068 ffff88183fe72c00 ffff880094896840 0000000000000102
Oct 1 18:29:25 dtc-san kernel: ffff881813dc1ab8 ffff881813dc3fd8 000000000000fbc8 ffff881813dc1ab8
Oct 1 18:29:25 dtc-san kernel: Call Trace:
Oct 1 18:29:25 dtc-san kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? autoremove_wake_function+0x0/0x40
Oct 1 18:29:25 dtc-san kernel: [] __cv_wait+0x15/0x20 [spl]
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_wait+0x174/0x2b0 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? mutex_lock+0x1e/0x50
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_assign+0x91/0x490 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? dsl_dataset_block_freeable+0x43/0x60 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] zvol_write+0xb7/0x480 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] taskq_thread+0x218/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? thread_return+0x4e/0x760
Oct 1 18:29:25 dtc-san kernel: [] ? default_wake_function+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? taskq_thread+0x0/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] kthread+0x96/0xa0
Oct 1 18:29:25 dtc-san kernel: [] child_rip+0xa/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? kthread+0x0/0xa0
Oct 1 18:29:25 dtc-san kernel: [] ? child_rip+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: INFO: task zvol/8:2058 blocked for more than 120 seconds.
Oct 1 18:29:25 dtc-san kernel: Tainted: P --------------- 2.6.32-431.20.3.el6.x86_64 #1
Oct 1 18:29:25 dtc-san kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 1 18:29:25 dtc-san kernel: zvol/8 D 0000000000000010 0 2058 2 0x00000000
Oct 1 18:29:25 dtc-san kernel: ffff881813dc7bd0 0000000000000046 ffff881813dc7b98 ffff881813dc7b94
Oct 1 18:29:25 dtc-san kernel: ffff881700000000 ffff88183fe72800 ffff880094896840 0000000000000080
Oct 1 18:29:25 dtc-san kernel: ffff881813dc1058 ffff881813dc7fd8 000000000000fbc8 ffff881813dc1058
Oct 1 18:29:25 dtc-san kernel: Call Trace:
Oct 1 18:29:25 dtc-san kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? autoremove_wake_function+0x0/0x40
Oct 1 18:29:25 dtc-san kernel: [] __cv_wait+0x15/0x20 [spl]
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_wait+0x174/0x2b0 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? mutex_lock+0x1e/0x50
Oct 1 18:29:25 dtc-san kernel: [] dmu_tx_assign+0x91/0x490 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] ? dsl_dataset_block_freeable+0x43/0x60 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] zvol_write+0xb7/0x480 [zfs]
Oct 1 18:29:25 dtc-san kernel: [] taskq_thread+0x218/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] ? thread_return+0x4e/0x760
Oct 1 18:29:25 dtc-san kernel: [] ? default_wake_function+0x0/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? taskq_thread+0x0/0x4b0 [spl]
Oct 1 18:29:25 dtc-san kernel: [] kthread+0x96/0xa0
Oct 1 18:29:25 dtc-san kernel: [] child_rip+0xa/0x20
Oct 1 18:29:25 dtc-san kernel: [] ? kthread+0x0/0xa0
Oct 1 18:29:25 dtc-san kernel: [] ? child_rip+0x0/0x20
Oct 1 18:29:59 dtc-san kernel: BUG: soft lockup - CPU#4 stuck for 67s! [zvol/30:2080]
Oct 1 18:29:59 dtc-san kernel: Modules linked in: bridge xfs exportfs fuse iscsi_scst(U) scst_vdisk(U) scst(U) crc32c_intel drbd(U) mptctl mptbase bonding 8021q garp stp llc ipv6 uinput iTCO_wdt iTCO_vendor_support microcode hpilo zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate hpwdt qlcnic power_meter bnx2x libcrc32c mdio tg3 ptp pps_core serio_raw sg lpc_ich mfd_core ioatdma dca shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom hpsa pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: drbd]
Oct 1 18:29:59 dtc-san kernel: CPU 4
Oct 1 18:29:59 dtc-san kernel: Modules linked in: bridge xfs exportfs fuse iscsi_scst(U) scst_vdisk(U) scst(U) crc32c_intel drbd(U) mptctl mptbase bonding 8021q garp stp llc ipv6 uinput iTCO_wdt iTCO_vendor_support microcode hpilo zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate hpwdt qlcnic power_meter bnx2x libcrc32c mdio tg3 ptp pps_core serio_raw sg lpc_ich mfd_core ioatdma dca shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom hpsa pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: drbd]
Oct 1 18:29:59 dtc-san kernel:
Oct 1 18:29:59 dtc-san kernel: Pid: 2080, comm: zvol/30 Tainted: P --------------- 2.6.32-431.20.3.el6.x86_64 #1 HP ProLiant DL380p Gen8
Oct 1 18:29:59 dtc-san kernel: RIP: 0010:[] [] _spin_lock+0x21/0x30
Oct 1 18:29:59 dtc-san kernel: RSP: 0018:ffff881818b55ba0 EFLAGS: 00000202
Oct 1 18:29:59 dtc-san kernel: RAX: 0000000000000000 RBX: ffff881818b55ba0 RCX: 0000000000000000
Oct 1 18:29:59 dtc-san kernel: RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff88172f1c9984
Oct 1 18:29:59 dtc-san kernel: RBP: ffffffff8100bb8e R08: 0000000000000000 R09: ffffc91c45969000
Oct 1 18:29:59 dtc-san kernel: R10: ffff881818b55b00 R11: ffff8830189980b0 R12: 0000000000000000
Oct 1 18:29:59 dtc-san kernel: R13: ffff883000000040 R14: 0000000000000000 R15: ffff883018998048
Oct 1 18:29:59 dtc-san kernel: FS: 0000000000000000(0000) GS:ffff880094880000(0000) knlGS:0000000000000000
Oct 1 18:29:59 dtc-san kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Oct 1 18:29:59 dtc-san kernel: CR2: 000000000044fc20 CR3: 0000000001a85000 CR4: 00000000000407e0
Oct 1 18:29:59 dtc-san kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 1 18:29:59 dtc-san kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 1 18:29:59 dtc-san kernel: Process zvol/30 (pid: 2080, threadinfo ffff881818b54000, task ffff881818b4c080)
Oct 1 18:29:59 dtc-san kernel: Stack:
Oct 1 18:29:59 dtc-san kernel: ffff881818b55bc0 ffffffff81529b95 ffff88172f1c9928 0000000000000029
Oct 1 18:29:59 dtc-san kernel: ffff881818b55bd0 ffffffff81529b6b ffff881818b55c90 ffffffffa033ed51
Oct 1 18:29:59 dtc-san kernel: ffff8810f475cfa8 ffff8808dd707c38 ffff881818b55c10 ffffffff8100b9ce
Oct 1 18:29:59 dtc-san kernel: Call Trace:
Oct 1 18:29:59 dtc-san kernel: [] ? __mutex_unlock_slowpath+0x25/0x60
Oct 1 18:29:59 dtc-san kernel: [] ? mutex_unlock+0x1b/0x20
Oct 1 18:29:59 dtc-san kernel: [] ? dbuf_read+0x3c1/0x740 [zfs]
Oct 1 18:29:59 dtc-san kernel: [] ? common_interrupt+0xe/0x13
Oct 1 18:29:59 dtc-san kernel: [] ? dmu_buf_will_dirty+0xa4/0x100 [zfs]
Oct 1 18:29:59 dtc-san kernel: [] ? dmu_write_req+0xe0/0x1d0 [zfs]
Oct 1 18:29:59 dtc-san kernel: [] ? zvol_write+0xd5/0x480 [zfs]
Oct 1 18:29:59 dtc-san kernel: [] ? taskq_thread+0x218/0x4b0 [spl]
Oct 1 18:29:59 dtc-san kernel: [] ? thread_return+0x4e/0x760
Oct 1 18:29:59 dtc-san kernel: [] ? default_wake_function+0x0/0x20
Oct 1 18:29:59 dtc-san kernel: [] ? taskq_thread+0x0/0x4b0 [spl]
Oct 1 18:29:59 dtc-san kernel: [] ? kthread+0x96/0xa0
Oct 1 18:29:59 dtc-san kernel: [] ? child_rip+0xa/0x20
Oct 1 18:29:59 dtc-san kernel: [] ? kthread+0x0/0xa0
Oct 1 18:29:59 dtc-san kernel: [] ? child_rip+0x0/0x20
Oct 1 18:29:59 dtc-san kernel: Code: 01 74 05 e8 82 3f d6 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 0f b7 17 f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 e5 0f 1f
Oct 1 18:29:59 dtc-san kernel: Call Trace:
Oct 1 18:29:59 dtc-san kernel: [] ? __mutex_unlock_slowpath+0x25/0x60
Oct 1 18:29:59 dtc-san kernel: [] ? mutex_unlock+0x1b/0x20
Oct 1 18:29:59 dtc-san kernel: [] ? dbuf_read+0x3c1/0x740 [zfs]
Oct 1 18:29:59 dtc-san kernel: [] ? common_interrupt+0xe/0x13
Oct 1 18:29:59 dtc-san kernel: [] ? dmu_buf_will_dirty+0xa4/0x100 [zfs]
Oct 1 18:29:59 dtc-san kernel: [

olw2005 · 2014-10-02T15:04:50Z

Could there be a common thread in these lockups? Most of the stack trace business is Greek to me, but this line from the original poster's trace looks related to a SCSI unmap command (which my system was also doing a lot of shortly before going down in flames):

Jun 28 05:39:52 nodeB kernel: [] ? __vunmap+0x2e/0x120

olw2005 · 2014-10-02T16:21:16Z

Could this be another instance of issue #2523 ?

behlendorf · 2014-10-02T23:17:48Z

I think there are a couple issues getting conflated in this bug. The backtraces from the original post look like memory contention in __uunmap, a newer kernel might improve things there. The second set of stack do look like they might be related to #2523.

It is known that mutexes in Linux are not safe when using them to synchronize the freeing of object in which the mutex is embedded: http://lwn.net/Articles/575477/ The known places in ZFS which are suspected to suffer from the race condition are zio->io_lock and dbuf->db_mtx. * zio uses zio->io_lock and zio->io_cv to synchronize freeing between zio_wait() and zio_done(). * dbuf uses dbuf->db_mtx to protect reference counting. This patch fixes this kind of race by forcing serialization on mutex_exit() with a spin lock, making the mutex safe by sacrificing a bit of performance and memory overhead. This issue most commonly manifests itself as a deadlock in the zio pipeline caused by a process spinning on the damaged mutex. Similar deadlocks have been reported for the dbuf->db_mtx mutex. And it can also cause a NULL dereference or bad paging request under the right circumstances. This issue any many like it are linked off the openzfs/zfs#2523 issue. Specifically this fix resolves at least the following outstanding issues: openzfs/zfs#401 openzfs/zfs#2523 openzfs/zfs#2679 openzfs/zfs#2684 openzfs/zfs#2704 openzfs/zfs#2708 openzfs/zfs#2517 openzfs/zfs#2827 openzfs/zfs#2850 openzfs/zfs#2891 openzfs/zfs#2897 openzfs/zfs#2247 openzfs/zfs#2939 Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #421

behlendorf · 2014-12-19T19:01:49Z

This issue which is a duplicate of #2523 was resolved by the following commit. Full details can be found in the commit message and related lwn article.

openzfs/spl@a3c1eb7 mutex: force serialization on mutex_exit() to fix races

Commit: openzfs/zfs@a3c1eb7 From: Chunwei Chen <[email protected]> Date: Fri, 19 Dec 2014 11:31:59 +0800 Subject: mutex: force serialization on mutex_exit() to fix races It is known that mutexes in Linux are not safe when using them to synchronize the freeing of object in which the mutex is embedded: http://lwn.net/Articles/575477/ The known places in ZFS which are suspected to suffer from the race condition are zio->io_lock and dbuf->db_mtx. * zio uses zio->io_lock and zio->io_cv to synchronize freeing between zio_wait() and zio_done(). * dbuf uses dbuf->db_mtx to protect reference counting. This patch fixes this kind of race by forcing serialization on mutex_exit() with a spin lock, making the mutex safe by sacrificing a bit of performance and memory overhead. This issue most commonly manifests itself as a deadlock in the zio pipeline caused by a process spinning on the damaged mutex. Similar deadlocks have been reported for the dbuf->db_mtx mutex. And it can also cause a NULL dereference or bad paging request under the right circumstances. This issue any many like it are linked off the openzfs/zfs#2523 issue. Specifically this fix resolves at least the following outstanding issues: openzfs/zfs#401 openzfs/zfs#2523 openzfs/zfs#2679 openzfs/zfs#2684 openzfs/zfs#2704 openzfs/zfs#2708 openzfs/zfs#2517 openzfs/zfs#2827 openzfs/zfs#2850 openzfs/zfs#2891 openzfs/zfs#2897 openzfs/zfs#2247 openzfs/zfs#2939 Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Backported-by: Darik Horn <[email protected]> Closes #421 Conflicts: include/sys/mutex.h

It is known that mutexes in Linux are not safe when using them to synchronize the freeing of object in which the mutex is embedded: http://lwn.net/Articles/575477/ The known places in ZFS which are suspected to suffer from the race condition are zio->io_lock and dbuf->db_mtx. * zio uses zio->io_lock and zio->io_cv to synchronize freeing between zio_wait() and zio_done(). * dbuf uses dbuf->db_mtx to protect reference counting. This patch fixes this kind of race by forcing serialization on mutex_exit() with a spin lock, making the mutex safe by sacrificing a bit of performance and memory overhead. This issue most commonly manifests itself as a deadlock in the zio pipeline caused by a process spinning on the damaged mutex. Similar deadlocks have been reported for the dbuf->db_mtx mutex. And it can also cause a NULL dereference or bad paging request under the right circumstances. This issue any many like it are linked off the openzfs/zfs#2523 issue. Specifically this fix resolves at least the following outstanding issues: openzfs/zfs#401 openzfs/zfs#2523 openzfs/zfs#2679 openzfs/zfs#2684 openzfs/zfs#2704 openzfs/zfs#2708 openzfs/zfs#2517 openzfs/zfs#2827 openzfs/zfs#2850 openzfs/zfs#2891 openzfs/zfs#2897 openzfs/zfs#2247 openzfs/zfs#2939 Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes #421

ryao · 2015-02-10T18:41:20Z

@behlendorf This is not a duplicate of #2523. Instead, it is a duplicate of #3091.

It is known that mutexes in Linux are not safe when using them to synchronize the freeing of object in which the mutex is embedded: http://lwn.net/Articles/575477/ The known places in ZFS which are suspected to suffer from the race condition are zio->io_lock and dbuf->db_mtx. * zio uses zio->io_lock and zio->io_cv to synchronize freeing between zio_wait() and zio_done(). * dbuf uses dbuf->db_mtx to protect reference counting. This patch fixes this kind of race by forcing serialization on mutex_exit() with a spin lock, making the mutex safe by sacrificing a bit of performance and memory overhead. This issue most commonly manifests itself as a deadlock in the zio pipeline caused by a process spinning on the damaged mutex. Similar deadlocks have been reported for the dbuf->db_mtx mutex. And it can also cause a NULL dereference or bad paging request under the right circumstances. This issue any many like it are linked off the openzfs/zfs#2523 issue. Specifically this fix resolves at least the following outstanding issues: openzfs/zfs#401 openzfs/zfs#2523 openzfs/zfs#2679 openzfs/zfs#2684 openzfs/zfs#2704 openzfs/zfs#2708 openzfs/zfs#2517 openzfs/zfs#2827 openzfs/zfs#2850 openzfs/zfs#2891 openzfs/zfs#2897 openzfs/zfs#2247 openzfs/zfs#2939 Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes openzfs#421 Conflicts: include/sys/mutex.h

behlendorf added this to the 0.7.0 milestone Jul 21, 2014

behlendorf added the Bug label Jul 21, 2014

behlendorf modified the milestones: 0.6.4, 0.7.0 Dec 19, 2014

behlendorf added Bug - Blocker and removed Bug labels Dec 19, 2014

behlendorf closed this as completed Dec 19, 2014

ryao mentioned this issue Feb 10, 2015

ZFS lockup 0.6.3.1 #3091

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

arc_adapt : Soft Lockups on ZFS when exporting zvols over iSCSI #2517

arc_adapt : Soft Lockups on ZFS when exporting zvols over iSCSI #2517

edamato commented Jul 21, 2014

pneumatic commented Jul 24, 2014

edamato commented Jul 25, 2014

pneumatic commented Jul 25, 2014

olw2005 commented Oct 2, 2014

olw2005 commented Oct 2, 2014

olw2005 commented Oct 2, 2014

behlendorf commented Oct 2, 2014

behlendorf commented Dec 19, 2014

ryao commented Feb 10, 2015

arc_adapt : Soft Lockups on ZFS when exporting zvols over iSCSI #2517

arc_adapt : Soft Lockups on ZFS when exporting zvols over iSCSI #2517

Comments

edamato commented Jul 21, 2014

pneumatic commented Jul 24, 2014

edamato commented Jul 25, 2014

pneumatic commented Jul 25, 2014

olw2005 commented Oct 2, 2014

olw2005 commented Oct 2, 2014

olw2005 commented Oct 2, 2014

behlendorf commented Oct 2, 2014

behlendorf commented Dec 19, 2014

ryao commented Feb 10, 2015