txg_sync at 100% CPU #596

Closed
rssalerno opened this issue Mar 12, 2012 · 16 comments

@rssalerno

I'm building a replacement server for a FreeNAS box. It's based on Ubuntu Server 10.04 64-bit with ZFS built from ppa:zfs-native/stable. I'm using rsync to copy data from an NFS share mounted on the Ubuntu box. The data is a video collection comprising 200+ GB of large media files. The rsync job runs for several hours and then hangs; top shows txg_sync at 100% CPU. I can't kill the process, can't unmount the pool, and can't soft reboot. The machine requires a power cycle to come back.

kern.log:

Mar 11 06:39:23 neptune kernel: [50951.201236] BUG: soft lockup - CPU#1 stuck for 61s! [txg_sync:2089]
Mar 11 06:39:23 neptune kernel: [50951.201253] Modules linked in: appletalk ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_amd kvm snd_hda_codec_atihdmi fbcon tileblit font bitblit softcursor vga16fb vgastate zfs(P) zcommon(P) zunicode(P) znvpair(P) zavl(P) snd_hda_codec_realtek radeon ttm drm_kms_helper spl zlib_deflate ppdev snd_hda_intel snd_hda_codec snd_hwdep drm i2c_algo_bit parport_pc snd_pcm snd_timer lp snd soundcore snd_page_alloc i2c_piix4 edac_core edac_mce_amd shpchp parport ohci1394 ieee1394 r8169 mii pata_atiixp ahci
Mar 11 06:39:23 neptune kernel: [50951.201253] CPU 1:
Mar 11 06:39:23 neptune kernel: [50951.201253] Modules linked in: appletalk ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_amd kvm snd_hda_codec_atihdmi fbcon tileblit font bitblit softcursor vga16fb vgastate zfs(P) zcommon(P) zunicode(P) znvpair(P) zavl(P) snd_hda_codec_realtek radeon ttm drm_kms_helper spl zlib_deflate ppdev snd_hda_intel snd_hda_codec snd_hwdep drm i2c_algo_bit parport_pc snd_pcm snd_timer lp snd soundcore snd_page_alloc i2c_piix4 edac_core edac_mce_amd shpchp parport ohci1394 ieee1394 r8169 mii pata_atiixp ahci
Mar 11 06:39:23 neptune kernel: [50951.201253] Pid: 2089, comm: txg_sync Tainted: P 2.6.32-38-server #83-Ubuntu GA-MA785GM-US2H
Mar 11 06:39:23 neptune kernel: [50951.201253] RIP: 0010:[] [] __ticket_spin_lock+0x19/0x20
Mar 11 06:39:23 neptune kernel: [50951.201253] RSP: 0018:ffff880118cb3b60 EFLAGS: 00000202
Mar 11 06:39:23 neptune kernel: [50951.201253] RAX: 0000000000000000 RBX: ffff880118cb3b60 RCX: ffff8801288af8e8
Mar 11 06:39:23 neptune kernel: [50951.201253] RDX: 000000000000001c RSI: 0000000000000070 RDI: ffffc9002d9894c4
Mar 11 06:39:23 neptune kernel: [50951.201253] RBP: ffffffff81013c6e R08: 0000000000000000 R09: 000000000010a004
Mar 11 06:39:23 neptune kernel: [50951.201253] R10: ffffc9002d9891b8 R11: 0000000000000000 R12: ffff880118cb3ae0
Mar 11 06:39:23 neptune kernel: [50951.201253] R13: ffffffff8108c135 R14: ffff880118cb3af0 R15: ffffffff81019e29
Mar 11 06:39:23 neptune kernel: [50951.201253] FS: 00007f73bd089700(0000) GS:ffff880005280000(0000) knlGS:0000000000000000
Mar 11 06:39:23 neptune kernel: [50951.201253] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Mar 11 06:39:23 neptune kernel: [50951.201253] CR2: 00007f73b630125c CR3: 0000000001001000 CR4: 00000000000006e0
Mar 11 06:39:23 neptune kernel: [50951.201253] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 11 06:39:23 neptune kernel: [50951.201253] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 11 06:39:23 neptune kernel: [50951.201253] Call Trace:
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? _spin_lock+0xe/0x20
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? __mutex_unlock_slowpath+0x25/0x60
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? mutex_unlock+0x1b/0x20
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? zio_add_child+0x100/0x120 [zfs]
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? zio_create+0x3dc/0x4a0 [zfs]
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? zio_free_sync+0x76/0x80 [zfs]
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? spa_free_sync_cb+0x43/0x60 [zfs]
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? spa_free_sync_cb+0x0/0x60 [zfs]
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? bplist_iterate+0x7b/0xb0 [zfs]
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? spa_sync+0x3cc/0x9a0 [zfs]
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? autoremove_wake_function+0x16/0x40
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? __wake_up+0x53/0x70
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? txg_sync_thread+0x225/0x3b0 [zfs]
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? txg_sync_thread+0x0/0x3b0 [zfs]
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? thread_generic_wrapper+0x68/0x80 [spl]
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? thread_generic_wrapper+0x0/0x80 [spl]
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? kthread+0x96/0xa0
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? child_rip+0xa/0x20
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? kthread+0x0/0xa0
Mar 11 06:39:23 neptune kernel: [50951.201253] [] ? child_rip+0x0/0x20

@behlendorf
Contributor

There is a fix/workaround for this issue in the daily ppa, issue #496. You'll want to run that for a little while until the next release candidate is tagged (which should be fairly soon). Currently the daily releases contain numerous bug fixes not in the stable ppa. Please let me know if this resolves the issue; it should.
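
For anyone else landing here, switching an Ubuntu box from the stable to the daily packages looks roughly like the sketch below. The PPA name ppa:zfs-native/daily is an assumption; confirm it on Launchpad before running this:

sudo add-apt-repository ppa:zfs-native/daily   # assumed daily PPA name; verify on Launchpad
sudo apt-get update
sudo apt-get upgrade                           # pulls in the newer spl/zfs packages
sudo reboot                                    # reload the updated kernel modules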

@rssalerno
Author

I rebuilt my server from scratch and used the daily ppa. Importing the raidz pool from the prior version worked perfectly (this was really nice) so the restore picked up right where it left off.

I experienced an unrelated crash after starting the rsync job for the first time that caused every terminal session, including the console, to become unresponsive (hard reboot). I believe it has to do with SLUB, as referenced here: http://nylug.org/pipermail/nylug-talk/2010-July/014371.html

Mar 15 00:20:11 neptune kernel: [ 3152.241264] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
Mar 15 00:20:11 neptune kernel: [ 3152.268658] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
Mar 15 00:26:26 neptune kernel: [ 3527.296216] SLUB: Unable to allocate memory on node -1 (gfp=0x20)

Back on point, the rsync job's been running for the last 24 hours - so far so good. Thanks for all your fine work.

@pepevel

pepevel commented Aug 20, 2012

Hello,
If somebody is interested: I installed the latest zfs modules on Ubuntu 12.04 (kernel 3.2.0-29), and I also hit this txg_sync issue doing a simple

zpool create pool1 /dev/sda4
zfs create pool1/w7

I tried both the daily builds and behlendorf's fork.
Without much inner knowledge of the code: could it be related to the fact that I have an SSD drive?
I also tried creating a pool in a ramdisk

zpool create pool1 /dev/ram1
zfs create pool1/w7

and it led to the same results.

@mcr-ksh

mcr-ksh commented Sep 4, 2012

Me too! I just posted a kernel panic yesterday; I think it is related to this issue.

Devices are blocked and unresponsive.

5268 root 0 -20 0 0 0 D 40.8 0.0 7:21.33 txg_sync
6223 root 0 -20 0 0 0 D 13.3 0.0 5:50.27 loop0
6477 root 0 -20 0 0 0 D 12.9 0.0 5:52.36 loop2
6732 root 0 -20 0 0 0 D 12.9 0.0 5:50.24 loop4

@calmh

calmh commented Oct 9, 2012

Same here: txg_sync hangs on zfs create, Ubuntu 12.04 under kvm (on SmartOS, ironically):

top - 10:30:34 up 2 min,  2 users,  load average: 2.46, 1.05, 0.40
Tasks: 167 total,   4 running, 163 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 75.1%sy,  0.0%ni, 24.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1025308k total,   111208k used,   914100k free,    16048k buffers
Swap:  1046524k total,        0k used,  1046524k free,    44092k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                             
 1172 root       0 -20     0    0    0 D 68.4  0.0   1:04.85 txg_sync                                                                                                             
 1203 root      20   0  6344 1304 1000 R 19.9  0.1   0:19.25 zfs                                                                                                                  
 1120 root      39  19     0    0    0 R  6.3  0.0   0:05.81 z_null_iss/0                                                                                                         
 1170 root       0 -20     0    0    0 R  5.0  0.0   0:04.88 txg_quiesce                                                                                                          
 1339 jb        20   0  2856 1156  856 R  0.3  0.1   0:00.12 top                                                                                                                  
    1 root      20   0  3536 1840 1252 S  0.0  0.2   0:00.56 init             

jb@udev:~ % uname -a
Linux udev 3.2.0-31-generic-pae #50-Ubuntu SMP Fri Sep 7 16:39:45 UTC 2012 i686 i686 i386 GNU/Linux
jb@udev:~ % dmesg | tail 
...
[    6.087234] ZFS: Loaded module v0.6.0.80-rc11, ZFS pool version 28, ZFS filesystem version 5
[   19.566732] SPL: using hostid 0x10ac1820

@behlendorf
Contributor

@calmh Yours might be a 32-bit issue; use a 64-bit kernel.

@calmh

calmh commented Oct 9, 2012

Ah, sweet, of course.

Could we fail in a more obvious manner here, maybe not even compile? It seems that could be preferable to just mysteriously not working.

@behlendorf
Contributor

@calmh That's probably not a bad idea. We could add a gigantic warning to the 32-bit builds encouraging folks not to use them. This is something we want to get fixed eventually but it requires some substantial work and is very low priority for us.

@behlendorf
Contributor

@calmh I was also going to say: if your system is still otherwise responsive when this happens, I'd be interested in the contents of the /proc/spl/kstat/zfs/dmu_tx file. It may show which things are blocked.

@calmh

calmh commented Oct 13, 2012

jb@udev:~ % date ; cat /proc/spl/kstat/zfs/dmu_tx
Sat Oct 13 17:04:08 CEST 2012
3 1 0x01 12 576 91157505686 147178133264
name                            type data
dmu_tx_assigned                 4    4
dmu_tx_delay                    4    590390
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    0
dmu_tx_how                      4    0
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    590985
dmu_tx_memory_inflight          4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_write_limit              4    0
dmu_tx_quota                    4    0
jb@udev:~ % date ; cat /proc/spl/kstat/zfs/dmu_tx
Sat Oct 13 17:04:15 CEST 2012
3 1 0x01 12 576 91157505686 154354909747
name                            type data
dmu_tx_assigned                 4    4
dmu_tx_delay                    4    697412
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    0
dmu_tx_how                      4    0
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    698057
dmu_tx_memory_inflight          4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_write_limit              4    0
dmu_tx_quota                    4    0

@behlendorf
Contributor

@calmh This is the critical bit:

dmu_tx_memory_reclaim           4    590985

Two fixes actually just went into master which should resolve this issue on 32-bit systems. If you could grab the latest source from git and build it, you should see this counter drop to zero (or at least increment very rarely).

b68503f Remove vmem_size() consumers
7df05a4 Fix zfs_write_limit_max integer size mismatch on 32-bit systems
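
For reference, a rough sketch of building from git, assuming the usual two-repository layout (spl first, then zfs built against it); the exact configure flags are assumptions, so check each project's README for the authoritative steps:

git clone https://github.com/zfsonlinux/spl.git
cd spl && ./autogen.sh && ./configure && make && sudo make install && cd ..
git clone https://github.com/zfsonlinux/zfs.git
cd zfs && ./autogen.sh && ./configure --with-spl=../spl && make && sudo make install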

@ryao
Contributor

ryao commented Oct 13, 2012

It might require tuning vmalloc and possibly a few module parameters, in addition to using these patches, before this works well. Here are some values that are known to have worked for people:

vmalloc=384M
zfs_arc_max=41943040
zfs_vdev_cache_size=5242880
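
In case it is unclear where these go: vmalloc= is a kernel boot parameter, while the other two are zfs module parameters. A minimal sketch, assuming the usual Ubuntu file locations:

# /etc/default/grub -- enlarge the 32-bit vmalloc region at boot
GRUB_CMDLINE_LINUX_DEFAULT="vmalloc=384M"

# /etc/modprobe.d/zfs.conf -- applied when the zfs module loads
options zfs zfs_arc_max=41943040 zfs_vdev_cache_size=5242880

Then run update-grub and reboot for both to take effect.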

@imp
Contributor

imp commented Oct 13, 2012

I am not sure about zfs_vdev_cache_size, especially with low-memory configurations (e.g., below 8 GB). It is very easy to waste memory on the vdev cache. Use /proc/spl/kstat/zfs/vdev_cache_stats to analyze the benefit of having zfs_vdev_cache_size greater than zero. I have plenty of memory (many tens of GB) and still found memory spent on the vdev cache to be a waste.
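
For example, a quick way to judge whether the cache earns its keep is to compare hits against misses. This one-liner is a sketch that assumes the three-column name/type/data kstat layout shown later in this thread:

# hit ratio = hits / (hits + misses); near zero means the cache is wasted memory
awk '$1 == "hits" {h=$3} $1 == "misses" {m=$3} END {if (h+m) printf "hit ratio: %.1f%%\n", 100*h/(h+m); else print "no vdev cache activity"}' /proc/spl/kstat/zfs/vdev_cache_stats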

@ryao
Contributor

ryao commented Oct 14, 2012

This is a good point. I had blindly copied this from FreeBSD's recommendations. While they have worked for people, more attention needs to be paid to what these settings are actually doing. I only posted them because they are known to work.

@ryao
Contributor

ryao commented Oct 14, 2012

It might be worthwhile to modify the output of /proc/spl/kstat/zfs/vdev_cache_stats to provide more useful information:

# cat /proc/spl/kstat/zfs/vdev_cache_stats
6 1 0x01 3 144 5373198005 3847694717716
name                            type data
delegations                     4    0
hits                            4    0
misses                          4    0

I honestly have no idea what I am reading in that. I will need to look at the code to figure it out.

@behlendorf
Contributor

Regarding zfs_vdev_cache_size, we actually disabled this by default quite some time ago, in 2cc6c8d. As @imp says, it was shown to waste quite a bit of memory, and it was hard to point to any real benefit. Both Illumos and Solaris have made this same change, so I'd just leave it disabled.

Also, is the default of 1/2 of the vmalloc region a reasonable default in practice, or should we add some code to size it smaller? If possible I'd love for there to be sane defaults for both 32-bit and 64-bit systems.
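
As a quick sanity check, a running system's effective value can be read from sysfs once the zfs module is loaded:

cat /sys/module/zfs/parameters/zfs_vdev_cache_size   # 0 means the vdev cache is disabled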
