txg_sync at 100% reopen #938

Closed
mcr-ksh opened this issue Sep 4, 2012 · 53 comments

@mcr-ksh

mcr-ksh commented Sep 4, 2012

Hi,

I'm hitting the same issue as the one that was closed before; I posted a kernel panic on the SPL tracker just yesterday. txg_sync is at 100%:

5268 root 0 -20 0 0 0 D 40.8 0.0 7:21.33 txg_sync

Devices are blocked and unresponsive. All VMs are gone.

I was on the latest commit.

Regards,
Chris.

@behlendorf
Contributor

Can you get a backtrace from the txg_sync thread so we can determine whether this really is the same issue?
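If it helps, a minimal sketch of grabbing that stack on most kernels (these are generic commands, not something specific to this report; the sysrq variant assumes sysrq is enabled):

  # stack of the txg_sync kernel thread
  cat /proc/$(pgrep txg_sync)/stack

  # or dump all task states into the kernel log
  echo t > /proc/sysrq-trigger
  dmesg | tail -n 200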

@phaedrus77

I think I'm seeing the same issue. Running kernel 3.4.4 (on OEL 6) and spl/zfs 0.6.0-rc10.

Hammering memory makes ZFS go bonkers: txg_sync consumes one core and all I/O to the pool is blocked. I can reliably trigger this by having a memory hog like Firefox running and then starting a VM with VirtualBox (guest with 4GB RAM assigned) on my notebook (8GB RAM). Very rarely txg_sync starts to behave after 10-15 minutes; most of the time it doesn't. When it's stuck I am unable to kill processes that use data on this pool (the pool being my home). There is nothing ZFS-related in the logs. Occasionally I see a warning about memory being tight. Swap is hardly used (a few hundred MB), but free RAM goes down to ~50MB. I can still reboot from the CLI.

Any hints on how to collect more information?

Cheers,
Bernd

@behlendorf
Contributor

@phaedrus77 I'd actually be much more interested to hear whether using the latest master source resolves this issue. There were substantial memory management improvements merged into master which should help. The master source is stable and there's no risk to your data, so it would be an easy thing to try.

@mcr-ksh
Author

mcr-ksh commented Sep 5, 2012

I had it and posted it on the SPL tracker. It was closed before; it was said to be a duplicate of
#917

That was the oops.


@phaedrus77

@behlendorf I pulled the source from git (zfs+spl) as zipped archives:
zfsonlinux-spl-spl-0.6.0-rc10-19-gac8ca67.zip
zfsonlinux-zfs-zfs-0.6.0-rc10-36-gcafa970.zip

spl compiles fine, but zfs chokes on
CC [M] /export/home/bernd/zfs/zfsonlinux-zfs-cafa970/module/zfs/../../module/zfs/fm.o
/export/home/bernd/zfs/zfsonlinux-zfs-cafa970/module/zfs/../../module/zfs/fm.c: In function ‘fm_ena_generate’:
/export/home/bernd/zfs/zfsonlinux-zfs-cafa970/module/zfs/../../module/zfs/fm.c:1418: error: implicit declaration of function ‘kpreempt_disable’
/export/home/bernd/zfs/zfsonlinux-zfs-cafa970/module/zfs/../../module/zfs/fm.c:1420: error: implicit declaration of function ‘kpreempt_enable’

spl has kpreempt_* defined in include/sys/disp.h as
#define kpreempt_disable() preempt_disable()
#define kpreempt_enable() preempt_enable()

while zfs has
#define kpreempt_disable() ((void)0)
#define kpreempt_enable() ((void)0)
in include/sys/zfs_context.h

Which one is the right one to use here? Should this part only be compiled when _KERNEL is defined? Where should I put the missing ifdefs? Did I miss a step, or is my build environment broken?
I built with:
./autogen.sh
./configure
make

@phaedrus77

For further testing I commented the kpreempt lines out of fm.c; now I get:
CC [M] /export/home/bernd/zfs/zfsonlinux-zfs-cafa970/module/zfs/../../module/zfs/txg.o
/export/home/bernd/zfs/zfsonlinux-zfs-cafa970/module/zfs/../../module/zfs/txg.c: In function ‘txg_hold_open’:
/export/home/bernd/zfs/zfsonlinux-zfs-cafa970/module/zfs/../../module/zfs/txg.c:230: error: implicit declaration of function ‘kpreempt_disable’
/export/home/bernd/zfs/zfsonlinux-zfs-cafa970/module/zfs/../../module/zfs/txg.c:232: error: implicit declaration of function ‘kpreempt_enable’
/export/home/bernd/zfs/zfsonlinux-zfs-cafa970/module/zfs/../../module/zfs/txg.c: In function ‘txg_sync_thread’:
/export/home/bernd/zfs/zfsonlinux-zfs-cafa970/module/zfs/../../module/zfs/txg.c:391: error: ‘PF_NOFS’ undeclared (first use in this function)
/export/home/bernd/zfs/zfsonlinux-zfs-cafa970/module/zfs/../../module/zfs/txg.c:391: error: (Each undeclared identifier is reported only once
/export/home/bernd/zfs/zfsonlinux-zfs-cafa970/module/zfs/../../module/zfs/txg.c:391: error: for each function it appears in.)

Seems to hit the same problem. The error on line 391 is interesting, though:

#ifdef _KERNEL
        /*
         * Annotate this process with a flag that indicates that it is
         * unsafe to use KM_SLEEP during memory allocations due to the
         * potential for a deadlock. KM_PUSHPAGE should be used instead.
         */
        current->flags |= PF_NOFS;
#endif /* _KERNEL */

This means _KERNEL is defined, so why does it fail on the kpreempt calls?

@dechamps
Contributor

dechamps commented Sep 6, 2012

You're getting these errors because ZFS is being built against an old SPL version. Compile the latest SPL from master, then compile the latest ZFS from master using --with-spl=/path/to/splsource.
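For anyone following along, the sequence looks roughly like this (a sketch only; the clone URLs and install step reflect the usual zfsonlinux workflow of the time and may differ on your system):

  git clone https://github.com/zfsonlinux/spl.git
  cd spl && ./autogen.sh && ./configure && make && sudo make install
  cd ..
  git clone https://github.com/zfsonlinux/zfs.git
  cd zfs && ./autogen.sh && ./configure --with-spl=/path/to/spl && make && sudo make install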

@phaedrus77

@dechamps Thanks, that was it. I forgot to install the freshly built SPL bits :-\

@behlendorf master looks good! I was able to boot a VM while having Firefox running. txg_sync runs up pretty high - consuming about two cores on my machine while the VM boots up and does loads of I/O - but returns to normal once the VM's I/O settles down.
Thanks!

@behlendorf
Contributor

That's great news. I'm going to close out this issue since it sounds like the updated master code is working as expected and addresses the issue.

@phaedrus77

I'm afraid the problem wasn't 100% solved. After running for a few days (with several suspends/resumes), txg_sync came back and blocked I/O to the pool. I can trigger the lockup by heavily stressing memory. I tried to observe as much as possible and noticed that kswapd was doing its thing and the ARC size went down by a huge amount (~1.5GB, from about 1.8GB down to well below 512MB, where arc_size stabilized); that seems to somehow get txg_sync stuck.
For now I have limited arc_max to avoid big size reductions when memory gets tight. That seems to prevent the lockups, and for now I can live with the possible performance penalties.
How would one collect more information on this issue?
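For reference, a minimal sketch of capping the ARC as described above (512MB is only an example value; zfs_arc_max is the standard module parameter, and the runtime write assumes the parameter is exported writable on your build):

  # at runtime
  echo 536870912 > /sys/module/zfs/parameters/zfs_arc_max

  # or persistently, in /etc/modprobe.d/zfs.conf
  options zfs zfs_arc_max=536870912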

@behlendorf
Contributor

OK, I've reopened the issue. It sounds like things are better but not 100% fixed. If you're able to reproduce this issue, please grab the contents of /proc/spl/kstat/zfs/dmu_tx. This file should give us some clue about what's preventing the txg sync from completing.

@behlendorf behlendorf reopened this Sep 21, 2012
@phaedrus77

While starting a VM in VirtualBox, I ran:

while /bin/true
do
    cat /proc/spl/kstat/zfs/dmu_tx >> dmu
    cat /proc/spl/kstat/zfs/arcstats | grep ^size
    sleep 5
done

txg_sync started consuming lots of CPU time when VirtualBox allocated memory for the VM. All processes accessing data on the same dataset hung when txg_sync went up.
The collected data is at http://149.203.91.65/~markgraf/dmu (32kB).
Thanks for your help!

@phaedrus77

Another crash today, with an interesting trace in the logs:
[160682.981107] firefox: page allocation failure: order:4, mode:0x4030
[160682.981122] Pid: 3812, comm: firefox Tainted: P O 3.4.4 #3
[160682.981128] Call Trace:
[160682.981149] [<ffffffff810dd87b>] warn_alloc_failed+0xeb/0x130
[160682.981161] [<ffffffff810dfb40>] ? page_alloc_cpu_notify+0x50/0x50
[160682.981172] [<ffffffff810dfb51>] ? drain_local_pages+0x11/0x20
[160682.981185] [<ffffffff810df274>] __alloc_pages_nodemask+0x534/0x840
[160682.981199] [<ffffffff811153f1>] alloc_pages_current+0xb1/0x110
[160682.981210] [<ffffffff810dd739>] __get_free_pages+0x9/0x40
[160682.981221] [<ffffffff8111876a>] kmalloc_order_trace+0x3a/0xb0
[160682.981234] [<ffffffff8111b725>] __kmalloc+0x125/0x170
[160682.981254] [<ffffffffa0095b58>] spl_kmem_cache_alloc+0x748/0xf30 [spl]
[160682.981270] [<ffffffff81057a40>] ? wake_up_bit+0x40/0x40
[160682.981321] [<ffffffffa01c4e7d>] zio_buf_alloc+0x1d/0x20 [zfs]
[160682.981363] [<ffffffffa01c579f>] zio_read_bp_init+0xff/0x130 [zfs]
[160682.981404] [<ffffffffa01c6838>] zio_nowait+0xa8/0x120 [zfs]
[160682.981430] [<ffffffffa0132da0>] arc_read_nolock+0x4a0/0x780 [zfs]
[160682.981456] [<ffffffffa013335d>] arc_read+0x7d/0x160 [zfs]
[160682.981490] [<ffffffffa014cf94>] ? dnode_block_freed+0xd4/0x160 [zfs]
[160682.981525] [<ffffffffa0136040>] ? dbuf_fill_done+0x80/0x80 [zfs]
[160682.981537] [<ffffffffa01656cc>] dsl_read+0x2c/0x30 [zfs]
[160682.981557] [<ffffffffa0136bf9>] dbuf_read+0x1f9/0x700 [zfs]
[160682.981599] [<ffffffffa013e3da>] dmu_buf_hold_array_by_dnode+0x16a/0x540 [zfs]
[160682.981654] [<ffffffffa013ea30>] dmu_buf_hold_array+0x60/0x90 [zfs]
[160682.981657] [<ffffffffa010b963>] ? avl_add+0x33/0x50 [zavl]
[160682.981666] [<ffffffffa013ea9c>] dmu_read_uio+0x3c/0xc0 [zfs]
[160682.981670] [<ffffffff816d5919>] ? mutex_lock+0x19/0x40
[160682.981677] [<ffffffffa01b9270>] zfs_read+0x170/0x4a0 [zfs]
[160682.981684] [<ffffffffa01cbe8d>] zpl_read_common+0x4d/0x70 [zfs]
[160682.981691] [<ffffffffa01cbf14>] zpl_read+0x64/0xa0 [zfs]
[160682.981694] [<ffffffff81121fc5>] vfs_read+0xc5/0x190
[160682.981696] [<ffffffff8112218c>] sys_read+0x4c/0x90
[160682.981699] [<ffffffff816df1a2>] system_call_fastpath+0x16/0x1b

I hope this helps to narrow it down.

@kleini

kleini commented Oct 12, 2012

Some more information to narrow this down. I tested zfsonlinux on several of my systems and do not encounter this issue on all of them.

Debian Squeeze, standard Debian kernel 2.6.32-5-amd64. ZFS 0.6.0-rc9 and 0.6.0-rc11 compiled as described on your web page. ZFS on a single partition. ZFS works here without any problems. I don't see any txg_sync process with high CPU load.

Debian Squeeze. Kernel from Debian Wheezy 3.2.0-3-amd64. ZFS 0.6.0-rc9 and 0.6.0-rc11. ZFS on 4 x 2 mirror SSDs, or using raidz3 or just on a single partition. I always see txg_sync process with 100% CPU load after writing 5-10GB to that filesystem.

Gentoo Linux. Kernel from vanilla-sources 3.5.0 built with genkernel, default configuration. sys-kernel/spl, sys-fs/zfs-kmod, sys-fs/zfs 0.6.0-rc10. USE flags: kernel_linux, rootfs. Two ZFS pools on single partitions, several subvolumes. No txg_sync process with high CPU load.

I will provide output of /proc/spl/kstat/zfs/dmu_tx the week after next because I am on vacation.

@ryao
Contributor

ryao commented Oct 12, 2012

It sounds like Debian is applying a patch to Linux 3.2 that causes problems with ZFS. It would be helpful if someone could find out where Debian keeps those patches so that they could be examined.

@kleini

kleini commented Oct 12, 2012

Sources are available here: http://packages.debian.org/testing/kernel/linux-image-3.2.0-3-amd64
On the right is a link to the orig.tar.xz and debian.tar.xz. In fact it is kernel 3.2.23. I am not sure if the orig.tar.xz contains the vanilla kernel sources.

@phaedrus77

In my case I pulled the vanilla sources and built the kernel myself...

@ryao
Contributor

ryao commented Oct 12, 2012

The patches that debian applies are included as individual files in the following tarball:

http://ftp.us.debian.org/debian/pool/main/l/linux/linux_3.2.23-1.debian.tar.xz

@gregjurman

@kleini I am running 3.2.0-31-generic #50-Ubuntu SMP x86_64 on Ubuntu 12.04 and also seeing this issue.

I was running iozone3 to test the speed of a set of new hard disks, generating files of around 8GB each. This had been going on for about 36 hours when I ran a zpool status to check one of the other pools on another disk. This froze and I was not able to kill the process. At that time call traces began to show up in syslog. iostat indicated that the drive was still active (as did the HDD indicator). iozone3 was reporting proper statistics as well.

Traces from syslog: http://fpaste.org/xtL8/

@turrini

turrini commented Nov 4, 2012

I can confirm this under Debian Wheezy 3.2.0-4 (3.2.32-1): txg_sync stays at 100% when starting a VirtualBox guest VM, for example. I'm using the latest ZFS/SPL 0.6.0-rc11 on a fresh install.

There are no strange messages, dumps, or traces whatsoever in dmesg/syslog; it just hangs at 100%, but the rest of the system stays responsive (the VM never finishes starting, though).

My ARC max size is set to 512M, the machine has only 4GB of RAM, the VM is set to use 1GB, and it is a mirrored setup (SATA/SATA). It doesn't matter whether I increase or decrease the ARC size; I get the same result.

@phaedrus77

Decreasing the ARC (8GB RAM, ARC set to 512MB) has helped quite a bit for me.
The system is responsive, with the exception of processes that have I/O pending in the same queue / transaction group. For me it usually freezes most of my desktop session (mail, browser and co are stuck - my guess is due to interaction with gconfd). I cannot kill those processes. Usually this means a reboot is the fastest and least painful way out...

@ryao
Contributor

ryao commented Nov 6, 2012

@dajhorn, is there any chance that you could take a look at this?

@dajhorn
Contributor

dajhorn commented Nov 6, 2012

@ryao, maybe on Friday.

Whenever I see Virtual Box in a bug report, I ask the user to install the latest official build from Oracle. It tends to make problems like this go away.

@kleini

kleini commented Nov 6, 2012

zpool create -m /test mirror /dev/sda /dev/sdb
dd if=/dev/zero of=/test/test.img bs=1M count=10240

The written file gets stuck at 922M. Kernel: Debian 3.2.23-1 from Debian Wheezy. Happens with 0.6.0-rc9 and 0.6.0-rc11.

Output of /proc/spl/kstat/zfs/dmu_tx:

cat /proc/spl/kstat/zfs/dmu_tx
3 1 0x01 12 576 14488562252 2257935726832469
name type data
dmu_tx_assigned 4 7395
dmu_tx_delay 4 928413
dmu_tx_error 4 0
dmu_tx_suspended 4 0
dmu_tx_group 4 0
dmu_tx_how 4 0
dmu_tx_memory_reserve 4 0
dmu_tx_memory_reclaim 4 930201
dmu_tx_memory_inflight 4 1
dmu_tx_dirty_throttle 4 0
dmu_tx_write_limit 4 7
dmu_tx_quota 4 0

cat /proc/spl/kstat/zfs/dmu_tx
3 1 0x01 12 576 14488562252 2257950870188395
name type data
dmu_tx_assigned 4 7395
dmu_tx_delay 4 1024537
dmu_tx_error 4 0
dmu_tx_suspended 4 0
dmu_tx_group 4 0
dmu_tx_how 4 0
dmu_tx_memory_reserve 4 0
dmu_tx_memory_reclaim 4 1026415
dmu_tx_memory_inflight 4 1
dmu_tx_dirty_throttle 4 0
dmu_tx_write_limit 4 7
dmu_tx_quota 4 0

cat /proc/spl/kstat/zfs/dmu_tx
3 1 0x01 12 576 14488562252 2257978382762480
name type data
dmu_tx_assigned 4 7395
dmu_tx_delay 4 1228438
dmu_tx_error 4 0
dmu_tx_suspended 4 0
dmu_tx_group 4 0
dmu_tx_how 4 0
dmu_tx_memory_reserve 4 0
dmu_tx_memory_reclaim 4 1230411
dmu_tx_memory_inflight 4 1
dmu_tx_dirty_throttle 4 0
dmu_tx_write_limit 4 7
dmu_tx_quota 4 0

@phaedrus77

@dajhorn it's been persistent for at least the last 3 VBox builds. I'd say I can trigger this just by eating a huge chunk of memory.

@behlendorf
Contributor

@phaedrus77 The changes in dmu_tx_delay and dmu_tx_memory_reclaim indicate that the txgs are being stalled due to what ZFS thinks is memory pressure on the system. Something similar was hit on 32-bit systems and fixed post-rc11. If this is a 32-bit system it would be best to move to a 64-bit system, OR grab the latest master, OR cherry-pick the following commits:

b68503f
7df05a4
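Cherry-picking onto an existing checkout would look roughly like the following (a sketch only; it assumes the two hashes above exist in the tree you are building, followed by the usual rebuild):

  cd /path/to/source
  git cherry-pick b68503f 7df05a4
  ./autogen.sh && ./configure && make && sudo make install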

@kleini

kleini commented Nov 6, 2012

It is a 64-bit system. You are talking about some memory pressure on the system. The system has 16G memory, an 8-core hyperthreading CPU and only the base system was installed and running. So less than 1G of the memory was in use. 15G of free memory should not be a situation of memory pressure.

@phaedrus77

@behlendorf it's a 64-bit system already 8-) I'll grab the latest master tomorrow or so and try that...
@kleini I think the memory pressure comment was aimed at me :-) I run VMs with half the physical memory allocated to them... with all the caching, starting the VM(s) is heavy on the memory system...

@kleini

kleini commented Dec 3, 2012

There are no virtual machines on my installation. The whole memory was available for the base Linux system and ZFS. I do not have a swap zvol. I just noticed this is allowed now - as far as I knew, swap on a zvol was discouraged.

@phaedrus77

I have at most 4GB allocated to a VM plus 128MB of "video RAM". That's the only VM running when it hits the bug. Every now and then I also have some smaller ones running in parallel (3*1GB + 2GB), which also triggers the bug. It can happen pretty much any time large chunks of RAM are allocated and some process tries to allocate more, or a new process starts.
Swap is a separate partition. I didn't dare use a zvol for swap on Linux ;-)

@wphilips

wphilips commented Dec 3, 2012

My configuration does NOT have swap on a zvol. It typically runs one VirtualBox machine with a base memory of 512 Mbyte. The machine is basically used by one user. Some services are running (sendmail, mysqld, openvpn, httpd) but with a very light load.

The only thing special about my setup is that one of the zvols (telinbackup2) runs on top of iSCSI. The zfs filesystem on that one is exclusively used to store backups created using rsnapshot (i.e., rsync):

telinbackup2 172G 328G 29K none
telinbackup2/backup2 172G 328G 172G /mnt/mirror/backup2

The second zpool (zfs) is a mirror created out of two LVM partitions on different disks. It contains 4 filesystems. One of those is completely unused (mysql). Another one is rarely used and not backed up (archive). The third one (backup) stores backups created using rsnapshot (rsync). The last one (devel) contains user data that is updated very infrequently but is backed up using rsnapshot (rsync) regularly:

zfs/archive 45.0G 170G 45.0G /mnt/archive
zfs/backup 134G 170G 134G /mnt/zfs/backup
zfs/devel 42.7G 170G 33.3G /mnt/devel
zfs/mysql 48K 170G 30K none

The backups (to backup2 and to backup) are run sequentially in a round-robin fashion. A backup is run every three hours. Usually the amount of data that has changed is very small, but of course a lot of data is read.

On devel, snapshots are created automatically every hour and older snapshots are cleaned out. Currently there are 54 snapshots.

All zfs filesystems are compressed using gzip.

I hope this helps.

@behlendorf
Contributor

@phaedrus77 When I say memory pressure in this case, it's entirely possible that there is no actual memory pressure and ZFS is just getting the calculation wrong. It tries (for better or worse) to be proactive about throttling things when it observes what it believes to be a low-memory situation. Because of how VirtualBox works, it's been known to throw off those calculations, which may cause ZFS to over-aggressively throttle the system. In this case it's attempting to ensure there's enough free memory (or reclaimable memory) on the system to construct a full txg. See arc_memory_throttle for the logic here.

You may be able to improve this by decreasing the maximum txg size, which is set by default to 1/8 of system memory. Try setting the zfs_write_limit_shift module option to 4 (1/16), 5 (1/32), or 6 (1/64) of total memory and see if it helps. All of these values may need additional tuning on Linux, particularly when things like VirtualBox are involved.
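A quick sketch of trying those values (the runtime write assumes the parameter is exported writable under /sys on your build; otherwise set it in modprobe.d and reload the module):

  # runtime: limit txgs to 1/32 of system memory
  echo 5 > /sys/module/zfs/parameters/zfs_write_limit_shift

  # persistent: /etc/modprobe.d/zfs.conf
  options zfs zfs_write_limit_shift=5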

@behlendorf
Contributor

@phaedrus77 where do we currently stand on this? Presumably you're still having issues; did changing zfs_write_limit_shift help?

@mcr-ksh
Author

mcr-ksh commented Jan 28, 2013

Ehm. I still had issues and big trouble getting my system back into production. I used the parameter and for me it works. Only kmem_cache is now a bit busy (5-10%), but the overall performance is awful.

@phaedrus77

@behlendorf I'm still trying different values for zfs_write_limit_shift. Currently I have it set to 6. I still see txg_sync go up, eating CPU like mad and blocking I/O, but so far it has come back after 10-20 seconds. I'm trying to really hammer my machine tonight. As @mcr-ksh said, it comes with a performance hit.

@behlendorf
Contributor

Thanks. Could you both include the following patch in your testing? It should help with the large CPU usage by spl_kmem_cache by effectively disabling the aging, which isn't critical or potentially even useful.

openzfs/spl#213

@phaedrus77

I had a bad lockup this morning again (VirtualBox and Firefox fighting over the memory). I had zfs_write_limit_shift=6 and arc_max set to 512MB.
I just pulled your patch and rc13 and built the modules. I'm going to test over the next few days.
Thanks for your help!

@phaedrus77

Just had the next lockup. This was rc13 plus your patch, with the options set as before. Of my 8GB of RAM about 60MB was free when txg_sync got stuck. I'll see if further limiting zfs_write_limit_shift helps.

@behlendorf
Contributor

@phaedrus77 When things get stuck, is it always the 'dmu_tx_memory_reclaim' count in /proc/spl/kstat/zfs/dmu_tx that is growing? If so we should consider disabling that throttle; it's not critical. It's another bit of code we inherited from Solaris, and it's completely unclear if it's really needed on Linux.
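One easy way to watch that counter while reproducing the hang (just a generic polling sketch; any interval works):

  watch -n 5 "grep -E 'dmu_tx_delay|dmu_tx_memory_reclaim' /proc/spl/kstat/zfs/dmu_tx"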

@phaedrus77

I think when I watched /proc/spl/kstat/zfs/dmu_tx I saw dmu_tx_memory_reclaim growing. I'll keep an eye on it the next time I trash the box ;-)

@turrini

turrini commented Jan 31, 2013

@phaedrus77

I've found "peace" on VirtualBox by using these module options:

  • 512MB ARC (small servers with 4GB RAM and one 3GB RAM VM)
    options zfs zfs_recover=1 zfs_arc_max=536870912 zfs_arc_meta_limit=536870912 zfs_dedup_prefetch=0 zfs_no_write_throttle=1 zfs_txg_synctime_ms=30000 zfs_txg_timeout=30
  • 4096MB ARC (almost "big" servers with 32GB RAM and six 4GB RAM VMs, or three 8GB RAM VMs)
    options zfs zfs_recover=1 zfs_arc_max=4294967296 zfs_arc_meta_limit=4294967296 zfs_dedup_prefetch=0 zfs_no_write_throttle=1 zfs_txg_synctime_ms=30000 zfs_txg_timeout=30

(Debian 3.2.0-4-amd64)
And so on...

@kleini

kleini commented Jan 31, 2013

For me the problem seems to have been fixed in one of RC9 through RC13; the additional module options are not necessary.

@phaedrus77

I still see the problem with rc13. I'll give the options a try.

@phaedrus77

With @turrini's options, rc13 on kernel 3.4.4 hangs again.
dmu_tx sampled every few seconds yields:
3 1 0x01 12 576 14031531174 737037340040
name type data
dmu_tx_assigned 4 18176
dmu_tx_delay 4 2572087
dmu_tx_error 4 0
dmu_tx_suspended 4 0
dmu_tx_group 4 479
dmu_tx_how 4 0
dmu_tx_memory_reserve 4 0
dmu_tx_memory_reclaim 4 3310982
dmu_tx_memory_inflight 4 0
dmu_tx_dirty_throttle 4 0
dmu_tx_write_limit 4 0
dmu_tx_quota 4 0
3 1 0x01 12 576 14031531174 739617351901
name type data
dmu_tx_assigned 4 18176
dmu_tx_delay 4 2601658
dmu_tx_error 4 0
dmu_tx_suspended 4 0
dmu_tx_group 4 504
dmu_tx_how 4 0
dmu_tx_memory_reserve 4 0
dmu_tx_memory_reclaim 4 3352156
dmu_tx_memory_inflight 4 0
dmu_tx_dirty_throttle 4 0
dmu_tx_write_limit 4 0
dmu_tx_quota 4 0
3 1 0x01 12 576 14031531174 741302655783
name type data
dmu_tx_assigned 4 18176
dmu_tx_delay 4 2620636
dmu_tx_error 4 0
dmu_tx_suspended 4 0
dmu_tx_group 4 517
dmu_tx_how 4 0
dmu_tx_memory_reserve 4 0
dmu_tx_memory_reclaim 4 3378853
dmu_tx_memory_inflight 4 0
dmu_tx_dirty_throttle 4 0
dmu_tx_write_limit 4 0
dmu_tx_quota 4 0
3 1 0x01 12 576 14031531174 742346289356
name type data
dmu_tx_assigned 4 18176
dmu_tx_delay 4 2632389
dmu_tx_error 4 0
dmu_tx_suspended 4 0
dmu_tx_group 4 519
dmu_tx_how 4 0
dmu_tx_memory_reserve 4 0
dmu_tx_memory_reclaim 4 3395381
dmu_tx_memory_inflight 4 0
dmu_tx_dirty_throttle 4 0
dmu_tx_write_limit 4 0
dmu_tx_quota 4 0
3 1 0x01 12 576 14031531174 744181869558
name type data
dmu_tx_assigned 4 18176
dmu_tx_delay 4 2653718
dmu_tx_error 4 0
dmu_tx_suspended 4 0
dmu_tx_group 4 541
dmu_tx_how 4 0
dmu_tx_memory_reserve 4 0
dmu_tx_memory_reclaim 4 3425585
dmu_tx_memory_inflight 4 0
dmu_tx_dirty_throttle 4 0
dmu_tx_write_limit 4 0
dmu_tx_quota 4 0

@behlendorf
Contributor

@phaedrus77 It's clearly stalling because it believes there to be memory pressure (there in fact may not be any). Can you try making this simple change, which disables this throttle?

diff --git a/module/zfs/arc.c b/module/zfs/arc.c
index 50b6865..a78c915 100644
--- a/module/zfs/arc.c
+++ b/module/zfs/arc.c
@@ -3636,11 +3636,13 @@ arc_memory_throttle(uint64_t reserve, uint64_t inflight_
        /* Easily reclaimable memory (free + inactive + arc-evictable) */
        available_memory = ptob(spl_kmem_availrmem()) + arc_evictable_memory();

+#if 0
        if (available_memory <= zfs_write_limit_max) {
                ARCSTAT_INCR(arcstat_memory_throttle_count, 1);
                DMU_TX_STAT_BUMP(dmu_tx_memory_reclaim);
                return (EAGAIN);
        }
+#endif

        if (inflight_data > available_memory / 4) {
                ARCSTAT_INCR(arcstat_memory_throttle_count, 1);

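For anyone else who wants to try it, applying a unified diff like the one above and rebuilding is the usual routine (the file name here is arbitrary):

  cd /path/to/zfs-source
  patch -p1 < disable-arc-throttle.diff    # or: git apply disable-arc-throttle.diff
  ./autogen.sh && ./configure && make && sudo make install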
@phaedrus77

@behlendorf Thanks! That's the magic patch. I have Firefox and VirtualBox running without txg_sync blocking anything. top reports free memory moving between 60MB and 150MB while both are running; before, it remained constant. dmu_tx_memory_reclaim stays at 0. Also, I now see pages being moved to swap, which never happened when txg_sync blocked.
Thanks again!

@behlendorf
Contributor

@phaedrus77 That's good news. This sort of thing has come up a few times now, always related to VirtualBox or, in the past, 32-bit systems. I'll push a patch into master today to let you disable this code with just a module option, so you won't need to carry a custom patch: set zfs_arc_memory_throttle_disable=1.

More broadly, disabling this code is probably the right thing for all Linux systems, since we have a direct memory reclaim path where Solaris does not. But before I make that the default I'd like to get some additional real-world testing with this code disabled.
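Once that option lands, disabling the throttle should look something like this (option name as given above; the runtime write assumes the parameter is writable after module load):

  # persistent: /etc/modprobe.d/zfs.conf
  options zfs zfs_arc_memory_throttle_disable=1

  # or at runtime
  echo 1 > /sys/module/zfs/parameters/zfs_arc_memory_throttle_disable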

behlendorf added a commit that referenced this issue Feb 21, 2013
The zfs_arc_memory_throttle_disable module option was introduced
by commit 0c5493d to resolve a
memory miscalculation which could result in the txg_sync thread
spinning.

When this was first introduced the default behavior was left
unchanged until enough real world usage confirmed there were no
unexpected issues.  We've now reached that point.  Linux's
direct reclaim is working as expected so we're enabling this
behavior by default.

This helps pave the way to retire the spl_kmem_availrmem()
functionality in the SPL layer.  This was the only caller.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue #938
stevenburgess pushed a commit to stevenburgess/zfs that referenced this issue May 14, 2013
The zfs_arc_memory_throttle_disable module option was introduced
by commit 0c5493d to resolve a
memory miscalculation which could result in the txg_sync thread
spinning.

When this was first introduced the default behavior was left
unchanged until enough real world usage confirmed there were no
unexpected issues.  We've now reached that point.  Linux's
direct reclaim is working as expected so we're enabling this
behavior by default.

This helps pave the way to retire the spl_kmem_availrmem()
functionality in the SPL layer.  This was the only caller.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#938
unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
The way in which virtual box ab(uses) memory can throw off the
free memory calculation in arc_memory_throttle().  The result is
the txg_sync thread will effectively spin waiting for memory to
be released even though there's lots of memory on the system.

To handle this case I'm adding a zfs_arc_memory_throttle_disable
module option largely for virtual box users.  Setting this option
disables free memory checks which allows the txg_sync thread to
make progress.

By default this option is disabled to preserve the current
behavior.  However, because Linux supports direct memory reclaim
it's doubtful throttling due to perceived memory pressure is ever
a good idea.  We should enable this option by default once we've
done enough real world testing to convince ourselve there aren't
any unexpected side effects.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#938
unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
The zfs_arc_memory_throttle_disable module option was introduced
by commit 0c5493d to resolve a
memory miscalculation which could result in the txg_sync thread
spinning.

When this was first introduced the default behavior was left
unchanged until enough real world usage confirmed there were no
unexpected issues.  We've now reached that point.  Linux's
direct reclaim is working as expected so we're enabling this
behavior by default.

This helps pave the way to retire the spl_kmem_availrmem()
functionality in the SPL layer.  This was the only caller.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#938
@blueacid

@turrini - Thank you for your "peaceful" VirtualBox options! I am running ZFS on Linux on Debian within ESXi, with RDM-mapped drives (yeah, it's pretty rough as configurations go!)

However, your settings really helped restore performance - writes up from 800KB/sec to around 40-50MB/sec - understandable given that the array in question is on an external port multiplier!

To save anyone else from searching too far - here's how to apply those options to ZFS on boot:

  1. Create a file within /etc/modprobe.d/ named whatever you want. I made an inspired choice with mine and named the file zfs
  2. Copy in the line of text word-for-word:
    options zfs zfs_recover=1 zfs_arc_max=536870912 zfs_arc_meta_limit=536870912 zfs_dedup_prefetch=0 zfs_no_write_throttle=1 zfs_txg_synctime_ms=30000 zfs_txg_timeout=30
  3. Save, close, reboot. Enjoy!

You can check what each parameter is set to by looking under /sys/module/zfs/parameters/ (for example, cat /sys/module/zfs/parameters/zfs_arc_max).

Hope this helps someone else!
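As a small aside, a one-liner to dump every current parameter value at once (plain shell, nothing ZFS-specific):

  grep . /sys/module/zfs/parameters/*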

@turrini

turrini commented Feb 28, 2014

@blueacid In case you're using ZFS on root, remember that on current master some of these module options no longer exist, so it will result in an unbootable system if you don't remove them.

root@gmlinux:~# modinfo zfs | grep -E 'zfs_recover|zfs_no_write_throttle|zfs_txg_synctime'
parm: zfs_recover:Set to attempt to recover from fatal errors (int)

More info:
e8b96c6 Illumos #4045 write throttle & i/o scheduler performance work

@blueacid

Hi there,

Thanks for the warning - in my case I'm not using ZFS as my root filesystem, only for two storage arrays that don't hold any part of the system. Good advice for those that might be doing so, though - thanks!
