Hang during rsync #261

Closed
mgiammarco opened this issue Jun 2, 2011 · 38 comments

@mgiammarco

Hello,
during a big rsync between two ZFS filesystems my system hangs. I have tried setting primarycache=metadata, but it only hangs later.
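
For reference, that property is set per dataset; a minimal sketch, with pool/filesystem names as placeholders only:

# cache only metadata (not file data) in the ARC for this dataset
zfs set primarycache=metadata tank/backup
# confirm the setting
zfs get primarycache tank/backup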

Here is a debug log:

Jun 2 14:41:27 backup1 kernel: [ 26.798015] ZFS: Loaded module v0.6.0.16, ZFS pool version 28, ZFS filesystem version 5
Jun 2 14:50:28 backup1 kernel: [ 594.852215] hrtimer: interrupt took 79628485 ns
Jun 2 14:52:33 backup1 kernel: [ 719.889678] INFO: task txg_sync:1709 blocked for more than 120 seconds.
Jun 2 14:52:33 backup1 kernel: [ 719.889795] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 2 14:52:33 backup1 kernel: [ 719.889896] txg_sync D 0000000000000000 0 1709 2 0x00000000
Jun 2 14:52:33 backup1 kernel: [ 719.889901] ffff88012788bbd0 0000000000000046 ffff88012788bfd8 ffff88012788a000
Jun 2 14:52:33 backup1 kernel: [ 719.889904] 0000000000013d00 ffff8801368c83b8 ffff88012788bfd8 0000000000013d00
Jun 2 14:52:33 backup1 kernel: [ 719.889908] ffff880135890000 ffff8801368c8000 ffff88012788bbd0 ffffc90015602c68
Jun 2 14:52:33 backup1 kernel: [ 719.889911] Call Trace:
Jun 2 14:52:33 backup1 kernel: [ 719.889955] [] cv_wait_common+0x77/0xd0 [spl]
Jun 2 14:52:33 backup1 kernel: [ 719.889973] [] ? autoremove_wake_function+0x0/0x40
Jun 2 14:52:33 backup1 kernel: [ 719.889980] [] __cv_wait+0x13/0x20 [spl]
Jun 2 14:52:33 backup1 kernel: [ 719.890090] [] zio_wait+0xc3/0x170 [zfs]
Jun 2 14:52:33 backup1 kernel: [ 719.890116] [] dsl_pool_sync+0xd9/0x4a0 [zfs]
Jun 2 14:52:33 backup1 kernel: [ 719.890136] [] spa_sync+0x3ab/0x9b0 [zfs]
Jun 2 14:52:33 backup1 kernel: [ 719.890139] [] ? autoremove_wake_function+0x16/0x40
Jun 2 14:52:33 backup1 kernel: [ 719.890144] [] ? __wake_up+0x53/0x70
Jun 2 14:52:33 backup1 kernel: [ 719.890167] [] txg_sync_thread+0x241/0x3c0 [zfs]
Jun 2 14:52:33 backup1 kernel: [ 719.890189] [] ? txg_sync_thread+0x0/0x3c0 [zfs]
Jun 2 14:52:33 backup1 kernel: [ 719.890195] [] thread_generic_wrapper+0x78/0x90 [spl]
Jun 2 14:52:33 backup1 kernel: [ 719.890200] [] ? thread_generic_wrapper+0x0/0x90 [spl]
Jun 2 14:52:33 backup1 kernel: [ 719.890203] [] kthread+0x96/0xa0
Jun 2 14:52:33 backup1 kernel: [ 719.890210] [] kernel_thread_helper+0x4/0x10
Jun 2 14:52:33 backup1 kernel: [ 719.890213] [] ? kthread+0x0/0xa0
Jun 2 14:52:33 backup1 kernel: [ 719.890216] [] ? kernel_thread_helper+0x0/0x10
Jun 2 14:54:33 backup1 kernel: [ 839.634103] INFO: task txg_sync:1709 blocked for more than 120 seconds.
Jun 2 14:54:33 backup1 kernel: [ 839.634187] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 2 14:54:33 backup1 kernel: [ 839.634286] txg_sync D 0000000000000000 0 1709 2 0x00000000
Jun 2 14:54:33 backup1 kernel: [ 839.634291] ffff88012788bbd0 0000000000000046 ffff88012788bfd8 ffff88012788a000
Jun 2 14:54:33 backup1 kernel: [ 839.634294] 0000000000013d00 ffff8801368c83b8 ffff88012788bfd8 0000000000013d00
Jun 2 14:54:33 backup1 kernel: [ 839.634298] ffff880135890000 ffff8801368c8000 ffff88012788bbd0 ffffc90015602c68
Jun 2 14:54:33 backup1 kernel: [ 839.634301] Call Trace:
Jun 2 14:54:33 backup1 kernel: [ 839.634317] [] cv_wait_common+0x77/0xd0 [spl]
Jun 2 14:54:33 backup1 kernel: [ 839.634323] [] ? autoremove_wake_function+0x0/0x40
Jun 2 14:54:33 backup1 kernel: [ 839.634328] [] __cv_wait+0x13/0x20 [spl]
Jun 2 14:54:33 backup1 kernel: [ 839.634365] [] zio_wait+0xc3/0x170 [zfs]
Jun 2 14:54:33 backup1 kernel: [ 839.634384] [] dsl_pool_sync+0xd9/0x4a0 [zfs]
Jun 2 14:54:33 backup1 kernel: [ 839.634404] [] spa_sync+0x3ab/0x9b0 [zfs]
Jun 2 14:54:33 backup1 kernel: [ 839.634406] [] ? autoremove_wake_function+0x16/0x40
Jun 2 14:54:33 backup1 kernel: [ 839.634410] [] ? __wake_up+0x53/0x70
Jun 2 14:54:33 backup1 kernel: [ 839.634431] [] txg_sync_thread+0x241/0x3c0 [zfs]
Jun 2 14:54:33 backup1 kernel: [ 839.634451] [] ? txg_sync_thread+0x0/0x3c0 [zfs]
Jun 2 14:54:33 backup1 kernel: [ 839.634456] [] thread_generic_wrapper+0x78/0x90 [spl]
Jun 2 14:54:33 backup1 kernel: [ 839.634461] [] ? thread_generic_wrapper+0x0/0x90 [spl]
Jun 2 14:54:33 backup1 kernel: [ 839.634463] [] kthread+0x96/0xa0
Jun 2 14:54:33 backup1 kernel: [ 839.634467] [] kernel_thread_helper+0x4/0x10
Jun 2 14:54:33 backup1 kernel: [ 839.634469] [] ? kthread+0x0/0xa0
Jun 2 14:54:33 backup1 kernel: [ 839.634471] [] ? kernel_thread_helper+0x0/0x10
Jun 2 14:56:33 backup1 kernel: [ 959.380537] INFO: task rsync:3180 blocked for more than 120 seconds.
Jun 2 14:56:33 backup1 kernel: [ 959.380637] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 2 14:56:33 backup1 kernel: [ 959.380765] rsync D 0000000000000000 0 3180 3179 0x00000000
Jun 2 14:56:33 backup1 kernel: [ 959.380771] ffff88012c685b88 0000000000000082 ffff88012c685fd8 ffff88012c684000
Jun 2 14:56:33 backup1 kernel: [ 959.380776] 0000000000013d00 ffff8801263cc858 ffff88012c685fd8 0000000000013d00
Jun 2 14:56:33 backup1 kernel: [ 959.380780] ffff880125d8c4a0 ffff8801263cc4a0 ffff88012c685ba8 ffff88012cf4c330
Jun 2 14:56:33 backup1 kernel: [ 959.380784] Call Trace:
Jun 2 14:56:33 backup1 kernel: [ 959.380804] [] cv_wait_common+0x77/0xd0 [spl]
Jun 2 14:56:33 backup1 kernel: [ 959.380811] [] ? autoremove_wake_function+0x0/0x40
Jun 2 14:56:33 backup1 kernel: [ 959.380818] [] __cv_wait+0x13/0x20 [spl]
Jun 2 14:56:33 backup1 kernel: [ 959.380866] [] txg_wait_open+0x73/0xb0 [zfs]
Jun 2 14:56:33 backup1 kernel: [ 959.380886] [] dmu_tx_wait+0xed/0xf0 [zfs]
Jun 2 14:56:33 backup1 kernel: [ 959.380914] [] zfs_write+0x363/0xbd0 [zfs]
Jun 2 14:56:33 backup1 kernel: [ 959.380929] [] ? do_sync_read+0xd2/0x110
Jun 2 14:56:33 backup1 kernel: [ 959.380956] [] zpl_write_common+0x52/0x80 [zfs]
Jun 2 14:56:33 backup1 kernel: [ 959.380981] [] zpl_write+0x68/0xa0 [zfs]
Jun 2 14:56:33 backup1 kernel: [ 959.380984] [] vfs_write+0xc6/0x180
Jun 2 14:56:33 backup1 kernel: [ 959.380987] [] sys_write+0x51/0x90
Jun 2 14:56:33 backup1 kernel: [ 959.380991] [] system_call_fastpath+0x16/0x1b

@behlendorf
Contributor

It looks like both processes are waiting on the open transaction, but it's not clear why. With what version are you able to reproduce this? Fixes for several deadlocks have been committed post-rc4. You can give them a try if you pull the latest master.
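
A rough sketch of building from master, assuming the usual autotools flow of that era (exact steps and configure flags may differ; SPL must be built and installed before ZFS):

git clone https://github.com/behlendorf/spl.git
cd spl && ./autogen.sh && ./configure && make && sudo make install && cd ..
git clone https://github.com/behlendorf/zfs.git
cd zfs && ./autogen.sh && ./configure && make && sudo make install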

@mgiammarco
Author

The version is on the first line of the log; it is the latest in the PPA.

@mgiammarco
Author

I have disabled the L2ARC and the ZIL log and redone the rsync test. Unfortunately it fails again, but this time due to the out-of-memory killer.

@behlendorf
Contributor

How much memory is in your system? At the moment things get a little dicey with ZFS if you have less than about 2GB.

@mgiammarco
Author

The server runs on ESXi, so I gave it 5 GB this time. To recap the settings:

  • no L2ARC and no ZIL (otherwise I got the error at the start of this thread)
  • 5 GB RAM
  • primarycache=metadata
  • zfs.conf in modprobe.d = options zfs zfs_arc_max=107374182 zfs_arc_meta_limit=107374182 zfs_arc_max=107374182

Result: ZFS eats all memory again, this time after a longer period. Strangely, for the first few hours memory consumption seems normal; then it balloons and consumes all memory.

I use the latest version from the PPA.
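
A quick, hedged way to check whether those limits are actually being honored while the rsync runs (field names as reported by arcstats):

while true; do
    date
    grep -E '^(size|c_max|arc_meta_used|arc_meta_limit) ' /proc/spl/kstat/zfs/arcstats
    sleep 30
done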

@ahamilto

I get a similar issue on a testing backup server when using rsync. Interestingly, the problem does not occur when the heavy I/O is generated by a non-rsync process (specifically, the OpenAFS backup server). The server in question has 7GB of RAM for a RAIDz pool of 4x 750GB drives. Compression is enabled on the pool but deduplication is disabled. Kernel is v2.6.36 checked out from kernel.org.

Please let me know if you need any more information.

@behlendorf
Contributor

Several deadlocks which would result in the symptoms you describe were recently fixed in the master branch. If you're able to consistently reproduce this issue, I'd appreciate it if you could try the latest source from GitHub and let me know if the issue remains. Thanks!

a140dc5 - Fix mmap(2)/write(2)/read(2) deadlock
abd39a8 - Fix zio_execute() deadlock
cfc9a5c - Fix zpl_writepage() deadlock
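
One way to check whether a given checkout already contains these fixes, from inside the zfs source tree (a sketch):

git branch --contains a140dc5     # lists local branches containing the commit
git log --oneline -1 cfc9a5c      # errors out if the commit is unknown locally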

@mgiammarco
Author

Thanks, I will try when these patches land in the PPA repository.
I'm writing now to say that I have found (by accident) a workaround: the virtual machine that I use for tests had two virtual CPUs. I have now given it only one CPU and it seems to work (when memory is full, memory is now reclaimed and no task killing occurs).

I suppose the deadlocks you have found will help with 2 CPUs.

@tstudios

4 rsync streams between this Linux 233TB pool and one Solaris 233TB ZFS pool grind the system to a halt. The workaround seems to be a taskset command that assigns rsync to 2 of the 4 physical processors. After that, open ssh user interfaces will randomly hang for short periods and then recover; evidently interactive processes are being scheduled erratically. AMD Opteron dual core, 4 physical CPUs, 8GB memory; CentOS 6, kernel 2.6.32-71.29.1.el6.x86_64, spl and zfs 0.6.0-rc5. I realize none of this is very technical, but rsyncs are in progress and I'm home on a VPN. The rsync hang is an open bug for the slightly past-due rc6 and/or 0.6.0 stable release. FYI: we do have reasons for not doing zfs send/receive duplications. The CentOS system can dual boot Solaris, i.e. its original OS.
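
The pinning described above would look roughly like this; CPU numbers, paths, and the PID are illustrative only:

taskset -c 2,3 rsync -av /source/ /tank/dest/   # start rsync pinned to CPUs 2-3
taskset -cp 2,3 12345                           # or re-pin an already-running rsync by PID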

@tstudios

tstudios commented Aug 2, 2011

Rsyncs are running, currently set to run on processor #3 via taskset. Interactive processes do recover after hangs. I had 10 ssh sessions open today, with a date/sleep 1 loop running in all shells. Generally only one shell would hang at a time, for perhaps between 2 and 15 seconds.

@ahamilto

ahamilto commented Aug 2, 2011

I am still seeing this issue with the latest versions built from git. Setting the zfs_arc_max option to a relatively small number (1GB on a system with 6GB RAM in this case) helps to slow the failure. When running I/O-intensive loads that aren't an rsync (the AFS backup system I'm running is writing 250GB to a single large file), the ARC size as given by /proc/spl/kstat/zfs/arcstats quickly goes up to 1073845840 (almost exactly the value given to modprobe) and then holds at that size. When running an rsync, however, that size climbs steadily through the specified maximum (even if the rsync is just verifying that files have not changed) and keeps increasing until the system starts swapping, at which point it begins to lag badly. A workaround we have in place for the moment is to rsync a portion at a time and completely unload ZFS between the rsyncs, which frees the memory.
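
A hedged sketch of that workaround; the pool name, module handling, and paths are assumptions, not the exact script:

for part in part1 part2 part3; do
    rsync -a backupsrc::mail/$part/ /tank/mail/$part/
    zpool export tank      # quiesce the pool
    modprobe -r zfs        # unload zfs and its now-unused dependencies, freeing the memory
    modprobe zfs
    zpool import tank
done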

@mgiammarco
Author

I agree. Setting a very small zfs_arc_max slows down the hanging of the system. It seems that something during rsync triggers a memory leak.

@tstudios

tstudios commented Aug 2, 2011

I'm not in a position to unload ZFS between rsyncs; other user processes need/use the pool. Just reporting what I, as an "ordinary" interactive user, would see, and reinforcing that the bug does exist and is noted for a fix in rc6. When fixed, it seems that rc6 and 0.6.0 will be stable enough for us to replace the Solaris OS. We are looking to get ZFS and FUSE/LTFS (Linear Tape File System) on the same OS platform, and to offload "deep/old" archived projects onto LTFS. Again, we have reasons not to do zfs send/receive to/from tape, just as we have reasons not to do zfs send/receive between standalone systems/pools.

@mgiammarco
Author

Anyway, I have run a tar that reads files and writes to /dev/null, and the system hangs, so the problem is not due to disk writes.

@behlendorf
Contributor

Can you please post a simple test case which can recreate this issue? While this is tagged to be fixed for 0.6.0-rc6, I haven't seriously investigated it yet. A test case would help a lot.
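
A minimal reproducer in the spirit of the reports above might look like this (a sketch only; file counts, sizes, and paths are assumptions):

# populate a non-ZFS scratch area with ~2 million small files
mkdir -p /scratch/src && cd /scratch/src
for d in $(seq 0 199); do
    mkdir -p $d
    for f in $(seq 0 9999); do head -c 4096 /dev/urandom > $d/$f; done
done
# rsync onto a ZFS dataset while watching /proc/spl/kstat/zfs/arcstats
rsync -a /scratch/src/ /tank/dst/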

@tstudios

tstudios commented Aug 2, 2011

Happy to try and help. I'm a sysadmin and not a programmer in the technical sense. Do you need my grouping of rsync commands, my multi-open-shell interactive test loop, or more technical detail? For any debug and/or trace output, a CLI command string would be helpful.

@behlendorf
Contributor

If you can provide me enough information to recreate the issue locally, that would be ideal:

  • exactly what rsync commands are you running
  • how much data are you moving before the system hangs
  • what is the first error reported in dmesg
  • what distribution and kernel version are you using

@tstudios

tstudios commented Aug 2, 2011

While not on this system (now dual booting Solaris and Linux), another identical system (still on Solaris) will be sharing ZFS file systems within the pool via Samba. It would be interesting to know whether a large single-file read via Samba will also hang ZFS. I don't want to clutter the bug comments too much here, but it seems there are two cases that can hang.

@tstudios

tstudios commented Aug 2, 2011

It's late here (EDT). Tomorrow I'll give you my exact commands from a daily running script (real values and not $variables) and the rsync reports of how much data has moved, bytes sent/received, etc. That plus dmesg and the kernel version.

I'm running taskset -c rsync. This is allowing recovery. Should I not taskset and let the system grind to a halt?

I do see recent comments about tar reads from ZFS pools to /dev/null also hanging.

@tstudios

tstudios commented Aug 3, 2011

Brian,

Below is a synopsis of yesterday's rsyncs; hopefully enough information to roughly duplicate my user setup. This was pasted in from a simple text editor.

Kernel is stock from the latest yum update:
[root@tsdpl-bu tsdpl-A-10G-gfx_archive]# cat /etc/redhat-release
CentOS Linux release 6.0 (Final)
[root@tsdpl-bu tsdpl-A-10G-gfx_archive]# uname -rv
2.6.32-71.29.1.el6.x86_64 #1 SMP Mon Jun 27 19:49:27 BST 2011

Rsync version:
[root@tsdpl-bu tsdpl-A-10G-gfx_archive]# rsync --version
rsync version 3.0.6 protocol version 30
Copyright (C) 1996-2009 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
append, ACLs, xattrs, iconv, symtimes

Running the following in 10 or so ssh login shells; I can see one or more of these hang randomly, for random lengths of time:
while true; do date; sleep 1; done

The task is to sync a Solaris ZFS pool with 4 file systems (besides the root pool) to this Linux ZFS pool with 4 file systems (besides the root pool). This process has been running daily for a couple of years now; the Linux box now dual boots Linux/Solaris. The sync commands originate on the Linux system over a 10G link with a 9000 MTU, with about a minute's offset between each command. These did execute with a taskset -c 3; if I let the system schedule itself, I soon lose it.

The zfs list below comes from the Linux system and is indicative of the pool and file systems on both Linux and Solaris. It just so happens that yesterday my largest file system had the least data churn; I spent 4 hours parsing that file system. All file systems are a mixture of files, from small text files to nearly 1TB archives from a compositing application (Autodesk Discreet Flame). The parse times from my log for that least-churn/largest filesystem:
[root@tsdpl-bu tsdpl-A-10G-fx_archive]# head -2 log.0
SYSTEM:tsdpl-A-10G-fx_archive
START:2011-08-02-12-02-02
[root@tsdpl-bu tsdpl-A-10G-fx_archive]# tail -1 log.0
END:2011-08-02-15-59-19

Hmmnnn... first error in dmesg? Presuming dmesg, it looks like a fairly standard boot. I'm managing pool import, legacy mount, zfs, and samba from a custom script; only import and legacy mount for now.

Okay, now the zfs list, sync arguments, sync summary, and an indication of files added/deleted. Note one file system is a legacy mount, and the pool "root" is not rsynced. For the --excludes, SNAPSHOT is a symbolic link to .zfs, and I don't/can't pull those across.

NAME USED AVAIL REFER MOUNTPOINT
zdev001 206T 26.3T 22K /tsdpl
zdev001/backups 52.6T 26.3T 52.6T /tsdpl/backups
zdev001/fx_archive 142T 26.3T 142T /tsdpl/fx_archive
zdev001/gfx01 10.1T 26.3T 10.1T legacy
zdev001/gfx_archive 1.72T 26.3T 1.72T /tsdpl/gfx_archive


taskset -c 3 rsync -av --delete-during
--sockopts=SO_SNDBUF=65536,SO_RCVBUF=65536,TCP_NODELAY,IPTOS_LOWDELAY
--block-size=65536 --exclude SNAPSHOT --exclude .zfs
tsdpl::root/tsdpl/backups/ /tsdpl/backups

sent 13798995 bytes received 190547787390 bytes 9751136.57 bytes/sec
total size is 57435529244802 speedup is 301.40

[root@tsdpl-bu tsdpl-A-10G-backups]# cat log.0 | grep deleting | wc -l
991
[root@tsdpl-bu tsdpl-A-10G-backups]# cat log.0 | grep -v deleting | wc -l
66126


taskset -c 3 rsync -av --delete-during
--sockopts=SO_SNDBUF=65536,SO_RCVBUF=65536,TCP_NODELAY,IPTOS_LOWDELAY
--block-size=65536 --exclude .zfs tsdpl::root/tsdpl/fx_archive/
/tsdpl/fx_archive

receiving incremental file list
system/cron_archive_log.txt
system/cron_catalog_log.txt

sent 276728 bytes received 268401036 bytes 18871.13 bytes/sec
total size is 155228895891957 speedup is 577751.18


taskset -c 3 rsync -av --delete-during
--sockopts=SO_SNDBUF=65536,SO_RCVBUF=65536,TCP_NODELAY,IPTOS_LOWDELAY
--block-size=65536 --exclude SNAPSHOT --exclude .zfs tsdpl::root/gfx01/
/gfx01

sent 1662192 bytes received 83044609836 bytes 10524842.79 bytes/sec
total size is 11025515851641 speedup is 132.76

[root@tsdpl-bu tsdpl-A-10G-gfx01]# cat log.0 | grep deleting | wc -l
289446
[root@tsdpl-bu tsdpl-A-10G-gfx01]# cat log.0 | grep -v deleting | wc -l
85258


taskset -c 3 rsync -av --delete-during
--sockopts=SO_SNDBUF=65536,SO_RCVBUF=65536,TCP_NODELAY,IPTOS_LOWDELAY
--block-size=65536 --exclude SNAPSHOT --exclude .zfs
tsdpl::root/tsdpl/gfx_archive/ /tsdpl/gfx_archive

sent 84485 bytes received 1411890863 bytes 1125528.38 bytes/sec
total size is 1875081239999 speedup is 1327.98

[root@tsdpl-bu tsdpl-A-10G-gfx_archive]# cat log.0 | grep deleting | wc -l
0
[root@tsdpl-bu tsdpl-A-10G-gfx_archive]# cat log.0 | grep -v deleting | wc -l
372

@ahamilto

ahamilto commented Aug 8, 2011

rsyncing 1.5TB of data in a few large files (20 files total, ranging in size from 17GB to 140GB compressed; 18GB to 232GB uncompressed) is an example of an rsync that runs fine with no memory issues.

rsyncing 60GB of data in ~2 million files (mail storage backups) will result in the ARC ballooning beyond the defined maximum. The call stack when the system starts swapping is:

[134432.226918] 1330890 pages non-shared
[134432.390026] kworker/u:1: page allocation failure. order:0, mode:0x20
[134432.390032] Pid: 25143, comm: kworker/u:1 Tainted: P 2.6.36 #7
[134432.390036] Call Trace:
[134432.390050] [] __alloc_pages_nodemask+0x5bb/0x647
[134432.390058] [] ? autoremove_wake_function+0x0/0x34
[134432.390066] [] tcp_sendmsg+0x3fe/0x898
[134432.390073] [] inet_sendmsg+0x8f/0x98
[134432.390081] [] ? selinux_socket_sendmsg+0x1e/0x20
[134432.390088] [] __sock_sendmsg+0x66/0x73
[134432.390093] [] sock_sendmsg+0xa3/0xbc
[134432.390106] [] ? iscsi_tcp_segment_done+0x123/0x2aa [libiscsi_tcp]
[134432.390112] [] kernel_sendmsg+0x32/0x3e
[134432.390119] [] iscsi_sw_tcp_pdu_xmit+0xdf/0x236 [iscsi_tcp]
[134432.390126] [] iscsi_tcp_task_xmit+0x34/0x236 [libiscsi_tcp]
[134432.390134] [] ? __dequeue_entity+0x2e/0x33
[134432.390139] [] ? set_next_entity+0x23/0x73
[134432.390146] [] ? __switch_to+0xea/0x212
[134432.390157] [] iscsi_xmit_task+0x52/0x94 [libiscsi]
[134432.390165] [] iscsi_xmitworker+0x1c2/0x2b9 [libiscsi]
[134432.390173] [] ? bond_3ad_state_machine_handler+0x865/0x8b0
[134432.390181] [] ? iscsi_xmitworker+0x0/0x2b9 [libiscsi]
[134432.390188] [] process_one_work+0x1d9/0x2e7
[134432.390194] [] worker_thread+0x133/0x246
[134432.390199] [] ? worker_thread+0x0/0x246
[134432.390204] [] kthread+0x7d/0x85
[134432.390211] [] kernel_thread_helper+0x4/0x10
[134432.390216] [] ? kthread+0x0/0x85
[134432.390221] [] ? kernel_thread_helper+0x0/0x10
[134432.390224] Mem-Info:
[134432.390227] DMA per-cpu:
[134432.390230] CPU 0: hi: 0, btch: 1 usd: 0
[134432.390234] CPU 1: hi: 0, btch: 1 usd: 0
[134432.390237] CPU 2: hi: 0, btch: 1 usd: 0
[134432.390241] CPU 3: hi: 0, btch: 1 usd: 0
[134432.390244] DMA32 per-cpu:
[134432.390247] CPU 0: hi: 186, btch: 31 usd: 0
[134432.390251] CPU 1: hi: 186, btch: 31 usd: 0
[134432.390254] CPU 2: hi: 186, btch: 31 usd: 0
[134432.390257] CPU 3: hi: 186, btch: 31 usd: 0
[134432.390260] Normal per-cpu:
[134432.390263] CPU 0: hi: 186, btch: 31 usd: 0
[134432.390267] CPU 1: hi: 186, btch: 31 usd: 6
[134432.390270] CPU 2: hi: 186, btch: 31 usd: 0
[134432.390274] CPU 3: hi: 186, btch: 31 usd: 1
[134432.390282] active_anon:1244 inactive_anon:381 isolated_anon:0
[134432.390283] active_file:225 inactive_file:536 isolated_file:0
[134432.390285] unevictable:850 dirty:0 writeback:0 unstable:0
[134432.390286] free:6756 slab_reclaimable:982 slab_unreclaimable:187430
[134432.390288] mapped:764 shmem:6 pagetables:1126 bounce:0
[134432.390300] DMA free:15852kB min:24kB low:28kB high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:64kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[134432.390308] lowmem_reserve[]: 0 3512 5974 5974
[134432.390322] DMA32 free:10632kB min:5804kB low:7252kB high:8704kB active_anon:0kB inactive_anon:96kB active_file:36kB inactive_file:24kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3596460kB mlocked:0kB dirty:0kB writeback:0kB mapped:60kB shmem:0kB slab_reclaimable:72kB slab_unreclaimable:449936kB kernel_stack:288kB pagetables:1136kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[134432.390330] lowmem_reserve[]: 0 0 2461 2461
[134432.390343] Normal free:540kB min:4068kB low:5084kB high:6100kB active_anon:4976kB inactive_anon:1428kB active_file:864kB inactive_file:2120kB unevictable:3400kB isolated(anon):0kB isolated(file):0kB present:2520956kB mlocked:3400kB dirty:0kB writeback:0kB mapped:2996kB shmem:24kB slab_reclaimable:3856kB slab_unreclaimable:299720kB kernel_stack:2448kB pagetables:3368kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:67 all_unreclaimable? no
[134432.390356] lowmem_reserve[]: 0 0 0 0
[134432.390362] DMA: 7*4kB 6*8kB 6*16kB 6*32kB 4*64kB 5*128kB 5*256kB 2*512kB 2*1024kB 1*2048kB 2*4096kB = 15852kB
[134432.390377] DMA32: 76*4kB 245*8kB 81*16kB 85*32kB 38*64kB 11*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 10632kB
[134432.390391] Normal: 113*4kB 11*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 540kB
[134432.390406] 1760 total pagecache pages
[134432.390409] 527 pages in swap cache
[134432.390413] Swap cache stats: add 1779199, delete 1778672, find 414116/728800
[134432.390416] Free swap = 1004328kB
[134432.390419] Total swap = 1048572kB
[134432.426523] 1556478 pages RAM
[134432.426527] 44442 pages reserved
[134432.426529] 5987 pages shared
[134432.426532] 1330890 pages non-shared
[134432.713636] kworker/u:1: page allocation failure. order:0, mode:0x20

@behlendorf
Contributor

@Alphalead: It certainly sounds like the metadata limits aren't being enforced properly for the ARC. Have you checked /proc/spl/kstat/zfs/arcstats and /proc/spl/kmem/slab when it's in this state? I'd be interested to see what they report, and where the memory is being used.
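
For example, capturing both periodically while reproducing (a sketch; the interval and log path are arbitrary):

while true; do
    date
    cat /proc/spl/kstat/zfs/arcstats /proc/spl/kmem/slab
    sleep 60
done > zfs-mem.log 2>&1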

@bill-mcgonigle

I seem to be hitting this problem too. Just default values on everything (I'll implement the suggestions above), except for the zpool being created with ashift=12.

Here are /proc/spl/kstat/zfs/arcstats and /proc/spl/kmem/slab with the machine effectively useless. rsync was copying from local ext4 to local ZFS. Load average was about 18.5, but there was no I/O to speak of. There was 6900M or so of RAM allocated to this VM (a Xen dom0), with htop reporting ~5900M in use (but not by the applications; they can barely use 2GB). Killing rsync didn't help, and when I forced the machine to reboot I saw an rsync process segfault right before the actual system reboot (perhaps the open files got pulled out from under it?)

3 1 0x01 61 2928 146583166654 245155398080153
name                            type data
hits                            4    38630741
misses                          4    11918394
demand_data_hits                4    1051214
demand_data_misses              4    210641
demand_metadata_hits            4    32812696
demand_metadata_misses          4    10711823
prefetch_data_hits              4    44925
prefetch_data_misses            4    112506
prefetch_metadata_hits          4    4721906
prefetch_metadata_misses        4    883424
mru_hits                        4    2313878
mru_ghost_hits                  4    1762080
mfu_hits                        4    31550033
mfu_ghost_hits                  4    9489768
deleted                         4    12505789
recycle_miss                    4    47548198
mutex_miss                      4    77617
evict_skip                      4    5325335645
evict_l2_cached                 4    0
evict_l2_eligible               4    894196381184
evict_l2_ineligible             4    9847328256
hash_elements                   4    576798
hash_elements_max               4    576799
hash_collisions                 4    7084752
hash_chains                     4    157562
hash_chain_max                  4    8
p                               4    360042944
c                               4    2180250624
c_min                           4    2180250624
c_max                           4    17442004992
size                            4    2819764592
hdr_size                        4    184016480
data_size                       4    662532096
other_size                      4    1973216016
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_hdr_miss              4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_free_on_write                4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_hdr_size                     4    0
memory_throttle_count           4    9491
memory_direct_count             4    59
memory_indirect_count           4    321540
arc_no_grow                     4    1
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_meta_used                   4    2819764592
arc_meta_limit                  4    4360501248
arc_meta_max                    4    3733333912
--------------------- cache -------------------------------------------------------  ----- slab ------  ---- object -----
name                                    flags      size     alloc slabsize  objsize  total alloc   max  total alloc   max
spl_vn_cache                          0x00020         0         0     4096       88      0     0     7      0     0   128
spl_vn_file_cache                     0x00020         0         0     4096       88      0     0     0      0     0     0
spl_zlib_workspace_cache              0x00040         0         0  8388608   268104      0     0     0      0     0     0
zio_cache                             0x00040    360448    333792    16384      912     22    22  1956    374   366 33252
zio_link_cache                        0x00020     36864     14304     4096       48      9     9   749    405   298 33664
zio_buf_512                           0x00042 1372160000 643706368    32768      512  41875 41875 50625  1298125 1257239 1569352
zio_data_buf_512                      0x00042         0         0    32768      512      0     0   843      0     0 26110
zio_buf_1024                          0x00042    131072     65536    32768     1024      4     4   448     84    64  9407
zio_data_buf_1024                     0x00042         0         0    32768     1024      0     0   965      0     0 20263
zio_buf_1536                          0x00042    196608     98304    65536     1536      3     3   164     93    64  5084
zio_data_buf_1536                     0x00042         0         0    65536     1536      0     0  1386      0     0 42947
zio_buf_2048                          0x00042    196608    149504    65536     2048      3     3   175     75    73  4359
zio_data_buf_2048                     0x00042         0         0    65536     2048      0     0  3525      0     0 88125
zio_buf_2560                          0x00042    262144    163840    65536     2560      4     4   254     84    64  5324
zio_data_buf_2560                     0x00042         0         0    65536     2560      0     0  6054      0     0 127120
zio_buf_3072                          0x00042         0         0    65536     3072      0     0   356      0     0  6400
zio_data_buf_3072                     0x00042         0         0    65536     3072      0     0  3637      0     0 65464
zio_buf_3584                          0x00042         0         0   131072     3584      0     0    46      0     0  1408
zio_data_buf_3584                     0x00042         0         0   131072     3584      0     0   466      0     0 14435
zio_buf_4096                          0x00042   1048576    262144   262144     4096      4     4   355    124    64 11005
zio_data_buf_4096                     0x00042         0         0   262144     4096      0     0   551      0     0 17073
zio_buf_5120                          0x00042    262144    163840   131072     5120      2     2    80     42    32  1661
zio_data_buf_5120                     0x00042         0         0   131072     5120      0     0  1458      0     0 30598
zio_buf_6144                          0x00042    262144    190464   131072     6144      2     2    57     36    31  1023
zio_data_buf_6144                     0x00042         0         0   131072     6144      0     0   819      0     0 14738
zio_buf_7168                          0x00042    786432    458752   262144     7168      3     3    25     93    64   767
zio_data_buf_7168                     0x00042         0         0   262144     7168      0     0   225      0     0  6955
zio_buf_8192                          0x00042    786432    425984   262144     8192      3     3   154     63    52  3218
zio_data_buf_8192                     0x00042   1048576    524288   262144     8192      4     4   447     84    64  9378
zio_buf_10240                         0x00042         0         0   262144    10240      0     0    39      0     0   804
zio_data_buf_10240                    0x00042         0         0   262144    10240      0     0   181      0     0  3797
zio_buf_12288                         0x00042   1048576    454656   524288    12288      2     2    24     62    37   722
zio_data_buf_12288                    0x00042         0         0   524288    12288      0     0    79      0     0  2424
zio_buf_14336                         0x00042         0         0   524288    14336      0     0    10      0     0   305
zio_data_buf_14336                    0x00042         0         0   524288    14336      0     0    51      0     0  1555
zio_buf_16384                         0x00042 1329070080 663404544   524288    16384   2535  2535  6746  63375 40491 168650
zio_data_buf_16384                    0x00042         0         0   524288    16384      0     0    83      0     0  2071
zio_buf_20480                         0x00042   2097152   1310720   524288    20480      4     4    40     84    64   829
zio_data_buf_20480                    0x00042         0         0   524288    20480      0     0   190      0     0  3981
zio_buf_24576                         0x00042   1572864   1327104   524288    24576      3     3    31     54    54   548
zio_data_buf_24576                    0x00042         0         0   524288    24576      0     0   104      0     0  1864
zio_buf_28672                         0x00042         0         0  1048576    28672      0     0    12      0     0   366
zio_data_buf_28672                    0x00042         0         0  1048576    28672      0     0    66      0     0  2033
zio_buf_32768                         0x00042   3145728   2097152  1048576    32768      3     3    12     84    64   320
zio_data_buf_32768                    0x00042         0         0  1048576    32768      0     0    81      0     0  2244
zio_buf_36864                         0x00042   2097152   1179648  1048576    36864      2     2    13     50    32   323
zio_data_buf_36864                    0x00042         0         0  1048576    36864      0     0   104      0     0  2599
zio_buf_40960                         0x00042   1048576         0  1048576    40960      1     0    15     23     0   338
zio_data_buf_40960                    0x00042         0         0  1048576    40960      0     0    30      0     0   689
zio_buf_45056                         0x00042         0         0  1048576    45056      0     0    11      0     0   231
zio_data_buf_45056                    0x00042         0         0  1048576    45056      0     0    32      0     0   652
zio_buf_49152                         0x00042   4194304   3145728  1048576    49152      4     4    10     76    64   173
zio_data_buf_49152                    0x00042         0         0  1048576    49152      0     0    38      0     0   705
zio_buf_53248                         0x00042         0         0  1048576    53248      0     0    23      0     0   403
zio_data_buf_53248                    0x00042         0         0  1048576    53248      0     0    37      0     0   664
zio_buf_57344                         0x00042         0         0  1048576    57344      0     0    19      0     0   308
zio_data_buf_57344                    0x00042         0         0  1048576    57344      0     0    38      0     0   642
zio_buf_61440                         0x00042         0         0  2097152    61440      0     0     7      0     0   187
zio_data_buf_61440                    0x00042         0         0  2097152    61440      0     0    18      0     0   532
zio_buf_65536                         0x00042  39845888  31653888  2097152    65536     19    19    22    570   483   641
zio_data_buf_65536                    0x00042         0         0  2097152    65536      0     0    14      0     0   420
zio_buf_69632                         0x00042         0         0  2097152    69632      0     0    12      0     0   320
zio_data_buf_69632                    0x00042         0         0  2097152    69632      0     0    14      0     0   384
zio_buf_73728                         0x00042         0         0  2097152    73728      0     0    11      0     0   286
zio_data_buf_73728                    0x00042         0         0  2097152    73728      0     0     9      0     0   232
zio_buf_77824                         0x00042         0         0  2097152    77824      0     0    12      0     0   288
zio_data_buf_77824                    0x00042         0         0  2097152    77824      0     0     8      0     0   200
zio_buf_81920                         0x00042         0         0  2097152    81920      0     0     7      0     0   161
zio_data_buf_81920                    0x00042         0         0  2097152    81920      0     0    11      0     0   245
zio_buf_86016                         0x00042         0         0  2097152    86016      0     0     6      0     0   133
zio_data_buf_86016                    0x00042         0         0  2097152    86016      0     0     9      0     0   206
zio_buf_90112                         0x00042         0         0  2097152    90112      0     0     6      0     0   128
zio_data_buf_90112                    0x00042         0         0  2097152    90112      0     0     9      0     0   188
zio_buf_94208                         0x00042         0         0  2097152    94208      0     0     8      0     0   155
zio_data_buf_94208                    0x00042         0         0  2097152    94208      0     0    13      0     0   263
zio_buf_98304                         0x00042         0         0  2097152    98304      0     0    17      0     0   324
zio_data_buf_98304                    0x00042         0         0  2097152    98304      0     0    13      0     0   244
zio_buf_102400                        0x00042         0         0  2097152   102400      0     0    12      0     0   223
zio_data_buf_102400                   0x00042         0         0  2097152   102400      0     0    10      0     0   186
zio_buf_106496                        0x00042         0         0  2097152   106496      0     0    16      0     0   288
zio_data_buf_106496                   0x00042         0         0  2097152   106496      0     0    12      0     0   212
zio_buf_110592                        0x00042         0         0  2097152   110592      0     0     9      0     0   161
zio_data_buf_110592                   0x00042         0         0  2097152   110592      0     0    12      0     0   211
zio_buf_114688                        0x00042         0         0  2097152   114688      0     0    12      0     0   192
zio_data_buf_114688                   0x00042         0         0  2097152   114688      0     0    10      0     0   156
zio_buf_118784                        0x00042         0         0  2097152   118784      0     0     6      0     0    97
zio_data_buf_118784                   0x00042         0         0  2097152   118784      0     0    12      0     0   191
zio_buf_122880                        0x00042         0         0  2097152   122880      0     0     7      0     0    97
zio_data_buf_122880                   0x00042         0         0  2097152   122880      0     0    16      0     0   256
zio_buf_126976                        0x00042         0         0  4194304   126976      0     0     4      0     0    98
zio_data_buf_126976                   0x00042         0         0  4194304   126976      0     0     6      0     0   162
zio_buf_131072                        0x00042         0         0  4194304   131072      0     0    20      0     0   593
zio_data_buf_131072                   0x00042         0         0  4194304   131072      0     0   816      0     0 25268
sa_cache                              0x00020  50860032  28278800     4096       80  12417 12417 19334  409761 353485 637997
dnode_t                               0x00040 1327349760 1186980880    16384      944  81015 81015 96886  1296240 1257395 1550176
dmu_buf_impl_t                        0x00020 485531648 384151168     8192      296  59269 59269 68442  1422456 1297808 1642587
arc_buf_hdr_t                         0x00020 196968448 166191840     8192      288  24044 24044 24044  577056 577055 577055
arc_buf_t                             0x00020  25071616   4558288     4096      112   6121  6121 14313  159146 40699 372126
zil_lwb_cache                         0x00020     65536     49536     4096      192     16    16   192    272   258  3253
zfs_znode_cache                       0x00020 411664384 342173480    16384      968  25126 25126 39873  402016 353485 637968

@behlendorf
Contributor

Interesting. So it looks like the bulk of the memory is being consumed by metadata in the ARC. It also looks like memory is getting fragmented, but it's not horrible yet. What's unclear is why all of those dnodes, inodes, and metadata buffers aren't being released to the system; they should be unless they are still considered to be in use. Have you tried forcing the caches to be dropped with drop_caches? This will cause everything which is unused to be reclaimed, which should be happening anyway. Also, have you seen this issue with any other apps besides rsync? Say, a find in the directory or something else which accesses all of these files.

echo 1 >/proc/sys/vm/drop_caches

@mgiammarco
Author

I had the same problem with a tar that reads from ZFS and writes to /dev/null.

@bill-mcgonigle
Copy link

OK, so I did:

find /backup -ls > /dev/null

It ate about 4GB of RAM; I killed it when there was about a gig left.

I have, in /etc/modprobe.d/zfs.conf:

options zfs zfs_arc_max=107374182 zfs_arc_meta_limit=107374182 zfs_arc_max=107374182

The filesystem options are:

# zfs get all backup/snapshots| grep -v default
NAME              PROPERTY              VALUE                  SOURCE
backup/snapshots  type                  filesystem             -
backup/snapshots  creation              Sun Aug 14 17:51 2011  -
backup/snapshots  used                  597G                   -
backup/snapshots  available             1.20T                  -
backup/snapshots  referenced            597G                   -
backup/snapshots  compressratio         1.13x                  -
backup/snapshots  mounted               yes                    -
backup/snapshots  compression           on                     inherited from backup
backup/snapshots  version               5                      -
backup/snapshots  utf8only              off                    -
backup/snapshots  normalization         none                   -
backup/snapshots  casesensitivity       sensitive              -
backup/snapshots  primarycache          metadata               local
backup/snapshots  usedbysnapshots       0                      -
backup/snapshots  usedbydataset         597G                   -
backup/snapshots  usedbychildren        0                      -
backup/snapshots  usedbyrefreservation  0                      -
echo 1 >/proc/sys/vm/drop_caches

didn't seem to do anything.

lsof | grep find

returns nothing

zpool export backup

lets me watch memory usage fall by about 100MB/s until it's back down to where it started and the zpool unmounts.

I grabbed the stats as above again; I'll post them if they'd have value (I don't want to overly clutter the thread).

@bill-mcgonigle

ls -R

will even do it, though to a lesser extent.

@awalland

I'm not sure if this is the same thing, but here is what I get when I rsync many files to ZFS:

Aug 21 20:49:00 anterovipunen kernel: [ 1320.307559] kswapd0 D ffff88007788c2c0 0 21 2 0x00000000
Aug 21 20:49:00 anterovipunen kernel: [ 1320.307567] ffff88007aa49a40 0000000000000046 ffff88007aa49970 ffffffff81049bcd
Aug 21 20:49:00 anterovipunen kernel: [ 1320.309635] ffff88007b4314d0 ffff88007aa49fd8 ffff88007aa49fd8 ffff88007aa49fd8
Aug 21 20:49:00 anterovipunen kernel: [ 1320.311666] ffff88007395b090 ffff88007b4314d0 ffff88007f492540 0000000000000001
Aug 21 20:49:00 anterovipunen kernel: [ 1320.313632] Call Trace:
Aug 21 20:49:00 anterovipunen kernel: [ 1320.315639] [] ? ttwu_do_wakeup+0x3d/0x110
Aug 21 20:49:00 anterovipunen kernel: [ 1320.317629] [] ? ttwu_do_activate.constprop.166+0x66/0x70
Aug 21 20:49:00 anterovipunen kernel: [ 1320.319657] [] ? try_to_wake_up+0xce/0x280
Aug 21 20:49:00 anterovipunen kernel: [ 1320.321656] [] ? default_wake_function+0x12/0x20
Aug 21 20:49:00 anterovipunen kernel: [ 1320.323668] [] ? prepare_to_wait_exclusive+0x60/0x90
Aug 21 20:49:00 anterovipunen kernel: [ 1320.325679] [] cv_wait_common+0x77/0xd0 [spl]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.327773] [] ? abort_exclusive_wait+0xb0/0xb0
Aug 21 20:49:00 anterovipunen kernel: [ 1320.329781] [] __cv_wait+0x13/0x20 [spl]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.331784] [] txg_wait_open+0x73/0xa0 [zfs]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.333698] [] dmu_tx_wait+0xed/0xf0 [zfs]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.335572] [] dmu_tx_assign+0x66/0x420 [zfs]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.337404] [] zfs_inactive+0xcf/0x1a0 [zfs]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.339151] [] ? truncate_pagecache+0x59/0x70
Aug 21 20:49:00 anterovipunen kernel: [ 1320.340879] [] zpl_evict_inode+0x28/0x30 [zfs]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.342550] [] evict+0x85/0x150
Aug 21 20:49:00 anterovipunen kernel: [ 1320.344202] [] dispose_list+0x3e/0x50
Aug 21 20:49:00 anterovipunen kernel: [ 1320.345911] [] shrink_icache_memory+0x17a/0x320
Aug 21 20:49:00 anterovipunen kernel: [ 1320.347573] [] shrink_slab+0x11d/0x190
Aug 21 20:49:00 anterovipunen kernel: [ 1320.349210] [] balance_pgdat+0x51a/0x6e0
Aug 21 20:49:00 anterovipunen kernel: [ 1320.350951] [] kswapd+0x10a/0x3e0
Aug 21 20:49:00 anterovipunen kernel: [ 1320.352601] [] ? abort_exclusive_wait+0xb0/0xb0
Aug 21 20:49:00 anterovipunen kernel: [ 1320.354264] [] ? balance_pgdat+0x6e0/0x6e0
Aug 21 20:49:00 anterovipunen kernel: [ 1320.355909] [] kthread+0x8c/0xa0
Aug 21 20:49:00 anterovipunen kernel: [ 1320.357557] [] kernel_thread_helper+0x4/0x10
Aug 21 20:49:00 anterovipunen kernel: [ 1320.359175] [] ? kthread_worker_fn+0x190/0x190
Aug 21 20:49:00 anterovipunen kernel: [ 1320.360727] [] ? gs_change+0x13/0x13
Aug 21 20:49:00 anterovipunen kernel: [ 1320.365422] txg_sync D ffff88007788c440 0 1294 2 0x00000000
Aug 21 20:49:00 anterovipunen kernel: [ 1320.365428] ffff880071629c00 0000000000000046 ffff880071629b00 ffffffff8104c396
Aug 21 20:49:00 anterovipunen kernel: [ 1320.367054] ffff88007395b780 ffff880071629fd8 ffff880071629fd8 ffff880071629fd8
Aug 21 20:49:00 anterovipunen kernel: [ 1320.368679] ffff8800743f4c50 ffff88007395b780 0000000000000010 ffff8800765e4800
Aug 21 20:49:00 anterovipunen kernel: [ 1320.370320] Call Trace:
Aug 21 20:49:00 anterovipunen kernel: [ 1320.371934] [] ? ttwu_do_activate.constprop.166+0x66/0x70
Aug 21 20:49:00 anterovipunen kernel: [ 1320.373630] [] ? default_wake_function+0x12/0x20
Aug 21 20:49:00 anterovipunen kernel: [ 1320.375292] [] ? __wake_up_common+0x58/0x90
Aug 21 20:49:00 anterovipunen kernel: [ 1320.376960] [] ? prepare_to_wait_exclusive+0x60/0x90
Aug 21 20:49:00 anterovipunen kernel: [ 1320.378627] [] cv_wait_common+0x77/0xd0 [spl]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.380316] [] ? abort_exclusive_wait+0xb0/0xb0
Aug 21 20:49:00 anterovipunen kernel: [ 1320.382002] [] __cv_wait+0x13/0x20 [spl]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.383720] [] zio_wait+0xfb/0x170 [zfs]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.385436] [] dsl_pool_sync+0xca/0x450 [zfs]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.387182] [] spa_sync+0x38e/0xa00 [zfs]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.388898] [] ? default_wake_function+0x12/0x20
Aug 21 20:49:00 anterovipunen kernel: [ 1320.390644] [] ? autoremove_wake_function+0x16/0x40
Aug 21 20:49:00 anterovipunen kernel: [ 1320.392390] [] ? __wake_up+0x53/0x70
Aug 21 20:49:00 anterovipunen kernel: [ 1320.394162] [] txg_sync_thread+0x216/0x370 [zfs]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.395941] [] ? txg_quiesce_thread+0x240/0x240 [zfs]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.397786] [] ? txg_quiesce_thread+0x240/0x240 [zfs]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.399576] [] thread_generic_wrapper+0x78/0x90 [spl]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.401380] [] ? __thread_create+0x160/0x160 [spl]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.403186] [] kthread+0x8c/0xa0
Aug 21 20:49:00 anterovipunen kernel: [ 1320.404978] [] kernel_thread_helper+0x4/0x10
Aug 21 20:49:00 anterovipunen kernel: [ 1320.406761] [] ? kthread_worker_fn+0x190/0x190
Aug 21 20:49:00 anterovipunen kernel: [ 1320.408517] [] ? gs_change+0x13/0x13
Aug 21 20:49:00 anterovipunen kernel: [ 1320.413568] rsync D 0000000100041566 0 1508 1507 0x00000000
Aug 21 20:49:00 anterovipunen kernel: [ 1320.413574] ffff8800474fbb28 0000000000000082 0000000100041566 ffffffff81011f33
Aug 21 20:49:00 anterovipunen kernel: [ 1320.415194] ffff8800765d3e70 ffff8800474fbfd8 ffff8800474fbfd8 ffff8800474fbfd8
Aug 21 20:49:00 anterovipunen kernel: [ 1320.416773] ffff8800765d14d0 ffff8800765d3e70 ffff88007f492f40 0000000000000086
Aug 21 20:49:00 anterovipunen kernel: [ 1320.418346] Call Trace:
Aug 21 20:49:00 anterovipunen kernel: [ 1320.419858] [] ? native_sched_clock+0x13/0x60
Aug 21 20:49:00 anterovipunen kernel: [ 1320.421410] [] ? try_to_wake_up+0xce/0x280
Aug 21 20:49:00 anterovipunen kernel: [ 1320.422947] [] ? default_wake_function+0x12/0x20
Aug 21 20:49:00 anterovipunen kernel: [ 1320.424479] [] cv_wait_common+0x77/0xd0 [spl]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.425990] [] ? abort_exclusive_wait+0xb0/0xb0
Aug 21 20:49:00 anterovipunen kernel: [ 1320.427520] [] __cv_wait+0x13/0x20 [spl]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.429043] [] txg_wait_open+0x73/0xa0 [zfs]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.430584] [] dmu_tx_wait+0xed/0xf0 [zfs]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.432098] [] zfs_setattr+0x15ac/0x1810 [zfs]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.433602] [] ? do_lookup+0x50/0x310
Aug 21 20:49:00 anterovipunen kernel: [ 1320.435090] [] ? do_lookup+0x50/0x310
Aug 21 20:49:00 anterovipunen kernel: [ 1320.436572] [] ? __kmalloc+0x11c/0x160
Aug 21 20:49:00 anterovipunen kernel: [ 1320.438045] [] ? kmem_alloc_debug+0xbb/0x120 [spl]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.439553] [] zpl_setattr+0xdc/0x110 [zfs]
Aug 21 20:49:00 anterovipunen kernel: [ 1320.441060] [] ? kmem_cache_free+0x2f/0x120
Aug 21 20:49:00 anterovipunen kernel: [ 1320.442561] [] notify_change+0x1b1/0x390
Aug 21 20:49:00 anterovipunen kernel: [ 1320.444067] [] utimes_common+0xc1/0x150
Aug 21 20:49:00 anterovipunen kernel: [ 1320.445556] [] do_utimes+0xec/0x100
Aug 21 20:49:00 anterovipunen kernel: [ 1320.447039] [] sys_utimensat+0x58/0x80
Aug 21 20:49:00 anterovipunen kernel: [ 1320.448505] [] system_call_fastpath+0x16/0x1b

It works fine for a while and then just dies.
I was using the latest repo code on Arch Linux, kernel 3.0.3, with the recommended options from the FAQ.
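For anyone following along: the recommended options here are ZFS module parameters set through modprobe, e.g. a cap on the ARC size. A hypothetical example (the 1 GiB value is illustrative, not taken from this report):

# /etc/modprobe.d/zfs.conf -- cap the ARC at 1 GiB (value in bytes)
options zfs zfs_arc_max=1073741824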

@ghost

ghost commented Aug 22, 2011

Hi - I'm not sure what's going on here, as I don't know much about the programming/debugging of ZFS, but I seem to be hitting this issue, with or without rsync. I rarely had to read many files on this system since I use ZFS for backup storage, but I just had to restore things from the ZFS pool and it crashed after a while.. rebooted.. crashed after a while..

Here is the error I found in dmesg/syslog: http://pastebin.com/jMTCNEFy

If there is anything I can do to help, as far as testing goes, don't hesitate to let me know.

Thanks,

edit: using git from 2011-08-22 on Debian Squeeze

@bill-mcgonigle

I think I started seeing this when I updated to rc5. I was running rc4 with the ashift patch previously, but I also updated from 2.6.32 to 2.6.40 kernel, so both things changed. I also changed the disks from 1.5TB to 2TB.

The reporter's messages show v0.6.0.16, but mine show only v0.6.0.

As a workaround, I'm writing a script that runs each step of the backup and does a zpool export/import between steps (sketch below).
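A minimal sketch of that wrapper, where the pool name tank and the per-step scripts are hypothetical placeholders:

#!/bin/sh
# Run each backup step, then export and re-import the pool between
# steps so that memory tied to the pool is released before the next
# step starts.
set -e
for step in ./backup-step1.sh ./backup-step2.sh ./backup-step3.sh; do
    "$step"
    zpool export tank
    zpool import tank
done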

@ahamilto

I found another workaround for this issue. Running the following on the system while long rsyncs are in progress keeps memory usage down to the specified ARC sizes: while true; do echo 1 >/proc/sys/vm/drop_caches; sleep 60; done

I'm not sure why this works when a single echo 1 >/proc/sys/vm/drop_caches doesn't free larger amounts of RAM while they are in use.
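The same loop spelled out as a script, with comments on what the write does (just a sketch; the 60-second interval is simply the value quoted above):

#!/bin/sh
# Periodically ask the kernel to drop clean caches while a long
# rsync is running. Writing 1 drops the page cache only; 2 would
# drop slab objects (dentries and inodes), and 3 drops both.
# Writing to /proc/sys/vm/drop_caches requires root.
while true; do
    echo 1 > /proc/sys/vm/drop_caches
    sleep 60
done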

@ghost

ghost commented Sep 11, 2011

Alphalead: I think your trick allowed my rsync session to last longer, but after a while it crashed again, unfortunately.

Message from syslogd@stor01 at Sep 11 12:18:11 ...
kernel:[1714241.981661] Oops: 0002 [#1] SMP

Message from syslogd@stor01 at Sep 11 12:18:11 ...
kernel:[1714241.981673] last sysfs file: /sys/module/mbcache/initstate

Message from syslogd@stor01 at Sep 11 12:18:11 ...
kernel:[1714241.982056] Stack:

Message from syslogd@stor01 at Sep 11 12:18:11 ...
kernel:[1714241.982133] Call Trace:

Message from syslogd@stor01 at Sep 11 12:18:11 ...
kernel:[1714241.982206] Code: 49 8b 14 04 48 c1 e2 03 e8 83 88 ff ff 85 c0 75 10 48 8d 54 24 70 48 89 de 44 89 ef e8 5b f3 ff ff 48 8b 54 24 50 be d0 00 00 00 <48> c7 02 00 00 00 00 48 8b 54 24 48 48 8b 7c 24 70 e8 7d f6 ff

Message from syslogd@stor01 at Sep 11 12:18:11 ...
kernel:[1714241.982346] CR2: 0000000000000000

edit: version used is latest commit (2708f71)

@mgiammarco
Author

I have not disappeared. I have tried every version from the PPA repository.
With the latest version there are several improvements.
Unfortunately I get this:
[ 2125.140009] kworker/0:1: page allocation failure. order:0, mode:0x20
[ 2125.140023] Pid: 22, comm: kworker/0:1 Tainted: P
2.6.38-11-server #50-Ubuntu
[ 2125.140025] Call Trace:
[ 2125.140149] [] ? __alloc_pages_nodemask+0x5f4/0x830
[ 2125.140256] [] ? default_spin_lock_flags+0x9/0x10
[ 2125.140261] [] ? alloc_pages_current+0xa5/0x110
[ 2125.140314] [] ? new_slab+0x282/0x290
[ 2125.140317] [] ? __slab_alloc+0x262/0x390
[ 2125.140431] [] ? scsi_pool_alloc_command+0x33/0x80
[ 2125.140435] [] ? kmem_cache_alloc+0xff/0x110
[ 2125.140437] [] ? scsi_pool_alloc_command+0x33/0x80
[ 2125.140439] [] ?
scsi_host_alloc_command.clone.7+0x33/0xa0
[ 2125.140442] [] ? __scsi_get_command+0x2e/0xc0
[ 2125.140444] [] ? scsi_get_command+0x48/0xc0
[ 2125.140449] [] ? scsi_setup_fs_cmnd+0x8d/0xe0
[ 2125.140457] [] ? sd_prep_fn+0xaf/0xba0
[ 2125.140465] [] ? __wake_up+0x53/0x70
[ 2125.140534] [] ? blk_peek_request+0xce/0x220
[ 2125.140537] [] ? scsi_request_fn+0x66/0x4d0
[ 2125.140540] [] ? blk_unplug_work+0x0/0x70
[ 2125.140543] [] ? __generic_unplug_device+0x37/0x40
[ 2125.140546] [] ? generic_unplug_device+0x30/0x50
[ 2125.140554] [] ? blk_unplug_work+0x32/0x70
[ 2125.140572] [] ? process_one_work+0x11d/0x420
[ 2125.140576] [] ? worker_thread+0x169/0x360
[ 2125.140578] [] ? worker_thread+0x0/0x360
[ 2125.140581] [] ? kthread+0x96/0xa0
[ 2125.140587] [] ? kernel_thread_helper+0x4/0x10
[ 2125.140589] [] ? kthread+0x0/0xa0
[ 2125.140591] [] ? kernel_thread_helper+0x0/0x10
[ 2125.140593] Mem-Info:
[ 2125.140595] Node 0 DMA per-cpu:
[ 2125.140658] CPU 0: hi: 0, btch: 1 usd: 0
[ 2125.140660] Node 0 DMA32 per-cpu:
[ 2125.140662] CPU 0: hi: 186, btch: 31 usd: 140
[ 2125.140663] Node 0 Normal per-cpu:
[ 2125.140665] CPU 0: hi: 186, btch: 31 usd: 48
[ 2125.140668] active_anon:39107 inactive_anon:120 isolated_anon:0
[ 2125.140669] active_file:3002 inactive_file:2430 isolated_file:0
[ 2125.140670] unevictable:0 dirty:0 writeback:0 unstable:0
[ 2125.140671] free:6351 slab_reclaimable:9460 slab_unreclaimable:47421
[ 2125.140671] mapped:2900 shmem:146 pagetables:1273 bounce:0
[ 2125.140673] Node 0 DMA free:15872kB min:24kB low:28kB high:36kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB
mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable?
no
[ 2125.140685] lowmem_reserve[]: 0 3000 5020 5020
[ 2125.140688] Node 0 DMA32 free:9532kB min:5408kB low:6760kB high:8112kB
active_anon:7964kB inactive_anon:0kB active_file:0kB inactive_file:72kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB
mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB
slab_reclaimable:4324kB slab_unreclaimable:22084kB kernel_stack:0kB
pagetables:44kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
[ 2125.140696] lowmem_reserve[]: 0 0 2020 2020
[ 2125.140698] Node 0 Normal free:0kB min:3640kB low:4548kB high:5460kB
active_anon:148464kB inactive_anon:480kB active_file:12008kB
inactive_file:9648kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:2068480kB mlocked:0kB dirty:0kB writeback:0kB mapped:11596kB
shmem:584kB slab_reclaimable:33516kB slab_unreclaimable:167596kB
kernel_stack:4832kB pagetables:5048kB unstable:0kB bounce:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2125.140706] lowmem_reserve[]: 0 0 0 0
[ 2125.140708] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 2*64kB 1*128kB 1*256kB
0*512kB 1*1024kB 1*2048kB 3*4096kB = 15872kB
[ 2125.140716] Node 0 DMA32: 27*4kB 84*8kB 103*16kB 24*32kB 3*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 9532kB
[ 2125.140723] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 2125.140733] 5578 total pagecache pages
[ 2125.140734] 0 pages in swap cache
[ 2125.140736] Swap cache stats: add 0, delete 0, find 0/0
[ 2125.140737] Free swap = 0kB
[ 2125.140738] Total swap = 0kB
[ 2125.191753] 1310704 pages RAM
[ 2125.191756] 37445 pages reserved
[ 2125.191757] 17402 pages shared
[ 2125.191758] 1181257 pages non-shared
[ 2125.191762] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[ 2125.191764] cache: kmalloc-256, object size: 256, buffer size: 256,
default order: 0, min order: 0
[ 2125.191767] node 0: slabs: 124, objs: 1984, free: 0
[ 2125.191885] kworker/0:1: page allocation failure. order:0, mode:0x20
[ 2125.191888] Pid: 22, comm: kworker/0:1 Tainted: P
2.6.38-11-server #50-Ubuntu
[ 2125.191890] Call Trace:
[ 2125.191892] [] ?
__alloc_pages_nodemask+0x5f4/0x830
[ 2125.192098] [] ? spin_unlock+0xe/0x20
[ 2125.192148] [] ? tcp_v4_rcv+0x659/0x900
[ 2125.192158] [] ? alloc_pages_current+0xa5/0x110
[ 2125.192161] [] ? new_slab+0x282/0x290
[ 2125.192219] [] ? unmap_single+0x34/0x70
[ 2125.192222] [] ? __slab_alloc+0x262/0x390
[ 2125.192368] [] ? __alloc_skb+0x4f/0x170
[ 2125.192371] [] ? kmem_cache_alloc_node+0x69/0x120
[ 2125.192376] [] ? enqueue_task_fair+0x61/0xa0
[ 2125.192378] [] ? __alloc_skb+0x4f/0x170
[ 2125.192380] [] ? __netdev_alloc_skb+0x24/0x50
[ 2125.192524] [] ? e1000_clean_rx_irq+0x2ad/0x4b0
[e1000]
[ 2125.192529] [] ? e1000_clean+0x51/0xc0 [e1000]
[ 2125.192532] [] ? net_rx_action+0x128/0x270
[ 2125.192540] [] ? __rcu_process_callbacks+0x99/0x180
[ 2125.192543] [] ? __do_softirq+0xa8/0x1c0
[ 2125.192551] [] ? tick_program_event+0x1f/0x30
[ 2125.192555] [] ? call_softirq+0x1c/0x30
[ 2125.192557] [] ? do_softirq+0x65/0xa0
[ 2125.192559] [] ? irq_exit+0x85/0x90
[ 2125.192567] [] ? smp_apic_timer_interrupt+0x70/0x9b
[ 2125.192570] [] ? apic_timer_interrupt+0x13/0x20
[ 2125.192572] [] ? scsi_request_fn+0x3b3/0x4d0
[ 2125.192579] [] ? blk_unplug_work+0x0/0x70
[ 2125.192582] [] ? __generic_unplug_device+0x37/0x40
[ 2125.192585] [] ? generic_unplug_device+0x30/0x50
[ 2125.192587] [] ? blk_unplug_work+0x32/0x70
[ 2125.192591] [] ? process_one_work+0x11d/0x420
[ 2125.192594] [] ? worker_thread+0x169/0x360
[ 2125.192597] [] ? worker_thread+0x0/0x360
[ 2125.192599] [] ? kthread+0x96/0xa0
[ 2125.192602] [] ? kernel_thread_helper+0x4/0x10
[ 2125.192604] [] ? kthread+0x0/0xa0
[ 2125.192607] [] ? kernel_thread_helper+0x0/0x10
[ 2125.192608] Mem-Info:
[ 2125.192610] Node 0 DMA per-cpu:
[ 2125.192612] CPU 0: hi: 0, btch: 1 usd: 0
[ 2125.192613] Node 0 DMA32 per-cpu:
[ 2125.192615] CPU 0: hi: 186, btch: 31 usd: 140
[ 2125.192616] Node 0 Normal per-cpu:
[ 2125.192618] CPU 0: hi: 186, btch: 31 usd: 48
[ 2125.192622] active_anon:39107 inactive_anon:120 isolated_anon:0
[ 2125.192622] active_file:3002 inactive_file:2430 isolated_file:0
[ 2125.192623] unevictable:0 dirty:0 writeback:0 unstable:0
[ 2125.192624] free:6351 slab_reclaimable:9460 slab_unreclaimable:47421
[ 2125.192625] mapped:2900 shmem:146 pagetables:1273 bounce:0
[ 2125.192626] Node 0 DMA free:15872kB min:24kB low:28kB high:36kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB
mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable?
no
[ 2125.192634] lowmem_reserve[]: 0 3000 5020 5020
[ 2125.192637] Node 0 DMA32 free:9532kB min:5408kB low:6760kB high:8112kB
active_anon:7964kB inactive_anon:0kB active_file:0kB inactive_file:72kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB
mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB
slab_reclaimable:4324kB slab_unreclaimable:22084kB kernel_stack:0kB
pagetables:44kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
[ 2125.192644] lowmem_reserve[]: 0 0 2020 2020
[ 2125.192647] Node 0 Normal free:0kB min:3640kB low:4548kB high:5460kB
active_anon:148464kB inactive_anon:480kB active_file:12008kB
inactive_file:9648kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:2068480kB mlocked:0kB dirty:0kB writeback:0kB mapped:11596kB
shmem:584kB slab_reclaimable:33516kB slab_unreclaimable:167596kB
kernel_stack:4832kB pagetables:5048kB unstable:0kB bounce:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2125.192659] lowmem_reserve[]: 0 0 0 0
[ 2125.192661] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 2*64kB 1*128kB 1*256kB
0*512kB 1*1024kB 1*2048kB 3*4096kB = 15872kB
[ 2125.192669] Node 0 DMA32: 27*4kB 84*8kB 103*16kB 24*32kB 3*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 9532kB
[ 2125.192675] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 2125.192682] 5578 total pagecache pages
[ 2125.192683] 0 pages in swap cache
[ 2125.192685] Swap cache stats: add 0, delete 0, find 0/0
[ 2125.192686] Free swap = 0kB
[ 2125.192687] Total swap = 0kB
[ 2125.212609] 1310704 pages RAM
[ 2125.212612] 37445 pages reserved
[ 2125.212613] 17402 pages shared
[ 2125.212614] 1181257 pages non-shared
[ 2125.212618] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[ 2125.212620] cache: kmalloc-256, object size: 256, buffer size: 256,
default order: 0, min order: 0
[ 2125.212623] node 0: slabs: 124, objs: 1984, free: 0
[ 2125.212684] kworker/0:1: page allocation failure. order:0, mode:0x20
[ 2125.212688] Pid: 22, comm: kworker/0:1 Tainted: P
2.6.38-11-server #50-Ubuntu
[ 2125.212690] Call Trace:
[ 2125.212692] [] ?
__alloc_pages_nodemask+0x5f4/0x830
[ 2125.212705] [] ? try_to_wake_up+0x244/0x3e0
[ 2125.212709] [] ? alloc_pages_current+0xa5/0x110
[ 2125.212712] [] ? new_slab+0x282/0x290
[ 2125.212743] [] ? lapic_next_event+0x1d/0x30
[ 2125.212748] [] ? clockevents_program_event+0x67/0xa0
[ 2125.212750] [] ? __slab_alloc+0x262/0x390
[ 2125.212754] [] ? skb_clone+0x154/0x170
[ 2125.212757] [] ? tick_program_event+0x1f/0x30
[ 2125.212759] [] ? kmem_cache_alloc+0xff/0x110
[ 2125.212762] [] ? skb_clone+0x154/0x170
[ 2125.212936] [] ? packet_rcv_spkt+0x7d/0x1a0
[ 2125.212941] [] ? __netif_receive_skb+0x2e4/0x580
[ 2125.212944] [] ? netif_receive_skb+0x58/0x80
[ 2125.212947] [] ? napi_skb_finish+0x50/0x70
[ 2125.212949] [] ? napi_gro_receive+0xb5/0xc0
[ 2125.212963] [] ? e1000_receive_skb+0x5b/0x90 [e1000]
[ 2125.212967] [] ? e1000_clean_rx_irq+0x25e/0x4b0
[e1000]
[ 2125.212971] [] ? e1000_clean+0x51/0xc0 [e1000]
[ 2125.212974] [] ? net_rx_action+0x128/0x270
[ 2125.212977] [] ? __rcu_process_callbacks+0x99/0x180
[ 2125.212980] [] ? __do_softirq+0xa8/0x1c0
[ 2125.212984] [] ? tick_program_event+0x1f/0x30
[ 2125.212987] [] ? call_softirq+0x1c/0x30
[ 2125.212989] [] ? do_softirq+0x65/0xa0
[ 2125.212991] [] ? irq_exit+0x85/0x90
[ 2125.212994] [] ? smp_apic_timer_interrupt+0x70/0x9b
[ 2125.212997] [] ? apic_timer_interrupt+0x13/0x20
[ 2125.212998] [] ? scsi_request_fn+0x3b3/0x4d0
[ 2125.213007] [] ? blk_unplug_work+0x0/0x70
[ 2125.213010] [] ? __generic_unplug_device+0x37/0x40
[ 2125.213012] [] ? generic_unplug_device+0x30/0x50
[ 2125.213015] [] ? blk_unplug_work+0x32/0x70
[ 2125.213018] [] ? process_one_work+0x11d/0x420
[ 2125.213021] [] ? worker_thread+0x169/0x360
[ 2125.213024] [] ? worker_thread+0x0/0x360
[ 2125.213026] [] ? kthread+0x96/0xa0
[ 2125.213028] [] ? kernel_thread_helper+0x4/0x10
[ 2125.213040] [] ? kthread+0x0/0xa0
[ 2125.213042] [] ? kernel_thread_helper+0x0/0x10
[ 2125.213044] Mem-Info:
[ 2125.213045] Node 0 DMA per-cpu:
[ 2125.213048] CPU 0: hi: 0, btch: 1 usd: 0
[ 2125.213049] Node 0 DMA32 per-cpu:
[ 2125.213051] CPU 0: hi: 186, btch: 31 usd: 140
[ 2125.213052] Node 0 Normal per-cpu:
[ 2125.213054] CPU 0: hi: 186, btch: 31 usd: 48
[ 2125.213057] active_anon:39107 inactive_anon:120 isolated_anon:0
[ 2125.213058] active_file:3002 inactive_file:2430 isolated_file:0
[ 2125.213059] unevictable:0 dirty:0 writeback:0 unstable:0
[ 2125.213059] free:6351 slab_reclaimable:9460 slab_unreclaimable:47421
[ 2125.213060] mapped:2900 shmem:146 pagetables:1273 bounce:0
[ 2125.213062] Node 0 DMA free:15872kB min:24kB low:28kB high:36kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB
mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable?
no
[ 2125.213070] lowmem_reserve[]: 0 3000 5020 5020
[ 2125.213073] Node 0 DMA32 free:9532kB min:5408kB low:6760kB high:8112kB
active_anon:7964kB inactive_anon:0kB active_file:0kB inactive_file:72kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB
mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB
slab_reclaimable:4324kB slab_unreclaimable:22084kB kernel_stack:0kB
pagetables:44kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
[ 2125.213080] lowmem_reserve[]: 0 0 2020 2020
[ 2125.213082] Node 0 Normal free:0kB min:3640kB low:4548kB high:5460kB
active_anon:148464kB inactive_anon:480kB active_file:12008kB
inactive_file:9648kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:2068480kB mlocked:0kB dirty:0kB writeback:0kB mapped:11596kB
shmem:584kB slab_reclaimable:33516kB slab_unreclaimable:167596kB
kernel_stack:4832kB pagetables:5048kB unstable:0kB bounce:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2125.213090] lowmem_reserve[]: 0 0 0 0
[ 2125.213093] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 2*64kB 1*128kB 1*256kB
0*512kB 1*1024kB 1*2048kB 3*4096kB = 15872kB
[ 2125.213100] Node 0 DMA32: 27*4kB 84*8kB 103*16kB 24*32kB 3*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 9532kB
[ 2125.213107] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 2125.213113] 5578 total pagecache pages
[ 2125.213116] 0 pages in swap cache
[ 2125.213117] Swap cache stats: add 0, delete 0, find 0/0
[ 2125.213118] Free swap = 0kB
[ 2125.213119] Total swap = 0kB
[ 2125.234376] 1310704 pages RAM
[ 2125.234379] 37445 pages reserved
[ 2125.234380] 17402 pages shared
[ 2125.234381] 1181257 pages non-shared
[ 2125.234385] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[ 2125.234388] cache: kmalloc-256, object size: 256, buffer size: 256,
default order: 0, min order: 0
[ 2125.234390] node 0: slabs: 124, objs: 1984, free: 0
[ 2125.234579] kworker/0:1: page allocation failure. order:0, mode:0x20
[ 2125.234582] Pid: 22, comm: kworker/0:1 Tainted: P
2.6.38-11-server #50-Ubuntu
[ 2125.234584] Call Trace:
[ 2125.234586] [] ?
__alloc_pages_nodemask+0x5f4/0x830
[ 2125.234596] [] ? alloc_pages_current+0xa5/0x110
[ 2125.234599] [] ? alloc_pages_current+0xa5/0x110
[ 2125.234602] [] ? new_slab+0x282/0x290
[ 2125.234605] [] ? __slab_alloc+0x262/0x390
[ 2125.234608] [] ? scsi_pool_alloc_command+0x33/0x80
[ 2125.234665] [] ? kmem_cache_alloc+0xff/0x110
[ 2125.234669] [] ? scsi_pool_alloc_command+0x33/0x80
[ 2125.234671] [] ?
scsi_host_alloc_command.clone.7+0x33/0xa0
[ 2125.234692] [] ? __scsi_get_command+0x2e/0xc0
[ 2125.234695] [] ? scsi_get_command+0x48/0xc0
[ 2125.234698] [] ? scsi_setup_fs_cmnd+0x8d/0xe0
[ 2125.234701] [] ? sd_prep_fn+0xaf/0xba0
[ 2125.234709] [] ? mptspi_qcmd+0x97/0x100 [mptspi]
[ 2125.234714] [] ? blk_peek_request+0xce/0x220
[ 2125.234763] [] ? scsi_request_fn+0x66/0x4d0
[ 2125.234766] [] ? __blk_run_queue+0x8e/0x170
[ 2125.234768] [] ? blk_run_queue+0x32/0x50
[ 2125.234770] [] ? scsi_run_queue+0xc8/0x2a0
[ 2125.234772] [] ? __scsi_put_command+0x67/0xa0
[ 2125.234775] [] ? scsi_next_command+0x42/0x60
[ 2125.234777] [] ? scsi_io_completion+0x252/0x590
[ 2125.234780] [] ? scsi_finish_command+0xc5/0x130
[ 2125.234782] [] ? scsi_softirq_done+0x147/0x170
[ 2125.234785] [] ? blk_done_softirq+0x85/0xa0
[ 2125.234788] [] ? __do_softirq+0xa8/0x1c0
[ 2125.234792] [] ? tick_program_event+0x1f/0x30
[ 2125.234795] [] ? call_softirq+0x1c/0x30
[ 2125.234797] [] ? do_softirq+0x65/0xa0
[ 2125.234799] [] ? irq_exit+0x85/0x90
[ 2125.234803] [] ? smp_apic_timer_interrupt+0x70/0x9b
[ 2125.234805] [] ? apic_timer_interrupt+0x13/0x20
[ 2125.234807] [] ? scsi_request_fn+0x3b3/0x4d0
[ 2125.234811] [] ? blk_unplug_work+0x0/0x70
[ 2125.234814] [] ? __generic_unplug_device+0x37/0x40
[ 2125.234817] [] ? generic_unplug_device+0x30/0x50
[ 2125.234819] [] ? blk_unplug_work+0x32/0x70
[ 2125.234822] [] ? process_one_work+0x11d/0x420
[ 2125.234825] [] ? worker_thread+0x169/0x360
[ 2125.234828] [] ? worker_thread+0x0/0x360
[ 2125.234830] [] ? kthread+0x96/0xa0
[ 2125.234833] [] ? kernel_thread_helper+0x4/0x10
[ 2125.234835] [] ? kthread+0x0/0xa0
[ 2125.234837] [] ? kernel_thread_helper+0x0/0x10
[ 2125.234838] Mem-Info:
[ 2125.234840] Node 0 DMA per-cpu:
[ 2125.234842] CPU 0: hi: 0, btch: 1 usd: 0
[ 2125.234843] Node 0 DMA32 per-cpu:
[ 2125.234845] CPU 0: hi: 186, btch: 31 usd: 140
[ 2125.234847] Node 0 Normal per-cpu:
[ 2125.234848] CPU 0: hi: 186, btch: 31 usd: 48
[ 2125.234852] active_anon:39107 inactive_anon:120 isolated_anon:0
[ 2125.234853] active_file:3002 inactive_file:2430 isolated_file:0
[ 2125.234853] unevictable:0 dirty:0 writeback:0 unstable:0
[ 2125.234854] free:6351 slab_reclaimable:9460 slab_unreclaimable:47421
[ 2125.234855] mapped:2900 shmem:146 pagetables:1273 bounce:0
[ 2125.234856] Node 0 DMA free:15872kB min:24kB low:28kB high:36kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB
mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable?
no
[ 2125.234864] lowmem_reserve[]: 0 3000 5020 5020
[ 2125.234867] Node 0 DMA32 free:9532kB min:5408kB low:6760kB high:8112kB
active_anon:7964kB inactive_anon:0kB active_file:0kB inactive_file:72kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB
mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB
slab_reclaimable:4324kB slab_unreclaimable:22084kB kernel_stack:0kB
pagetables:44kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
[ 2125.234874] lowmem_reserve[]: 0 0 2020 2020
[ 2125.234884] Node 0 Normal free:0kB min:3640kB low:4548kB high:5460kB
active_anon:148464kB inactive_anon:480kB active_file:12008kB
inactive_file:9648kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:2068480kB mlocked:0kB dirty:0kB writeback:0kB mapped:11596kB
shmem:584kB slab_reclaimable:33516kB slab_unreclaimable:167596kB
kernel_stack:4832kB pagetables:5048kB unstable:0kB bounce:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2125.234891] lowmem_reserve[]: 0 0 0 0
[ 2125.234894] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 2*64kB 1*128kB 1*256kB
0*512kB 1*1024kB 1*2048kB 3*4096kB = 15872kB
[ 2125.234901] Node 0 DMA32: 27*4kB 84*8kB 103*16kB 24*32kB 3*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 9532kB
[ 2125.234908] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 2125.234914] 5578 total pagecache pages
[ 2125.234916] 0 pages in swap cache
[ 2125.234917] Swap cache stats: add 0, delete 0, find 0/0
[ 2125.234918] Free swap = 0kB
[ 2125.234919] Total swap = 0kB
[ 2125.255600] 1310704 pages RAM
[ 2125.255603] 37445 pages reserved
[ 2125.255604] 17402 pages shared
[ 2125.255605] 1181257 pages non-shared
[ 2125.255609] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[ 2125.255612] cache: kmalloc-256, object size: 256, buffer size: 256,
default order: 0, min order: 0
[ 2125.255614] node 0: slabs: 124, objs: 1984, free: 0
[ 2132.944060] BUG: unable to handle kernel NULL pointer dereference
at (null)
[ 2132.944284] IP: [] balance_pgdat+0x516/0x6e0
[ 2132.944513] PGD 13307d067 PUD 155da4067 PMD 0
[ 2132.944584] Oops: 0002 [#1] SMP
[ 2132.944633] last sysfs file:
/sys/devices/pci0000:00/0000:00:10.0/host2/target2:0:2/2:0:2:0/block/sdc/queue/scheduler
[ 2132.944781] CPU 0
[ 2132.944809] Modules linked in: zfs(P) zcommon(P) znvpair(P) zavl(P)
zunicode(P) spl zlib_deflate ppdev nfsd exportfs nfs lockd fscache nfs_acl
auth_rpcgss sunrpc psmouse vmw_balloon serio_raw parport_pc i2c_piix4 shpchp
lp parport mptspi floppy e1000 mptscsih mptbase
[ 2132.945198]
[ 2132.945220] Pid: 23, comm: kswapd0 Tainted: P 2.6.38-11-server
#50-Ubuntu VMware, Inc. VMware Virtual Platform/440BX Desktop Reference
Platform
[ 2132.945544] RIP: 0010:[] []
balance_pgdat+0x516/0x6e0
[ 2132.945664] RSP: 0018:ffff880174707d90 EFLAGS: 00010282
[ 2132.945737] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
0000000000000bf6
[ 2132.945837] RDX: 0000000000000000 RSI: 00000000000000d0 RDI:
0000000000000ae0
[ 2132.945935] RBP: ffff880174707e80 R08: 0000000000000000 R09:
0000000000000100
[ 2132.946190] R10: 0000000000000004 R11: 0000000000000000 R12:
0000000000000000
[ 2132.946383] R13: ffff88017fffb000 R14: ffff88017fffb000 R15:
0000000000000002
[ 2132.946606] FS: 0000000000000000(0000) GS:ffff8800bfc00000(0000)
knlGS:0000000000000000
[ 2132.946894] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2132.947061] CR2: 0000000000000000 CR3: 0000000155da5000 CR4:
00000000000006f0
[ 2132.947284] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 2132.947490] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 2132.947677] Process kswapd0 (pid: 23, threadinfo ffff880174706000, task
ffff88017470adc0)
[ 2132.947964] Stack:
[ 2132.948079] ffff880174707da0 ffff88017470adc0 ffff880174707dc0
ffff880174707e34
[ 2132.948474] ffff880174707eac 0000000000000000 0000000000001454
0000000000000000
[ 2132.948759] 0000000100000000 ffff880174707e40 0000000000000000
000000007fffb000
[ 2132.949048] Call Trace:
[ 2132.949173] [] kswapd+0xea/0x1d0
[ 2132.949329] [] ? kswapd+0x0/0x1d0
[ 2132.949486] [] kthread+0x96/0xa0
[ 2132.949642] [] kernel_thread_helper+0x4/0x10
[ 2132.949812] [] ? kthread+0x0/0xa0
[ 2132.949969] [] ? kernel_thread_helper+0x0/0x10
[ 2132.950157] Code: ea ff ff e9 82 fc ff ff 0f 1f 44 00 00 8b bd 74 ff ff
ff 48 8d 55 80 4c 89 f6 e8 06 ee ff ff 48 8b 95 38 ff ff ff be d0 00 00 00
<48> c7 02 00 00 00 00 48 8b 95 40 ff ff ff 48 8b 7d 80 e8 a3 ce
[ 2132.950800] RIP [] balance_pgdat+0x516/0x6e0
[ 2132.950972] RSP
[ 2132.951107] CR2: 0000000000000000
[ 2132.956029] ---[ end trace 674593f315e64846 ]---

With swapon the system hangs. With swapoff it seems to keep working.

@mgiammarco
Author

I should add that after several hours the system hangs for good with this (partial) message:

SLUB: Unable to allocate memory on node -1 (gfp=0x20)

@behlendorf
Contributor

This is very likely caused by issue #287 which we are testing a fix for now.

@tstudios

tstudios commented Nov 3, 2011

Excellent! Looking forward to both issues falling off the bug list.

@behlendorf
Contributor

Closing issue; this should be fixed by the following two commits for #287.

6a95d0b
openzfs/spl@b8b6e4c

@ioquatix

I seem to be having this issue. I was doing a moderately large rsync and hit the timeouts described above.

Feb 14 03:00:53 hinoki kernel: INFO: task kswapd0:26 blocked for more than 120 seconds.
Feb 14 03:00:53 hinoki kernel:       Tainted: P           O 3.12.9-2-ARCH #1
Feb 14 03:00:54 hinoki kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 14 03:00:54 hinoki kernel: kswapd0         D 0000000000000246     0    26      2 0x00000000
Feb 14 03:00:54 hinoki kernel:  ffff880223e47a90 0000000000000046 00000000000144c0 ffff880223e47fd8
Feb 14 03:00:55 hinoki kernel:  ffff880223e47fd8 00000000000144c0 ffff8802241622c0 ffff880223e479f8
Feb 14 03:00:55 hinoki kernel:  ffffffffa0304896 ffff8801185c8830 ffff8802082d5a00 ffff880053f7aa80
Feb 14 03:00:55 hinoki kernel: Call Trace:
Feb 14 03:00:56 hinoki kernel:  [<ffffffffa0304896>] ? dnode_setdirty+0x126/0x170 [zfs]
Feb 14 03:00:56 hinoki kernel:  [<ffffffffa02ed7ed>] ? dbuf_dirty+0x44d/0x920 [zfs]
Feb 14 03:00:56 hinoki kernel:  [<ffffffff814f26a9>] schedule+0x29/0x70
Feb 14 03:00:56 hinoki kernel:  [<ffffffff814f2ae3>] schedule_preempt_disabled+0x23/0x30
Feb 14 03:00:57 hinoki kernel:  [<ffffffff814f1018>] __mutex_lock_slowpath+0x158/0x3d0
Feb 14 03:00:57 hinoki kernel:  [<ffffffff814f12a2>] mutex_lock+0x12/0x30
Feb 14 03:00:57 hinoki kernel:  [<ffffffffa037fded>] zfs_zinactive+0x6d/0x150 [zfs]
Feb 14 03:00:58 hinoki kernel:  [<ffffffffa0378cd4>] zfs_inactive+0x64/0x210 [zfs]
Feb 14 03:00:58 hinoki kernel:  [<ffffffff811452f9>] ? truncate_pagecache+0x59/0x60
Feb 14 03:00:58 hinoki kernel:  [<ffffffffa038e924>] zpl_evict_inode+0x24/0x30 [zfs]
Feb 14 03:00:58 hinoki kernel:  [<ffffffff811befe0>] evict+0xb0/0x1b0
Feb 14 03:00:58 hinoki kernel:  [<ffffffff811bf119>] dispose_list+0x39/0x50
Feb 14 03:00:58 hinoki kernel:  [<ffffffff811c00d7>] prune_icache_sb+0x47/0x60
Feb 14 03:00:58 hinoki kernel:  [<ffffffff811a839c>] super_cache_scan+0xfc/0x170
Feb 14 03:00:58 hinoki kernel:  [<ffffffff811478ef>] shrink_slab+0x1df/0x3e0
Feb 14 03:00:58 hinoki kernel:  [<ffffffff8114b5a6>] balance_pgdat+0x3d6/0x5f0
Feb 14 03:00:58 hinoki kernel:  [<ffffffff8114b929>] kswapd+0x169/0x480
Feb 14 03:00:58 hinoki kernel:  [<ffffffff81085ce0>] ? wake_up_atomic_t+0x30/0x30
Feb 14 03:00:58 hinoki kernel:  [<ffffffff8114b7c0>] ? balance_pgdat+0x5f0/0x5f0
Feb 14 03:00:58 hinoki kernel:  [<ffffffff81084ec0>] kthread+0xc0/0xd0
Feb 14 03:00:58 hinoki kernel:  [<ffffffff81084e00>] ? kthread_create_on_node+0x120/0x120
Feb 14 03:00:58 hinoki kernel:  [<ffffffff814fbb3c>] ret_from_fork+0x7c/0xb0
Feb 14 03:00:58 hinoki kernel:  [<ffffffff81084e00>] ? kthread_create_on_node+0x120/0x120

Is this relevant, or is there another bug I should be checking?

Thanks,
Samuel

@behlendorf
Contributor

@ioquatix This looks different. If you're seeing this in the master code go ahead and open a new issue.

fuhrmannb pushed a commit to fuhrmannb/cstor that referenced this issue Nov 3, 2020
sdimitro pushed a commit to sdimitro/zfs that referenced this issue May 4, 2021
…ternal error - ZFS_ERR_STREAM_TRUNCATED (openzfs#261)

Signed-off-by: Paul Dagnelie <[email protected]>
andrewc12 added a commit to andrewc12/openzfs that referenced this issue Aug 16, 2023
EchterAgo pushed a commit to EchterAgo/zfs that referenced this issue Sep 21, 2023
ixhamza pushed a commit to ixhamza/zfs that referenced this issue Dec 16, 2024
NAS-130427 / None / Linux 6.12 compat: META