ZFS hang on txg_sync #542
Comments
We try to use ZFS on our build machine. The machine is a VMware virtual machine on an ESX server. It happens on Ubuntu Oneiric x64.
|
The stack trace is just advisory; are you observing any other issues? |
I'm having the same thing. Another issue that arises from this is that whatever processes are currently running (mv, cp, dd, etc.) all go into an 'uninterruptible' state (as well as txg_sync) and hang. What other info can I provide to help track this down?
(yes I name my computers on my home network after Mad Men characters) |
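For anyone hitting the same hang, here is a minimal sketch (not taken from this thread) of commands that are generally useful for collecting information about tasks stuck in the uninterruptible (D) state; <pid> is a placeholder for the stuck process, and the sysrq/stack steps need root and a kernel built with stack-trace support:

    # List processes currently in uninterruptible sleep (D state).
    ps -eo pid,stat,comm | awk '$2 ~ /D/'
    # Dump the kernel stack of one stuck process (requires CONFIG_STACKTRACE).
    cat /proc/<pid>/stack
    # Make sure magic SysRq is enabled, then log the stacks of all blocked tasks.
    echo 1 > /proc/sys/kernel/sysrq
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 100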
Maybe it could be related to this: I've just upgraded the system to a ZFS version containing that change, and since then this problem has not occurred. Nothing is certain yet. I'll keep watching it. |
@lkishalmi If I use the daily build PPA, should this fix be in there? |
I think it does. I have v0.6.0.48 on that server. It has now more than 3
|
Ok, thanks for the info. I will give it a try. |
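As a hedged sketch of switching to the daily builds on Ubuntu (the PPA and package names below reflect the zfs-native PPAs of that era and are assumptions, so adjust them if yours differ):

    sudo add-apt-repository ppa:zfs-native/daily
    sudo apt-get update
    sudo apt-get install ubuntu-zfs   # pulls in spl-dkms/zfs-dkms and rebuilds the modules
    # Reboot, or unload and reload the spl/zfs modules, so the new build is actually in use.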
Just to give background, the hardware is: http://reviews.cnet.com/soho-servers/acer-aspire-easystore-h340/4507-3125_7-33707300.html?tag=mncol;subnav The OS was given above. Here are some sample logs from
|
Is this issue still being observed with the latest PPA? There have been several fixes which may address this. |
I can't say for sure. We experienced this problem only once since upgrading to 0.6.0.48. |
Alright, then if you don't mind I'd like to close this issue. Please feel free to open a new issue if you ever observe this or any other failure. We absolutely want to get them all resolved. |
I believe I am experiencing this issue. I am transferring a large amount of data off a zfs raidz2 (~1.5 TB), and after several hours I see these errors and then the system locks up. I am running the latest stable ppa. Here is the output from my kernel.log:

Mar 26 00:24:34 oxford kernel: [24241.244476] INFO: task zfs:21947 blocked for more than 120 seconds.
Mar 26 00:24:34 oxford kernel: [24241.244500] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 26 00:24:34 oxford kernel: [24241.244528] zfs D ffffffff81805120 0 21947 21946 0x00000000
Mar 26 00:24:34 oxford kernel: [24241.244532] ffff8800a534fc38 0000000000000086 0000000000000000 0000000000000000
Mar 26 00:24:34 oxford kernel: [24241.244535] ffff8800a534ffd8 ffff8800a534ffd8 ffff8800a534ffd8 0000000000012a40
Mar 26 00:24:34 oxford kernel: [24241.244539] ffff880232d28000 ffff88022897c560 ffff8800a534fc48 ffff880229d59268
Mar 26 00:24:34 oxford kernel: [24241.244542] Call Trace:
Mar 26 00:24:34 oxford kernel: [24241.244544] [<ffffffff816058cf>] schedule+0x3f/0x60
Mar 26 00:24:34 oxford kernel: [24241.244551] [<ffffffffa01aa6b8>] cv_wait_common+0x98/0x190 [spl]
Mar 26 00:24:34 oxford kernel: [24241.244554] [<ffffffff810466f3>] ? __wake_up+0x53/0x70
Mar 26 00:24:34 oxford kernel: [24241.244557] [<ffffffff81081850>] ? add_wait_queue+0x60/0x60
Mar 26 00:24:34 oxford kernel: [24241.244563] [<ffffffffa01aa7e3>] __cv_wait+0x13/0x20 [spl]
Mar 26 00:24:34 oxford kernel: [24241.244585] [<ffffffffa029b303>] txg_wait_synced+0xb3/0x190 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244606] [<ffffffffa027fe6e>] dsl_sync_task_group_wait+0x14e/0x270 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244624] [<ffffffffa026bc60>] ? snaplist_destroy+0x100/0x100 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244642] [<ffffffffa026f4a0>] ? dsl_dataset_create_sync+0x280/0x280 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244662] [<ffffffffa0280167>] dsl_sync_task_do+0x57/0x80 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244681] [<ffffffffa0271302>] dsl_dataset_destroy+0xb2/0x490 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244698] [<ffffffffa025980b>] dmu_objset_destroy+0x3b/0x50 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244719] [<ffffffffa02bca23>] zfs_ioc_destroy+0x23/0x60 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244741] [<ffffffffa02c087c>] zfsdev_ioctl+0xdc/0x1b0 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244744] [<ffffffff8117a1a9>] do_vfs_ioctl+0x89/0x310
Mar 26 00:24:34 oxford kernel: [24241.244748] [<ffffffff81134c73>] ? do_munmap+0x1f3/0x2f0
Mar 26 00:24:34 oxford kernel: [24241.244750] [<ffffffff8117a4c1>] sys_ioctl+0x91/0xa0
Mar 26 00:24:34 oxford kernel: [24241.244754] [<ffffffff8160fb82>] system_call_fastpath+0x16/0x1b

This is followed by:

Mar 26 00:26:34 oxford kernel: [24361.244105] INFO: task txg_sync:3273 blocked for more than 120 seconds.
Mar 26 00:26:34 oxford kernel: [24361.244165] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 26 00:26:34 oxford kernel: [24361.244225] txg_sync D 0000000000000000 0 3273 0x0000000000
Mar 26 00:26:34 oxford kernel: [24361.244233] ffff88020e579b90 0000000000000046 ffff88020e579b30 ffffffff810574c2
Mar 26 00:26:34 oxford kernel: [24361.244241] ffff88020e579fd8 ffff88020e579fd8 ffff88020e579fd8 0000000000012a40
Mar 26 00:26:34 oxford kernel: [24361.244248] ffff88022bbe9720 ffff88020fdc2e40 ffff88020e579ba0 ffff8800594922e8
Mar 26 00:26:34 oxford kernel: [24361.244255] Call Trace:
Mar 26 00:26:34 oxford kernel: [24361.244266] [<ffffffff810574c2>] ? default_wake_function+0x12/0x20
Mar 26 00:26:34 oxford kernel: [24361.244274] [<ffffffff816058cf>] schedule+0x3f/0x60
Mar 26 00:26:34 oxford kernel: [24361.244302] [<ffffffffa01aa6b8>] cv_wait_common+0x98/0x190 [spl]
Mar 26 00:26:34 oxford kernel: [24361.244309] [<ffffffff81081850>] ? add_wait_queue+0x60/0x60
Mar 26 00:26:34 oxford kernel: [24361.244323] [<ffffffffa01aa7e3>] __cv_wait+0x13/0x20 [spl]
Mar 26 00:26:34 oxford kernel: [24361.244383] [<ffffffffa02dcc9b>] zio_wait+0xfb/0x170 [zfs]
Mar 26 00:26:34 oxford kernel: [24361.244427] [<ffffffffa027f5e1>] dsl_scan_sync+0x481/0x8e0 [zfs]
Mar 26 00:26:34 oxford kernel: [24361.244433] [<ffffffff81081850>] ? add_wait_queue+0x60/0x60
Mar 26 00:26:34 oxford kernel: [24361.244479] [<ffffffffa028bfa1>] spa_sync+0x3f1/0xa00 [zfs]
Mar 26 00:26:34 oxford kernel: [24361.244526] [<ffffffffa029c006>] txg_sync_thread+0x286/0x450 [zfs]
Mar 26 00:26:34 oxford kernel: [24361.244573] [<ffffffffa029bd80>] ? txg_init+0x260/0x260 [zfs]
Mar 26 00:26:34 oxford kernel: [24361.244586] [<ffffffffa01a2f88>] thread_generic_wrapper+0x78/0x90 [spl]
Mar 26 00:26:34 oxford kernel: [24361.244599] [<ffffffffa01a2f10>] ? __thread_create+0x310/0x310 [spl]
Mar 26 00:26:34 oxford kernel: [24361.244604] [<ffffffff81080dac>] kthread+0x8c/0xa0
Mar 26 00:26:34 oxford kernel: [24361.244610] [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
Mar 26 00:26:34 oxford kernel: [24361.244615] [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
Mar 26 00:26:34 oxford kernel: [24361.244620] [<ffffffff81610ca0>] ? gs_change+0x13/0x13

This was followed by the machine locking up (kernel panic?). These errors actually occurred several times in the kernel log before the lockup. Here is more information about my system and version:

ryan@oxford:$ modinfo zfs
filename:       /lib/modules/3.0.0-16-server/updates/dkms/zfs.ko
license:        CDDL
author:         Sun Microsystems/Oracle, Lawrence Livermore National Laboratory
description:    ZFS
srcversion:     4F965FF03CFCE90153F3318
depends:        spl,znvpair,zcommon,zunicode,zavl
vermagic:       3.0.0-16-server SMP mod_unload modversions

and

ryan@oxford:$ sudo dpkg -l | grep -E "zfs-|spl-"
ii  spl-dkms  0.6.0.54-0ubuntu1~oneiric1  Solaris Porting Layer kernel modules for Linux
ii  zfs-dkms  0.6.0.54-1ubuntu1~oneiric1  Native ZFS filesystem kernel modules for Linux |
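For reference, the "blocked for more than 120 seconds" lines above come from the kernel's hung-task watchdog; the sketch below only shows how that watchdog is tuned and does not address the underlying hang:

    cat /proc/sys/kernel/hung_task_timeout_secs          # current warning threshold in seconds (120 here)
    echo 0 > /proc/sys/kernel/hung_task_timeout_secs     # silence the warnings entirely
    echo 120 > /proc/sys/kernel/hung_task_timeout_secs   # restore the threshold seen in the log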
Maybe related to issue #610. Got some of those too (54 and 55).
Was fine in ppa-daily 51 |
Got it with 51 as well now. Freeing some space on the zpool seems to have unstuck the situation though. |
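As a quick sketch of how pool free space can be checked when things look wedged (plain stock ZFS commands, nothing specific to this report):

    zpool list            # overall size, allocated and free space per pool
    zfs list -o space     # per-dataset breakdown, including space held by snapshots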
Getting similar traces here as well, with spl@8920c69, behlendorf/zfs@525c13f, linux-3.3.0. The system seems to have come back to normal after these.
|
I'm no longer seeing this every night in 0.6.0.56 ppa release, but I also am no longer pushing the zfs system as hard (i.e. no longer snapshotting while copying large amounts of data). Also, I had one event where I got a bunch of these messages and the system did not lock up, so I cannot be certain these messages lead to a crash, or if something else caused my system to crash. |
I just found out that my RAM had some bad areas, which was likely leading to corruption (especially when the RAM was being used heavily... as during large zfs operations). So far so good after fixing the RAM and doing a clean install. The crashes I was experiencing were likely not the (sole) fault of zfs. |
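As a rough sketch of how RAM can be checked in a case like this (the specific tools are a suggestion, not something named in the thread):

    sudo apt-get install memtester
    sudo memtester 1024M 3    # exercise 1 GiB of free RAM for 3 passes
    # A standalone memtest86+ entry (in the GRUB menu when the memtest86+ package is
    # installed) also covers the memory occupied by the running kernel.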
Thanks for the update, I'm glad to hear you found at least one of the causes. |
I'm having this problem on a freshly installed Ubuntu 12.x (Precise) on a system with ECC RAM. The disks came out of the same box when it was previously running FreeNAS. I tried the stable branch and am now trying the dailies to see if that resolves the issue. It may be worth noting that reads were fine; it was only when I tried to write to the filesystem that I encountered the problem. |
I have a strange issue similar to the above. System: I have a mirrored zpool (2x2TB). I tried to destroy a dataset with 2 snapshots using the command: zfs destroy -r pool1/dataset
Load on the server reached values above 50 and it took a lot of time. The console was frozen most of the time.
dmesg, top, uptime, iostat:
avg-cpu: %user %nice %system %iowait %steal %idle
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
Here you can see only one disk because I removed one for safety. For two disks the behaviour is the same. |
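For a destroy that drives the load that high, here is a minimal sketch of what can be watched from a second terminal (the pool name pool1 is taken from the comment above):

    zpool iostat -v pool1 5    # per-vdev ZFS I/O statistics every 5 seconds
    iostat -x 5                # per-disk utilisation and wait times
    uptime                     # load average, as reported above
    dmesg | tail -n 50         # watch for new hung-task or I/O error messages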
@marcinkuk The sync thread being stuck is a symptom of a deadlock somewhere else, and that should merit a new issue. |
@marcinkuk If you're still able to recreate this issue please open a new issue for it. It will get lost if it's just appended to the end of this closed bug. |