zfs destroy (a snapshot) SOMETIMES fails silently #1007
Comments
Links replaced with permalinks. (sorry for that mistake) |
Does this still happen in latest HEAD? |
Richard Yao:
UbuntuNewbie (that is to say: I am not a developer) |
Please try the daily PPA. |
Richard Yao:
But don't expect too much; with rc11, I observed one occurrence in two days. U.N. |
Richard Yao:
Before rebooting I witnessed some irregular behavior, sometimes zfs [...] But system stability has improved and zfs RAM usage is diminished. |
I wrote: That was a mistake. There are messages complaining if an attempt to destroy a non-existent snapshot is made. Still, the phenomenon reappears at an alarming rate (at least once in 2 days, at times repeatedly during one day). This happens so frequently that I rely on the following cleanup script after boot:
|
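The cleanup script itself was not captured in this thread. Purely as an illustration of the idea (not the author's actual script), a post-boot retry could look roughly like this; the pool name and the zfs-auto-snap prefix are assumptions:

```sh
#!/bin/sh
# Hypothetical sketch only -- not the script referenced above.
# Assumes a pool named "rpool" and zfs-auto-snap style snapshot names.
POOL="rpool"
PREFIX="zfs-auto-snap"

# Retry destroying leftover auto-snapshots and report any that survive.
zfs list -H -t snapshot -o name -r "$POOL" | grep "@${PREFIX}" |
while read -r snap; do
    zfs destroy "$snap" || echo "still present: $snap" >&2
done
```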
Hi, for me (Debian testing, rc11), the below sequence reproduces this 100% of the time: root@thinkpad: |
As another data point, I saw this behaviour on OpenIndiana a few days ago. I don't know if this was before or after I applied updates. A reboot fixed it and I haven't reproduced it there since. I haven't tried the steps @jvsalo just left. |
It appears lingering snapshot mounts (/proc/mounts) may be related. When a snapshot is accessed, a new mount is created:

rpool/testfs@testsnap /mnt/.zfs/snapshot/testsnap zfs ro,relatime,xattr 0 0

However, even when all file descriptors (according to lsof) have been closed, the mount might not disappear. Destroying the snapshot once causes the mount to disappear, and a second destroy actually destroys the snapshot. The snapshot is accessible and would be re-mounted if accessed again after the first destroy.

The above applies if I enter the commands by hand (so there is delay), but if I run a script like this, things get even worse:

```sh
#!/bin/sh
zfs create rpool/testfs
mount -t zfs rpool/testfs /mnt
zfs snapshot rpool/testfs@testsnap
cd /mnt/.zfs/snapshot/testsnap/
cd -
zfs destroy -v rpool/testfs@testsnap
zfs list -t all -r rpool/testfs
zfs destroy -v rpool/testfs@testsnap
zfs list -t all -r rpool/testfs
umount /mnt
zfs destroy -v -r rpool/testfs
```

I would get:

```
root@thinkpad:/tmp# sh reprod.sh /tmp
will destroy rpool/testfs@testsnap
will reclaim 0
NAME                    USED  AVAIL  REFER  MOUNTPOINT
rpool/testfs             30K  64.2G    30K  legacy
rpool/testfs@testsnap      0      -    30K  -
will destroy rpool/testfs@testsnap
will reclaim 0
NAME                    USED  AVAIL  REFER  MOUNTPOINT
rpool/testfs             30K  64.2G    30K  legacy
rpool/testfs@testsnap      0      -    30K  -
umount: /mnt: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
will destroy rpool/testfs@testsnap
cannot destroy 'rpool/testfs@testsnap': dataset is busy
```

... and rpool/testfs@testsnap is invulnerable to destroy unless I manually unmount /mnt/.zfs/snapshot/testsnap first. After that, destroy works as expected. |
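For reference, the lingering mount described above can be checked for and cleared by hand; a minimal sketch, using the mountpoint from the example:

```sh
# List lingering snapshot mounts under the .zfs control directory
grep -F '/.zfs/snapshot/' /proc/mounts

# If the testsnap mount is still listed, unmounting it manually lets
# the destroy go through, as described above
umount /mnt/.zfs/snapshot/testsnap
zfs destroy -v rpool/testfs@testsnap
```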
Interesting theory there. Only I cannot confirm my case would ever be [...] In any case: I have no reproducer; one time it appeared overnight on a [...] So in any case, if the issue mentioned would get resolved, it might [...] On 21.10.2012 18:07, Jaakko wrote:
|
However, I can only reproduce this on my 3.5.x hosts:
On all hosts ZFS is controlling disk devices directly. Laptop has SSD, others rotating storage. @UbuntuNewbie: yeah, could be a different issue, I'm opening a new ticket for my reproducer. |
Additional information, confirming the suspicion that the observed behavior is a weird side-effect of another problem: ever since this has been going on (at least since zfs-auto-snap started creating a lot of snapshots), I have been running gnome-system-monitor along with automated scripts to catch this issue as early as possible.

Today I could see something (OK, only the very outer edge of it) that I cannot explain. On a system that was almost idle, with only a handful of non-greedy tasks running, all of a sudden I saw a constant rise in memory consumption, taking up gigabytes of free memory. The moment there was none left, ALL processors became extremely busy (100%), leaving the system unresponsive for some time. After a while the system became responsive again, which I used to shut down most of the tasks cleanly. Then I checked the status of the snapshots and THERE IT WAS: some snapshots did not get destroyed and could not be destroyed by hand either, until the next reboot.

So it looks like the extreme memory pressure, and the way zfs created and handled it, left some corruption behind that, at least up till now, required a reboot to get resolved. I understand this comment lacks the precision a developer might be looking for; it is only a general outline. But since I had never witnessed this building up before, I thought it might be useful to someone... |
The theory of the snapshot folder being mounted on access is correct. I manually went in and unmounted the snapshot-specific folder in the .zfs folder, and I could destroy the snapshot. Looks like the auto-snapshot script needs to make sure that if it finds any mounts pointing into a .zfs folder in /proc/mounts, it unmounts them, and complains with lsof output if it cannot unmount them. |
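A minimal sketch of the cleanup logic suggested here, assuming POSIX sh and /proc/mounts entries like the one shown earlier (illustrative only, not the actual zfs-auto-snapshot code):

```sh
#!/bin/sh
# Illustrative sketch: unmount lingering .zfs/snapshot mounts and report
# blockers via lsof when an unmount fails.
awk '$2 ~ /\/\.zfs\/snapshot\// { print $2 }' /proc/mounts |
while read -r mnt; do
    if ! umount "$mnt"; then
        echo "could not unmount $mnt, open files:" >&2
        lsof +D "$mnt" >&2
    fi
done
```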
With 7973e46, automatic unmount seems to be working:
|
@maxximino Thanks for commenting in all these related issues and verifying that it's fixed. Closing issue. |
Original issue description:

zfs destroy fs@somesnap

returns no error, but the snapshot is still there (as zfs list can show).
This never happened right after the system was freshly booted, only after some time of operation.
When destroying recursively, only SOME child snapshots are excluded from destruction, the parent being destroyed (!), making zfs_autosnap (and other scripts) fail to accomplish their goal.
Might be related to an earlier problem, which (prior to rc10) happened to cause zfs destroy to fail with an error message along the lines of "Could not destroy fs@somesnap because snapshot is in use". That message is gone and the snapshots simply stay around, introducing a silent misbehavior.
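Since no error is reported, one way to catch the silent failure is to re-check for the snapshot right after the destroy; a minimal sketch with placeholder names:

```sh
#!/bin/sh
# Placeholder snapshot name for illustration; substitute a real one.
SNAP="fs@somesnap"

zfs destroy "$SNAP"
# The misbehavior described here: the command returns no error, yet the
# snapshot can still appear in the listing below.
if zfs list -H -t snapshot -o name | grep -qx "$SNAP"; then
    echo "destroy of $SNAP silently failed" >&2
fi
```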
Forum threads (likely) related to this, with additional information:
https://groups.google.com/a/zfsonlinux.org/d/topic/zfs-discuss/hf64pJT9psU/discussion
https://groups.google.com/a/zfsonlinux.org/d/topic/zfs-discuss/bE66hKauCP0/discussion
https://groups.google.com/a/zfsonlinux.org/d/topic/zfs-discuss/IvEGaK_uOes/discussion
https://groups.google.com/a/zfsonlinux.org/d/topic/zfs-discuss/5GPAxLXQTTA/discussion