There's been a weird condition happening every once in a while that's bugging me, and I'd really like to dig in and figure out what is going on. I've got some VMs with ZFS pools built on a single LVM2 device each. Every now and again one of those VMs fills its ZFS filesystem to 100%, and occasionally that causes:
```
WARNING: Pool 'vg01-data' has encountered an uncorrectable I/O failure and has been suspended
```
The pool gets suspended, all writes to the filesystem stop, and applications hang or crash.
I have been unable to reproduce this condition in a controlled test. No matter how I fill up a filesystem built in exactly the same way to 100%, ZFS does not report uncorrectable I/O failures.
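For reference, my reproduction attempts look roughly like this (the device, pool, and dataset names and sizes here are illustrative, not my exact setup):

```sh
# Build a pool on a single LV, the same shape as production (sizes/names illustrative)
lvcreate -L 20G -n data vg01
zpool create vg01-data /dev/vg01/data
zfs create vg01-data/fs

# Fill the filesystem until writes fail with ENOSPC
dd if=/dev/zero of=/vg01-data/fs/fill bs=1M
sync

# In my tests the pool stays ONLINE with no I/O errors reported
zpool status -v vg01-data
```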
This is very puzzling, and I'd really like some help from people who know the ZFS codebase better as to where to look for clues. If it happens again, what diagnostics would be most useful? And where is the problem likely to lie: in the ZFS code or in the LVM2 layer?
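To make the question concrete, this is the sort of output I'm planning to capture the next time a pool suspends; please tell me if something else would be more useful:

```sh
# Per-vdev error counters and the list of failed I/Os, if any
zpool status -v vg01-data
# Event history around the failure (write/probe errors, statechange, etc.)
zpool events -v vg01-data
# Kernel log: ZFS messages plus anything from the block/device-mapper layer
dmesg | grep -iE 'zfs|zio|dm-|i/o error'
# Internal ZFS debug log (only populated if zfs_dbgmsg_enable is set)
cat /proc/spl/kstat/zfs/dbgmsg
```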
I'm only guessing at this point, but I wonder whether there is some ZFS metadata that cannot be written out because the underlying LVM2 volume is full, metadata that ZFS expected to be able to write. I'm not sure why I can't reproduce the issue, though, unless the data has to hit an exact size to trigger it?
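To help rule the LVM2 layer in or out, I'm assuming checks like these would show whether the device underneath ZFS ever returned an error when the pool filled up (for example, if the LV turned out to be thin-provisioned and its thin pool ran short of space):

```sh
# Is the LV a plain linear volume or something thin/sparse?
lvs -o lv_name,vg_name,segtype,lv_size,data_percent vg01
# Device-mapper status for the volume; thin targets report out-of-space conditions here
dmsetup status
# Any block-layer I/O errors logged against the dm device?
journalctl -k | grep -iE 'dm-|blk_update_request|i/o error'
```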
One possible mitigation I've considered is setting up a quota or a reservation so that the filesystem I create can never use 100% of the underlying pool. I'm tempted to just try it and, if I never see the error again, call it as good as solved. But since I can't reproduce the problem, I might simply hit the same issue again in two months and be back to the drawing board.
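Concretely, the mitigation I have in mind is something like the following (the dataset names and sizes are just examples): either cap the data filesystem directly, or hold back a slice of the pool with a reservation on an otherwise empty dataset.

```sh
# Option 1: cap the data filesystem below the pool's capacity
zfs set quota=18G vg01-data/fs

# Option 2: reserve space via an empty dataset so the pool can never hit 100%
zfs create vg01-data/slack
zfs set reservation=1G vg01-data/slack

# Check what's in effect
zfs get -r quota,reservation vg01-data
```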
If anyone has any suggestions for how to tackle this one, I'd love to hear your ideas.
Thanks
Mark