Skip to content

Commit

Permalink
Btrfs: fix unexpected failure of nocow buffered writes after snapshot…
Browse files Browse the repository at this point in the history
…ting when low on space

Commit e9894fd ("Btrfs: fix snapshot vs nocow writting") forced
nocow writes to fallback to COW, during writeback, when a snapshot is
created. This resulted in writes made before creating the snapshot to
unexpectedly fail with ENOSPC during writeback when success (0) was
returned to user space through the write system call.

The steps leading to this problem are:

1. When it's not possible to allocate data space for a write, the
   buffered write path checks if a NOCOW write is possible.  If it is,
   it will not reserve space and success (0) is returned to user space.

2. Then when a snapshot is created, the root's will_be_snapshotted
   atomic is incremented and writeback is triggered for all inode's that
   belong to the root being snapshotted. Incrementing that atomic forces
   all previous writes to fallback to COW during writeback (running
   delalloc).

3. This results in the writeback for the inodes to fail and therefore
   setting the ENOSPC error in their mappings, so that a subsequent
   fsync on them will report the error to user space. So it's not a
   completely silent data loss (since fsync will report ENOSPC) but it's
   a very unexpected and undesirable behaviour, because if a clean
   shutdown/unmount of the filesystem happens without previous calls to
   fsync, it is expected to have the data present in the files after
   mounting the filesystem again.

So fix this by adding a new atomic named snapshot_force_cow to the
root structure which prevents this behaviour and works the following way:

1. It is incremented when we start to create a snapshot after triggering
   writeback and before waiting for writeback to finish.

2. This new atomic is now what is used by writeback (running delalloc)
   to decide whether we need to fallback to COW or not. Because we
   incremented this new atomic after triggering writeback in the
   snapshot creation ioctl, we ensure that all buffered writes that
   happened before snapshot creation will succeed and not fallback to
   COW (which would make them fail with ENOSPC).

3. The existing atomic, will_be_snapshotted, is kept because it is used
   to force new buffered writes, that start after we started
   snapshotting, to reserve data space even when NOCOW is possible.
   This makes these writes fail early with ENOSPC when there's no
   available space to allocate, preventing the unexpected behaviour of
   writeback later failing with ENOSPC due to a fallback to COW mode.

Fixes: e9894fd ("Btrfs: fix snapshot vs nocow writting")
Signed-off-by: Robbie Ko <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
  • Loading branch information
Robbie Ko authored and kdave committed Aug 17, 2018
1 parent 39379fa commit 8ecebf4
Show file tree
Hide file tree
Showing 4 changed files with 22 additions and 21 deletions.
1 change: 1 addition & 0 deletions fs/btrfs/ctree.h
Original file line number Diff line number Diff line change
Expand Up @@ -1280,6 +1280,7 @@ struct btrfs_root {
int send_in_progress;
struct btrfs_subvolume_writers *subv_writers;
atomic_t will_be_snapshotted;
atomic_t snapshot_force_cow;

/* For qgroup metadata reserved space */
spinlock_t qgroup_meta_rsv_lock;
Expand Down
1 change: 1 addition & 0 deletions fs/btrfs/disk-io.c
Original file line number Diff line number Diff line change
Expand Up @@ -1187,6 +1187,7 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
atomic_set(&root->log_batch, 0);
refcount_set(&root->refs, 1);
atomic_set(&root->will_be_snapshotted, 0);
atomic_set(&root->snapshot_force_cow, 0);
root->log_transid = 0;
root->log_transid_committed = -1;
root->last_log_commit = 0;
Expand Down
25 changes: 4 additions & 21 deletions fs/btrfs/inode.c
Original file line number Diff line number Diff line change
Expand Up @@ -1271,7 +1271,7 @@ static noinline int run_delalloc_nocow(struct inode *inode,
u64 disk_num_bytes;
u64 ram_bytes;
int extent_type;
int ret, err;
int ret;
int type;
int nocow;
int check_prev = 1;
Expand Down Expand Up @@ -1403,11 +1403,8 @@ static noinline int run_delalloc_nocow(struct inode *inode,
* if there are pending snapshots for this root,
* we fall into common COW way.
*/
if (!nolock) {
err = btrfs_start_write_no_snapshotting(root);
if (!err)
goto out_check;
}
if (!nolock && atomic_read(&root->snapshot_force_cow))
goto out_check;
/*
* force cow if csum exists in the range.
* this ensure that csum for a given extent are
Expand All @@ -1416,9 +1413,6 @@ static noinline int run_delalloc_nocow(struct inode *inode,
ret = csum_exist_in_range(fs_info, disk_bytenr,
num_bytes);
if (ret) {
if (!nolock)
btrfs_end_write_no_snapshotting(root);

/*
* ret could be -EIO if the above fails to read
* metadata.
Expand All @@ -1431,11 +1425,8 @@ static noinline int run_delalloc_nocow(struct inode *inode,
WARN_ON_ONCE(nolock);
goto out_check;
}
if (!btrfs_inc_nocow_writers(fs_info, disk_bytenr)) {
if (!nolock)
btrfs_end_write_no_snapshotting(root);
if (!btrfs_inc_nocow_writers(fs_info, disk_bytenr))
goto out_check;
}
nocow = 1;
} else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
extent_end = found_key.offset +
Expand All @@ -1448,8 +1439,6 @@ static noinline int run_delalloc_nocow(struct inode *inode,
out_check:
if (extent_end <= start) {
path->slots[0]++;
if (!nolock && nocow)
btrfs_end_write_no_snapshotting(root);
if (nocow)
btrfs_dec_nocow_writers(fs_info, disk_bytenr);
goto next_slot;
Expand All @@ -1471,8 +1460,6 @@ static noinline int run_delalloc_nocow(struct inode *inode,
end, page_started, nr_written, 1,
NULL);
if (ret) {
if (!nolock && nocow)
btrfs_end_write_no_snapshotting(root);
if (nocow)
btrfs_dec_nocow_writers(fs_info,
disk_bytenr);
Expand All @@ -1492,8 +1479,6 @@ static noinline int run_delalloc_nocow(struct inode *inode,
ram_bytes, BTRFS_COMPRESS_NONE,
BTRFS_ORDERED_PREALLOC);
if (IS_ERR(em)) {
if (!nolock && nocow)
btrfs_end_write_no_snapshotting(root);
if (nocow)
btrfs_dec_nocow_writers(fs_info,
disk_bytenr);
Expand Down Expand Up @@ -1532,8 +1517,6 @@ static noinline int run_delalloc_nocow(struct inode *inode,
EXTENT_CLEAR_DATA_RESV,
PAGE_UNLOCK | PAGE_SET_PRIVATE2);

if (!nolock && nocow)
btrfs_end_write_no_snapshotting(root);
cur_offset = extent_end;

/*
Expand Down
16 changes: 16 additions & 0 deletions fs/btrfs/ioctl.c
Original file line number Diff line number Diff line change
Expand Up @@ -747,6 +747,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
struct btrfs_pending_snapshot *pending_snapshot;
struct btrfs_trans_handle *trans;
int ret;
bool snapshot_force_cow = false;

if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
return -EINVAL;
Expand All @@ -763,6 +764,11 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
goto free_pending;
}

/*
* Force new buffered writes to reserve space even when NOCOW is
* possible. This is to avoid later writeback (running dealloc) to
* fallback to COW mode and unexpectedly fail with ENOSPC.
*/
atomic_inc(&root->will_be_snapshotted);
smp_mb__after_atomic();
/* wait for no snapshot writes */
Expand All @@ -773,6 +779,14 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
if (ret)
goto dec_and_free;

/*
* All previous writes have started writeback in NOCOW mode, so now
* we force future writes to fallback to COW mode during snapshot
* creation.
*/
atomic_inc(&root->snapshot_force_cow);
snapshot_force_cow = true;

btrfs_wait_ordered_extents(root, U64_MAX, 0, (u64)-1);

btrfs_init_block_rsv(&pending_snapshot->block_rsv,
Expand Down Expand Up @@ -837,6 +851,8 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
fail:
btrfs_subvolume_release_metadata(fs_info, &pending_snapshot->block_rsv);
dec_and_free:
if (snapshot_force_cow)
atomic_dec(&root->snapshot_force_cow);
if (atomic_dec_and_test(&root->will_be_snapshotted))
wake_up_var(&root->will_be_snapshotted);
free_pending:
Expand Down

0 comments on commit 8ecebf4

Please sign in to comment.