[RFE] An option to recompress files #944

Open
Hi-Angel opened this issue Feb 2, 2025 · 7 comments
Labels
question Not a bug, clarifications, undocumented behaviour

Comments

@Hi-Angel

Hi-Angel commented Feb 2, 2025

Users sometimes have directories whose content rarely if ever changes, such as git repositories downloaded for examination, WINE-based installations, or even the entire ~/Downloads folder. It may be desirable to compress them at a high compression level such as zstd:10, to reduce both space usage and I/O.

At the same time, keeping such a high compression level at all times is not desirable, because it would stress the CPU for little gain.

It would be great if btrfs fi (or another utility) supported an option to recompress a directory with a different compression level.

@Forza-tng
Contributor

Forza-tng commented Feb 2, 2025

btrfs fi defrag can be used to recompress files. It does not support setting compression level directly, however it inherits the mount option used. Therefore a workaround is something like this:

# mount /mnt/btrfs -o remount,compress-force=zstd:15
# btrfs fi defrag -v -r /mnt/btrfs/backups/
# mount /mnt/btrfs -o remount,compress=zstd:2

It would be really nice if defrag supported -c<algo>:<level>, but I think it requires changes to the kernel ioctl.
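
The remount workaround above can be wrapped so that the normal mount option is restored even when the defrag step fails. This is a minimal sketch, not an existing tool: the function name `recompress_dir`, the `DRY_RUN` variable, and the default option strings (taken from the example above) are hypothetical. It does not handle the script being killed mid-run, and the temporary level still applies to every other write on the filesystem while it runs.

```shell
#!/bin/sh
# Hypothetical wrapper around the remount / defrag / remount workaround.
# Usage: recompress_dir <mountpoint> <directory> [high-option] [normal-option]

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

recompress_dir() {
    mnt=$1
    dir=$2
    high=${3:-compress-force=zstd:15}
    normal=${4:-compress=zstd:2}

    # Temporarily raise the filesystem-wide compression setting.
    run mount -o "remount,$high" "$mnt" || return 1

    # Rewrite the files; remember the result but do not bail out yet.
    run btrfs filesystem defragment -v -r "$dir"
    status=$?

    # Always put the normal setting back, even if defrag failed.
    run mount -o "remount,$normal" "$mnt" || return 1
    return "$status"
}
```

Running it with `DRY_RUN=1` first, for example `DRY_RUN=1; recompress_dir /mnt/btrfs /mnt/btrfs/backups/`, prints the three commands without touching the filesystem.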

There are some other ways to set and manage compression too.

@Hi-Angel
Author

Hi-Angel commented Feb 2, 2025

Oh, I see, thanks! I didn't think to look at defrag because it's not obvious that defrag has anything compression-related. Though I'm not sure how the docs could be improved in this case, so… just closing the bug.

@Hi-Angel Hi-Angel closed this as completed Feb 2, 2025
@Hi-Angel
Author

Hi-Angel commented Feb 2, 2025

btrfs fi defrag can be used to recompress files. It does not support setting compression level directly, however it inherits the mount option used.

Oh, but how does it work then for the -c option, which allows choosing a compression algo? I mean, if you have zstd compression in the mount options but then choose a different algo with -c, how would it inherit the level?

@Hi-Angel
Author

Hi-Angel commented Feb 2, 2025

The implicit reason for this question is that it makes me think: what if the command doesn't actually inherit the level, but expects you to provide the ZSTD_CLEVEL env variable? I mean, none of that is documented, so it's a fair question…

@Forza-tng
Contributor

btrfs fi defrag can be used to recompress files. It does not support setting compression level directly, however it inherits the mount option used.

Oh, but how does it work then for the -c option, which allows choosing a compression algo? I mean, if you have zstd compression in the mount options but then choose a different algo with -c, how would it inherit the level?

Using -c algo overrides the mount options and, AFAIK, uses the default compression level.

@Zygo

Zygo commented Feb 2, 2025

Compression always uses the level specified in the mount options (which may be the default if the mount option didn't specify a level). That means you can do something like:

mount -o remount,compress=zstd:9 /fs
btrfs fi defrag -rczstd /home/*/Downloads
mount -o remount,compress=zstd:3 /fs

which would recompress "everything" in users' Downloads directories with level 9.

This is not ideal because:

  1. The mount option is global, so it also compresses any other write on the filesystem that happened during that time with level 9
  2. btrfs fi defrag is broken for this use case because the hardcoded logic in the kernel attempts to defragment files instead of compressing them. Recent kernel versions will say "oh this data is already compressed, leave it alone".

A more reliable way to achieve this is:

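# Run from the directory to recompress. The extra "sh" fills $0 so that the first file is not skipped.
find . -type f -exec sh -c '
    for x; do
        # Create an empty file flagged for compression, copy the data into it in full, then swap it in.
        touch "$x.clone" && chattr +c "$x.clone" && cp --reflink=never "$x" "$x.clone" && mv -f "$x.clone" "$x";
    done
' sh {} +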

but that has other obvious problems (it will continually break reflinks with snapshots, recompress data unnecessarily, and it should definitely not be run on a multi-user system) and some less-obvious ones (it will limit the extent length for uncompressed files to 512K).

A proper tool for this has to do a few extra steps:

  1. Assess whether the file is compressible. This could be a heuristic, a userspace implementation of compression, or simply copying the file to a new location and then reading back the btrfs metadata to see if it was compressed. If the new file is not smaller than the old one, delete the new file and move on to the next.
  2. Recompress the file (i.e. write the data to a new location with compression). In newer kernels, userspace can compress the data and provide the compressed data directly to the kernel. The handling of the already existing per-file compression xattr could parse the attribute string to extract a per-file compression level for the kernel.
  3. Replace the original file with the compressed version. In userspace this can be done with the dedupe ioctl. In the kernel, this could be done by copying the defrag ioctl to a new ioctl, then deleting all the defragmentation code in the new ioctl so that it does a straight lock, copy, replace operation.
  4. Track which files this has been done to so that all the above work is not repeated over and over (i.e. store a database in userspace, or add an xattr to mark which files have been done).
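
Steps 1 and 4 above can be prototyped in userspace without touching btrfs at all. The following is a minimal sketch under stated assumptions, not part of any existing tool: `is_compressible` compresses the first 128 KiB of a file with gzip as a rough stand-in for the kernel compressor (its ratio is only a proxy for zstd), and `is_done` / `mark_done` keep a plain-text state file keyed by path, mtime, and size. All function names, the 90% threshold, and the `STATE_FILE` location are hypothetical, and GNU `stat`, `head`, and `wc` are assumed.

```shell
#!/bin/sh
# Hypothetical location of the userspace "database" of processed files.
STATE_FILE=${STATE_FILE:-"$HOME/.cache/recompress.state"}

# Step 1 (heuristic): succeed if the first 128 KiB of the file shrinks
# to less than 90% of its size when compressed with gzip.
is_compressible() {
    sample=$(head -c 131072 -- "$1" | wc -c)
    [ "$sample" -gt 0 ] || return 1          # empty files are skipped
    packed=$(head -c 131072 -- "$1" | gzip -c | wc -c)
    [ $((packed * 100)) -lt $((sample * 90)) ]
}

# Identify one version of a file: mtime and size (GNU stat), then the path.
# Paths containing newlines are not supported by this sketch.
file_key() {
    printf '%s\t%s\n' "$(stat -c '%Y %s' -- "$1")" "$1"
}

# Step 4: succeed if this exact version of the file was already recorded.
is_done() {
    [ -f "$STATE_FILE" ] && grep -Fxq -- "$(file_key "$1")" "$STATE_FILE"
}

# Step 4: record the file; call this after the replacement is finished,
# because replacing the file changes its mtime.
mark_done() {
    mkdir -p -- "$(dirname -- "$STATE_FILE")"
    file_key "$1" >> "$STATE_FILE"
}
```

A driver loop would skip any file for which `is_done` succeeds or `is_compressible` fails, and call `mark_done` only once the recompressed copy is in place. Because the key includes mtime and size, a file that is modified later is picked up again on the next run.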

@Hi-Angel
Author

Hi-Angel commented Feb 2, 2025

Okay, reopening per the last comment.

@Hi-Angel Hi-Angel reopened this Feb 2, 2025
@kdave kdave added the question Not a bug, clarifications, undocumented behaviour label Feb 5, 2025