-
-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use cp -c
when copying files
#17373
Use cp -c
when copying files
#17373
Conversation
This module determines the `cp` command to use based on availability of the `coreutils` formula and optimizes the command invocation to prefer a lightweight copy-on-write clone, which is significantly faster than a full file copy and helps to reduce the risk of exhausting the storage during the operation.
This replaces `FileUtils.cp` and `system_command! "cp"` with the new `Utils::Cp` utility where it is expected that the performance improvement outweighs the cost of the system command invocation.
There is a cost of shelling out that should be considered here too. FWIW what happens under the hood with reflink is it calls macOS's default |
I have tried to limit the use of the system command to places that I guessed to be likely to deal with large payloads, but yes, that's only wild guess and I surely need some benchmarks.
I have considered that too, but I didn't adopt it because it fails if the (And sorry for the test failure. I forgot to run the test after rebasing on latest master. I'll fix it.) |
Ok fair point, though it does also seem macOS 14 shipped with a change to
Under the hood, it does this by checking if |
No. I was just writing in terms of the compatibility with old systems and it's not that I observe the behavior on the latest system. Perhaps I can make the module change the strategy based on macOS version. |
Even gating the whole feature to >= macOS 14 is probably ok. It would likely significantly simplify the code and that seems acceptable given we've gone on fine without the feature this long and older macOS versions cover only ~15% of users, and that figure will only decrease further (macOS 15 is announced next month). |
Thanks for the PR @tesaguri!
Yes, this seems like the right approach.
Not just microbenchmarks either but the benchmarks of actually installing e.g. a relevant formula/cache. Do this before adding/changing any tests as if the performance gain is not significant this won't be accepted. |
cp --reflink=auto
when copying filescp -c
when copying files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tesaguri, looking great and we're almost there! ❤️
I've done a quick benchmark of the execution time of The following tables are output of the
(The uninstallation seems to be unaffected by the change, as expected.) Also, I've tested with the bottle file as the argument just to be sure.
So far, the results are easily expected given the large size of these cask and formula (and there are no additional command invocation cost for casks anyway). But the pessimistic case would be small formulae for which the cost of system command invoctions relative to the IO cost is expected to be higher. To simulate the most pessimistic case, I've built a bottle with only a single zero-sized file and measured the installation time of it (the source of the formula will be shown later). Because the bottle is so small that it is hard to calibrate the time taken by unrelated tasks for sure, I've created 100 such bottles and installed them at once in each
The improvement in the uninstallation time is surprising to me. I think I'm missing something… Perhaps, this has something to do with the direct uninstallation from bottle files, which involves extraction of the bottles, I guess? Anyway, it seems that the change improves the performance even in the pessimistic case (at least it doesn't degrade the performance). (I haven't come up with test cases of other download strategies like ReproductionSource code of the test formula: class TestFormulaTiny000 < Formula
url "file:///dev/null"
version "0.0.0"
sha256 "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
def install
mkdir_p "test-tiny"
mv "null", "test-tiny/000"
share.install "test-tiny"
end
end Duplicate it and build them with the following command: $ for i in {001..099}; do sed "s/000/$i/g" test-formula-tiny000.rb > "test-formula-tiny$i.rb"; done
$ brew install --build-bottle -- test-formula-tiny{000..099} Here is a helper script for the benchmark (requires
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more naming thoughts. This is very nearly ready to 🚢! Very pleased by performance increases!
Thanks for the detailed write-up @tesaguri! Are you using |
As per review feedback: Homebrew#17373 (review) Co-authored-by: Mike McQuaid <[email protected]>
Homebrew#17373 (review) Co-authored-by: Mike McQuaid <[email protected]>
Yes, I'm already using |
This is pretty cool and has inspired me to look into adding |
`FileUtils.cp` is implemented with the lightweight `copy_file_range(2)` syscall on Linux, so it's more performant than the plain `cp` command on that platform. cf. Homebrew#17373 (review)
https://github.com/Homebrew/brew/actions/runs/9428441435/job/25973762028?pr=17373#step:12:42 Well, I have forgotten that Reverting this code to Perhaps, we can add yet another utility method (Edit: or a new keyword argument for existing methods?) that is guaranteed to use Another option is to revert the latest commit (9156891), but that would lead to a significant performance regression on Linux. Or should we abandon the whole patch and wait for Ruby upstream to use |
This fixes the test for `UnpackStrategy::Directory`, which needs the `cp` command.
Added |
I gave it a go: Bo98/ruby@6fc57a2. $ mkfile -n 1g big
$ hyperfine --warmup 3 --prepare "rm big2 || true" 'ruby-orig -e "IO.copy_stream(\"big\", \"big2\")"'
Benchmark 1: ruby-orig -e "IO.copy_stream(\"big\", \"big2\")"
Time (mean ± σ): 251.1 ms ± 8.3 ms [User: 23.1 ms, System: 141.8 ms]
Range (min … max): 238.6 ms … 264.5 ms 11 runs
$ hyperfine --warmup 3 --prepare "rm big2 || true" 'ruby-new -e "IO.copy_stream(\"big\", \"big2\")"'
Benchmark 1: ruby-new -e "IO.copy_stream(\"big\", \"big2\")"
Time (mean ± σ): 25.9 ms ± 0.9 ms [User: 21.5 ms, System: 3.6 ms]
Range (min … max): 25.2 ms … 33.7 ms 97 runs Works on macOS 10.13+.
I am ok with this PR as long as the commits are structured in a way that it's easy to revert the Casks will still use shelling out due to sudo needs, and those are also macOS only. |
Hmm that's annoying. Really it should use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more question/discussion on some names and I think we're good to go here, thanks @tesaguri!
As per review feedback: Homebrew#17373 (comment)
I've added tests for the new module in 4edbbfd. I'm going to edit the commit history after the review completes, so that it's easier to partially revert the change if/when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @tesaguri! One more comment and then I'm ✅, thanks again for such great work here!
Thanks so much for your first contribution! Without people like you submitting PRs we couldn't run this project. You rock, @tesaguri; this is one of the highest quality first PRs I've ever seen in 15 years maintaining Homebrew! I hope to see many more from you ❤️ |
It looks like this change is causing some problems for folks so I opened a revert PR. |
Ah, sorry for the regression. Is the revert intended to be temporary? If so, it looks like the issues could be fixed by using fully-qualified path for the |
No worries. It happens. We're usually open to retrying PRs that get reverted as long as the original issues get addressed. I'm just not super familiar with this code personally so I didn't see an obvious way to fix forward last night. Feel free to open another PR for that. |
This sounds great, thanks! |
Hello, It seems adding "-c" option to cp can cause an error on MacOS Sonoma 14.9: For example: The / filesystem is APFS, the /Volumes/Home filesystem is APFS (Case Sensitive) Same error with Microsoft-Edge. HOMEBREW_VERSION: 4.3.6 Is there any config option to turn off -c option? Kind regards. |
@jmcbsimmonds This change has been reverted but has not made it into the most recent release yet. The weekly release usually comes out Monday-ish. |
This is released/fixed in Homebrew 4.3.7 now: https://github.com/Homebrew/brew/releases/tag/4.3.7 |
brew style
with your changes locally?brew typecheck
with your changes locally?brew tests
with your changes locally?This patch makes Homebrew use
cp --reflink=auto
ifcoreutils
formula is installed, which significantly speeds up build/installation operations and reduces the risk of exhausting the storage during operations especially when dealing with large formulae.I will work on new tests if the idea sounds good to you.