-
-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Load casks from the JSON API with HOMEBREW_INSTALL_FROM_API
#14304
Conversation
Review period will end on 2022-12-30 at 08:06:20 UTC. |
One possible way to solve the pre/postflight issue is just to use the This, of course, is not ideal since we're executing arbitrary code that was downloaded from the internet. We can probably improve this a bit by adding a hash of the source to the JSON API. This would help prevent corrupted downloads from causing issues since the hash wouldn't match, but isn't really any more secure against malicious actors. Thoughts on this method? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good to me so far! I'd love to see this be:
- a bit more generic so more code is shared
- have some basic unit testing done, stubbing out online calls if necessary
This seems reasonable, particularly if we can indicate somehow in the JSON the minority of cases where this is required.
I think if this is always over HTTPS: I'm not overly concerned. Ultimately, if you can MITM the formulae or cask JSON APIs: you can execute arbitrary code. Any checksums here would have to be stored and generated in a different location to be useful (think like how bottles checksums are stored in Git). The only way I can think to have a checksum involved here would be something like using the GitHub raw Git API/endpoints, specifying the URLs with the Git checksums (rather than branch checksums) embedded and using this instead of |
Yeah, testing is definitely something we'll need to look at more before making this the default since we don't really have any My plan was to tackle testing in a future PR, but I can look into adding some basic tests for |
Do you mean indicate the cases where a Re checksums, I was thinking about using them as a way to ensure that files aren't corrupted more than as a way to prevent MITM attacks. We already check the It might be that the file being valid ruby is good enough for us, though, and so the checksum thing is not necessary |
Review period ended. |
Fine with me 👍🏻
Ok, cool, if we list them in the JSON then presumably we only need to fetch them when they are present there, right?
I feel like we should just retry the download in this case and only fail if it fails more than the current number of retries.
Yeh, I think that is good enough 👍🏻 |
Okay, cask "vlc" do
# ...
# shim script (https://github.com/Homebrew/homebrew-cask/issues/18809)
shimscript = "#{staged_path}/vlc.wrapper.sh"
binary shimscript, target: "vlc"
preflight do
File.write shimscript, <<~EOS
#!/bin/sh
exec '#{appdir}/VLC.app/Contents/MacOS/VLC' "$@"
EOS
end
# ...
end Executing the raw Any thoughts on how best to deal with this? Something else to keep in mind is other configuration-specific strings that are fully expanded in the JSON. For example, the JSON expands We can probably filter out |
I pretty strongly thing 2. is the best call here.
Is this just in
We'll figure some out before we make this default and some after. It's a big change, there will be corner cases! |
I think 1 would be pretty difficult to do. For 2, are you suggesting that we always use the cask source or only for casks with Part of me thinks that if we're going to use the source in some cases, why not do it in all cases? It would also remove the Note: there are currently 149 casks in homebrew/cask and 40 in other cask taps that use
This is an everywhere issue. For example, cask "sdm" do
# ...
uninstall delete: "#{appdir}/SDM.app/Contents/Resources/sdm-socket"
# ...
end The JSON API looks like this: {
"artifacts": [
{
"uninstall": [
{
"delete": "/Applications/SDM.app/Contents/Resources/sdm-socket"
}
]
}
]
} |
I'm in favor of just using the cask source for everything if we're not sure we've caught all of the edge cases. If we're sure we know about all of the edge cases, we could just tag those casks, include the tag in the JSON returned from the API and download the source for them. It seems like between the
|
Only for
There's >4000 casks in homebrew/cask alone. I think it's worth optimising for the most common case by not having to hit the network for that and fall back to the source only when we need it.
Ok. I think this would be fine to punt on for V1 of this being default and we just fail if people specify Later we'll have to change the JSON generation to output |
Co-authored-by: Mike McQuaid <[email protected]>
Okay, this now loads casks using the JSON API when there are no |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @Rylan12! Some follow-up comments but 👍🏻 to merge this as-is.
json_cask.deep_symbolize_keys! | ||
|
||
# Use the cask-source API if there are any `*flight` blocks | ||
if json_cask[:artifacts].any? { |artifact| FLIGHT_STANZAS.include?(artifact.keys.first) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR: I wonder if it might be nice to push this to the API instead/as well so we have something like needs_cask_source: true
in the JSON?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we want to remove the *flight
block source that was added in #14324 from the API if we're not actually going to use it?
|
||
# Use the cask-source API if there are any `*flight` blocks | ||
if json_cask[:artifacts].any? { |artifact| FLIGHT_STANZAS.include?(artifact.keys.first) } | ||
cask_source = Homebrew::API::CaskSource.fetch(token) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cask_source = Homebrew::API::CaskSource.fetch(token) | |
cask_source = Homebrew::API::CaskSource.fetch(token, sha) |
Not for this PR: I think it would be good for this API to require fetching the cask source at a particular Git SHA/revision that matches that from the JSON, for consistency and to be able to use the e.g. https://raw.githubusercontent.com/Homebrew/homebrew-cask/25e4143c41a98abb50cde6a5042e957fe5988236/Casks/0-ad.rb endpoints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we originally opted to host our own source files on formulae.brew.sh because we felt that it would be faster than the files GitHub was hosting. Now that we're only using this method for a small number of casks, should we switch to using the GitHub-provided source URLs with hashes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Rylan12 Yeh, I think so. Some advantages:
- don't need to duplicate these files
- can ensure that we store e.g. the homebrew/cask revision in the "all casks"
.json
file and that we are always querying the correct/consistent version of the cask based on this revision
This PR updates the method for loading casks when
HOMEBREW_INSTALL_FROM_API
is set. Now, casks are loaded directly from the JSON API in the same way that formulae are. I've started to test this out locally and haven't had any issues so far (outside of the ones listed below).It's worth noting that the old
cask-source
API is still being used to write the cask source to the.metadata
folder. This is similar to the way that formula bottles also contain the formula file, except that the source for casks are retrieved using the API rather than a bottle.Also, a nice little side effect of this PR is that
brew edit
now works for casks withHOMEBREW_INSTALL_FROM_API
(while the taps are installed, of course)Here are some known issues:
{uninstall_,}{pre,post}flight
stanzas will currently ignore those when installing and uninstalling. This can cause issues in casks likevlc
which rely on these blocks to create files that will be referenced by the regular artifact stanzas. This is a relatively significant issue that is probably worth brainstorming about sooner rather than later since it does technically cause a regression from the old API-install behavior for casks.