Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load casks from the JSON API with HOMEBREW_INSTALL_FROM_API #14304

Merged
merged 8 commits into from
Jan 6, 2023
Merged

Load casks from the JSON API with HOMEBREW_INSTALL_FROM_API #14304

merged 8 commits into from
Jan 6, 2023

Conversation

Rylan12
Copy link
Member

@Rylan12 Rylan12 commented Dec 29, 2022

This PR updates the method for loading casks when HOMEBREW_INSTALL_FROM_API is set. Now, casks are loaded directly from the JSON API in the same way that formulae are. I've started to test this out locally and haven't had any issues so far (outside of the ones listed below).

It's worth noting that the old cask-source API is still being used to write the cask source to the .metadata folder. This is similar to the way that formula bottles also contain the formula file, except that the source for casks are retrieved using the API rather than a bottle.

Also, a nice little side effect of this PR is that brew edit now works for casks with HOMEBREW_INSTALL_FROM_API (while the taps are installed, of course)


Here are some known issues:

  • Casks with {uninstall_,}{pre,post}flight stanzas will currently ignore those when installing and uninstalling. This can cause issues in casks like vlc which rely on these blocks to create files that will be referenced by the regular artifact stanzas. This is a relatively significant issue that is probably worth brainstorming about sooner rather than later since it does technically cause a regression from the old API-install behavior for casks.

@BrewTestBot
Copy link
Member

Review period will end on 2022-12-30 at 08:06:20 UTC.

@BrewTestBot BrewTestBot added the waiting for feedback Merging is blocked until sufficient time has passed for review label Dec 29, 2022
@Rylan12
Copy link
Member Author

Rylan12 commented Dec 29, 2022

One possible way to solve the pre/postflight issue is just to use the cask-source API to get the ruby source for those methods and evaluate that.

This, of course, is not ideal since we're executing arbitrary code that was downloaded from the internet. We can probably improve this a bit by adding a hash of the source to the JSON API. This would help prevent corrupted downloads from causing issues since the hash wouldn't match, but isn't really any more secure against malicious actors.

Thoughts on this method?

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good to me so far! I'd love to see this be:

  • a bit more generic so more code is shared
  • have some basic unit testing done, stubbing out online calls if necessary

@MikeMcQuaid
Copy link
Member

One possible way to solve the pre/postflight issue is just to use the cask-source API to get the ruby source for those methods and evaluate that.

This seems reasonable, particularly if we can indicate somehow in the JSON the minority of cases where this is required.

This, of course, is not ideal since we're executing arbitrary code that was downloaded from the internet. We can probably improve this a bit by adding a hash of the source to the JSON API. This would help prevent corrupted downloads from causing issues since the hash wouldn't match, but isn't really any more secure against malicious actors.

I think if this is always over HTTPS: I'm not overly concerned. Ultimately, if you can MITM the formulae or cask JSON APIs: you can execute arbitrary code.

Any checksums here would have to be stored and generated in a different location to be useful (think like how bottles checksums are stored in Git).

The only way I can think to have a checksum involved here would be something like using the GitHub raw Git API/endpoints, specifying the URLs with the Git checksums (rather than branch checksums) embedded and using this instead of cask-source. This at least gets us some separation between formulae.brew.sh generation and the raw Ruby (stored in a Git repository) but I'm not sure if this is overkill.

@Rylan12
Copy link
Member Author

Rylan12 commented Dec 29, 2022

Yeah, testing is definitely something we'll need to look at more before making this the default since we don't really have any HOMEBREW_INSTALL_FROM_API coverage yet.

My plan was to tackle testing in a future PR, but I can look into adding some basic tests for Homebrew::API for now

@Rylan12
Copy link
Member Author

Rylan12 commented Dec 29, 2022

This seems reasonable, particularly if we can indicate somehow in the JSON the minority of cases where this is required.

Do you mean indicate the cases where a *flight block is around or the cases where the *flight block is crucial to the install process? We already list the flight blocks in the JSON. Doing the latter would probably require a maintainer to explicitly indicate that its the case

Re checksums, I was thinking about using them as a way to ensure that files aren't corrupted more than as a way to prevent MITM attacks. We already check the cask.json file for corrupted downloads by making sure it is valid JSON. To ensure that cask-source files are not corrupted before we try to execute them, I was thinking was more that we take a hash of the cask-source file and include that in the JSON API. Then, if we compare them and they don't match we know that at least one was corrupted somehow and we shouldn't try to execute it.

It might be that the file being valid ruby is good enough for us, though, and so the checksum thing is not necessary

@BrewTestBot BrewTestBot removed the waiting for feedback Merging is blocked until sufficient time has passed for review label Dec 30, 2022
@BrewTestBot
Copy link
Member

Review period ended.

@MikeMcQuaid
Copy link
Member

My plan was to tackle testing in a future PR

Fine with me 👍🏻

Do you mean indicate the cases where a *flight block is around or the cases where the *flight block is crucial to the install process? We already list the flight blocks in the JSON. Doing the latter would probably require a maintainer to explicitly indicate that its the case

Ok, cool, if we list them in the JSON then presumably we only need to fetch them when they are present there, right?

Then, if we compare them and they don't match we know that at least one was corrupted somehow and we shouldn't try to execute it.

I feel like we should just retry the download in this case and only fail if it fails more than the current number of retries.

It might be that the file being valid ruby is good enough for us, though, and so the checksum thing is not necessary

Yeh, I think that is good enough 👍🏻

@Rylan12
Copy link
Member Author

Rylan12 commented Jan 4, 2023

Okay, *flight blocks can now be loaded from the API. However, I've found that this isn't always enough to solve the problem. Take vlc as an example:

cask "vlc" do
  # ...

  # shim script (https://github.com/Homebrew/homebrew-cask/issues/18809)
  shimscript = "#{staged_path}/vlc.wrapper.sh"
  binary shimscript, target: "vlc"

  preflight do
    File.write shimscript, <<~EOS
      #!/bin/sh
      exec '#{appdir}/VLC.app/Contents/MacOS/VLC' "$@"
    EOS
  end

  # ...
end

Executing the raw preflight block when loading from the API fails because there's no definition for the shimscript variable. We don't currently encode those in the API at all. This doesn't affect the binary artifact, though, since the string value of shimscript is what's returned to the API.

Any thoughts on how best to deal with this?


Something else to keep in mind is other configuration-specific strings that are fully expanded in the JSON. For example, the JSON expands appdir to the default /Applications and interpolates this wherever it's used. So, a user specifying their own value for appdir would not see that value respected when installing from the API.

We can probably filter out /Applications and replace it with a placeholder like we do with /opt/homebrew. This does make me wonder, though, what other similar issues I'm forgetting about

@MikeMcQuaid
Copy link
Member

Any thoughts on how best to deal with this?

  1. introduce cops/audits for these cases to ensure that preflight etc. works standalone
  2. revert all this and just fall back to downloading the cask source from GitHub in these cases

I pretty strongly thing 2. is the best call here.

For example, the JSON expands appdir to the default /Applications and interpolates this wherever it's used.

Is this just in preflight or everywhere? Not a blocker for making this default, I think, but we will need to note this limitation. Feels like either we deprecate these sort of options or need to move away from interpolation in their implementation.

This does make me wonder, though, what other similar issues I'm forgetting about

We'll figure some out before we make this default and some after. It's a big change, there will be corner cases!

@Rylan12
Copy link
Member Author

Rylan12 commented Jan 4, 2023

  1. introduce cops/audits for these cases to ensure that preflight etc. works standalone
  2. revert all this and just fall back to downloading the cask source from GitHub in these cases

I think 1 would be pretty difficult to do. For 2, are you suggesting that we always use the cask source or only for casks with *flight blocks?

Part of me thinks that if we're going to use the source in some cases, why not do it in all cases? It would also remove the appdir issue.

Note: there are currently 149 casks in homebrew/cask and 40 in other cask taps that use *flight blocks.

Is this just in preflight or everywhere? Not a blocker for making this default, I think, but we will need to note this limitation. Feels like either we deprecate these sort of options or need to move away from interpolation in their implementation.

This is an everywhere issue. For example, sdm:

cask "sdm" do
  # ...

  uninstall delete: "#{appdir}/SDM.app/Contents/Resources/sdm-socket"

  # ...
end

The JSON API looks like this:

{
  "artifacts": [
    {
      "uninstall": [
        {
          "delete": "/Applications/SDM.app/Contents/Resources/sdm-socket"
        }
      ]
    }
  ]
}

@apainintheneck
Copy link
Contributor

I'm in favor of just using the cask source for everything if we're not sure we've caught all of the edge cases.

If we're sure we know about all of the edge cases, we could just tag those casks, include the tag in the JSON returned from the API and download the source for them. It seems like between the *flight blocks and the appdir variable there are around 258 casks in homebrew-cask that fit those criteria. That's around 6% of the total.

/u/l/H/L/T/h/homebrew-cask (master|✔) $ git grep -l -E '#\{appdir\}|preflight|postflight' | wc -l
     258

@MikeMcQuaid
Copy link
Member

I think 1 would be pretty difficult to do. For 2, are you suggesting that we always use the cask source or only for casks with *flight blocks?

Only for *flight blocks are other cases where we need them.

Part of me thinks that if we're going to use the source in some cases, why not do it in all cases? It would also remove the appdir issue.
Note: there are currently 149 casks in homebrew/cask and 40 in other cask taps that use *flight blocks.

There's >4000 casks in homebrew/cask alone. I think it's worth optimising for the most common case by not having to hit the network for that and fall back to the source only when we need it.

This is an everywhere issue. For example, sdm:

Ok. I think this would be fine to punt on for V1 of this being default and we just fail if people specify --appdir without HOMEBREW_NO_INSTALL_FROM_API set.

Later we'll have to change the JSON generation to output "delete": "${APPDIR}/Contents/Resources/sdm-socket" or similar and then replace it ourselves when we consume it.

@Rylan12
Copy link
Member Author

Rylan12 commented Jan 6, 2023

Okay, this now loads casks using the JSON API when there are no *flight blocks, and from the cask source if there are.

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @Rylan12! Some follow-up comments but 👍🏻 to merge this as-is.

json_cask.deep_symbolize_keys!

# Use the cask-source API if there are any `*flight` blocks
if json_cask[:artifacts].any? { |artifact| FLIGHT_STANZAS.include?(artifact.keys.first) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR: I wonder if it might be nice to push this to the API instead/as well so we have something like needs_cask_source: true in the JSON?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we want to remove the *flight block source that was added in #14324 from the API if we're not actually going to use it?


# Use the cask-source API if there are any `*flight` blocks
if json_cask[:artifacts].any? { |artifact| FLIGHT_STANZAS.include?(artifact.keys.first) }
cask_source = Homebrew::API::CaskSource.fetch(token)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cask_source = Homebrew::API::CaskSource.fetch(token)
cask_source = Homebrew::API::CaskSource.fetch(token, sha)

Not for this PR: I think it would be good for this API to require fetching the cask source at a particular Git SHA/revision that matches that from the JSON, for consistency and to be able to use the e.g. https://raw.githubusercontent.com/Homebrew/homebrew-cask/25e4143c41a98abb50cde6a5042e957fe5988236/Casks/0-ad.rb endpoints.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we originally opted to host our own source files on formulae.brew.sh because we felt that it would be faster than the files GitHub was hosting. Now that we're only using this method for a small number of casks, should we switch to using the GitHub-provided source URLs with hashes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rylan12 Yeh, I think so. Some advantages:

  • don't need to duplicate these files
  • can ensure that we store e.g. the homebrew/cask revision in the "all casks" .json file and that we are always querying the correct/consistent version of the cask based on this revision

@MikeMcQuaid MikeMcQuaid enabled auto-merge January 6, 2023 08:40
@MikeMcQuaid MikeMcQuaid merged commit 974984d into Homebrew:master Jan 6, 2023
@Rylan12 Rylan12 deleted the load-cask-from-json branch January 6, 2023 18:47
@github-actions github-actions bot added the outdated PR was locked due to age label Feb 6, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants