Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add github_release strategy #15260

Merged

Conversation

SMillerDev
Copy link
Member

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

This adds a middle way between the git strategy and the github_latest strategy allowing us to check releases even if there are multiple release streams.

@SMillerDev SMillerDev force-pushed the feature/livecheck/github_release_strategy branch 2 times, most recently from 00e0492 to c1b7c52 Compare April 18, 2023 14:13
Copy link
Member

@samford samford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I dig into reviewing this, one fundamental issue here is the use of the GitHub API. We've specifically avoided using it in livecheck strategies/blocks for years because of the rate limit involved. We had a relatively small number of checks using the GitHub API back in the Homebrew/homebrew-livecheck days but the checks would periodically fail in relation to the rate limit.

If I'm remembering correctly, the unauthenticated rate limit is 60 requests/hour and authenticated is 5,000/hour. Simply using homebrew/core for comparison, we have ~3100 formulae using the Git strategy in relation to a GitHub repository and ~275 formulae using GithubLatest. The high priority of the GithubRelease strategy here will cause it to be used for those ~3100 formulae that have been using Git up to now, so we're looking at nearly ~3,400 GitHub API requests for a homebrew/core run.

Unless you're authenticated and you have enough rate limit remaining to be able to execute all those checks, you'll seemingly end up with a ton of errors instead of meaningful data. The changes in this PR would essentially require livecheck users to set a GitHub API token to use livecheck in a meaningful way (since the unauthenticated limit is painfully low). [It probably goes without saying but we would want to keep track of the rate limit within livecheck and skip checks once the limit is reached (until the reset time), so we wouldn't burden the API with failing requests.]

Another related issue is that we run brew livecheck on core/cask CI as part of tests and we would need to make sure that GitHub API requests from livecheck don't impact the rate limit for our other API requests. The last thing we want is for livecheck to eat the rate limit and cause CI to fail.


That said, I'm currently working on doing comparative runs using the ~3,400 affected formulae to observe the changes (I'll go through casks after). I'm seeing errors when running related checks (unrelated to rate limit), so I'll come back with some data once I've got this working enough to run the checks without erroring.

@SMillerDev
Copy link
Member Author

Unless you're authenticated and you have enough rate limit remaining to be able to execute all those checks, you'll seemingly end up with a ton of errors instead of meaningful data. The changes in this PR would essentially require livecheck users to set a GitHub API token to use livecheck in a meaningful way (since the unauthenticated limit is painfully low).

Without it it will be hard to do anything with the results and it's a debug command, I don't see this as a bad thing.

Copy link
Member

@samford samford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Across the ~3,100 related checks in homebrew/core (excluding the 276 GithubLatest checks), this PR causes ~541 checks (~17%) that work properly with Git to produce an error instead (e.g., abcm2ps, acl2, afio, etc.). [2,356 checks (~76%) return the same version with Git and GithubRelease.]

  • 497 of these failures are because the repository has no releases. We would have to add/modify livecheck blocks to explicitly use strategy :git, due to GithubRelease's high priority [taking precedence over Git].
  • 10 of the errors are because the only available releases are old versions, unstable versions, etc., so we have to check the tags to properly identify versions.
  • 30 of these errors are because the GithubRelease strategy is using the standard regex for tags like 1.2.3/1.2.3 as the default and these repositories use an incompatible tag format (e.g., 20190702, 2023-04-12, r56, something-131, 2020-12-R1, etc.). The Git strategy's permissive default regex (/\D*(.+)/) works in almost all of these 30 cases. [On at least one occassion, I've explained in depth why we use a permissive default regex in Git instead of the standard regex (though it was a homebrew/core PR I've lost track of) and I still think it's the better default with respect to tag strings.]

197 checks (~6%) return a different newest upstream version when using GithubRelease compared to Git. 109 are a lower version than the Git version and 88 are higher. I haven't gone through these to determine the cause of the differences but it's likely some mixture of:

  • Checking name before tag_name (instead of the opposite)
  • Also checking the release name instead of only tag_name. [Using only tag_name would replicate the behavior of the Git strategy and allow existing regexes to work as expected.]
  • The more-specific strategy regex, as mentioned above
  • A tagged version not having a corresponding release. In some cases, the latest release version is what we want (i.e., the different version is more correct) and in other cases we have to check tag versions to get the correct version (i.e., the different version is incorrect).
  • A small number of repositories inevitably releasing an update in between runs (i.e., these changes aren't meaningful)

In the current state, the changes in this PR would:

  • Break hundreds of checks while only making a noticeable difference for a smaller number of checks
    • Require us to add/modify hundreds of livecheck blocks to prevent/fix breakages (this would need to be done before merging)
  • Introduce rate limiting, which stands to be a pain for someone like me, as I do full tap runs on a regular basis (often back-to-back when evaluating livecheck changes like this). It's also not clear how we could reasonably avoid broken checks when we have enough related checks to exceed 5,000 API requests in a single homebrew/core or homebrew/cask tap run. We're currently at ~3,400 requests for homebrew/core (~1,200 for homebrew/cask) and this strategy would see the most growth as new formulae/casks are added, so it's not an idle concern.
  • Dramatically slow down related checks (GithubRelease can take several seconds for each check, whereas Git can run several checks in the same amount of time)
    • I need to add some temporary timing logic to livecheck and re-run the checks to determine exactly how much slower GithubRelease is compared to Git
  • Increase the amount of fetched data involved with related checks (outside of a handful of repositories with an obscene number of tags), since the response includes a lot of release information that we don't usually need
    • Similarly, I need to re-run the checks with my usual temporary logic that tracks the size of the response content to get a hard number (though those numbers don't factor in compression)

The current approach for these checks (Git, unless GithubLatest is required/appropriate) is working fine in the vast majority of cases, often without a livecheck block. Git is faster/lighter, doesn't involve rate limiting, etc., so I'm not convinced that GithubRelease would be a better default. Checking releases instead of tags only appears to make a meaningful difference in a small number of these checks, so I'm against using GithubRelease as a default strategy that's prioritized over Git.

In general, I would prefer not to use the GitHub API when we can avoid it (as we've been doing for years). If we end up moving forward with GithubRelease, I would rather see it manually applied only where it makes a difference (like GithubLatest), rather than automatically applying it to ~4,600 of formulae/casks.

@SMillerDev SMillerDev marked this pull request as draft April 19, 2023 09:11
@MikeMcQuaid
Copy link
Member

Unless you're authenticated and you have enough rate limit remaining to be able to execute all those checks, you'll seemingly end up with a ton of errors instead of meaningful data. The changes in this PR would essentially require livecheck users to set a GitHub API token to use livecheck in a meaningful way (since the unauthenticated limit is painfully low).
Without it it will be hard to do anything with the results and it's a debug command, I don't see this as a bad thing.

I agree with this but we should probably actively fail without a token set if this is the case (or at least if you're querying more than one formula/cask).

@MikeMcQuaid
Copy link
Member

  • On at least one occassion, I've explained in depth why we use a permissive default regex in Git instead of the standard regex

@samford I appreciate your lengthy comments in PRs and issues but: this is an example of how they are sometimes misplaced. If you're having to explain the same thing multiple times about how/why the code behaves a certain way then:

  • ideally the code will make this more obvious without comments
  • where comments are necessary, they must be short, concise and explain the reasoning here

Livechecks in general have a lot of documentation (which is great!) but sometimes they explain the obvious and avoid things like this that people are actually asking you about multiple times.

Hope that's helpful feedback ❤️

@MikeMcQuaid
Copy link
Member

as I do full tap runs on a regular basis (often back-to-back when evaluating livecheck changes like this)

This should not be something we're optimising for as it's so far from the typical user experience. Similarly, this should be something we use automated tests for rather than relying on a single maintainer to do it manually.

@MikeMcQuaid
Copy link
Member

so I'm against using GithubRelease as a default strategy that's prioritized over Git.
I would rather see it manually applied only where it makes a difference (like GithubLatest

This seems like a good suggestion. @SMillerDev any objections to this? Can we move forward with this for now?

@SMillerDev
Copy link
Member Author

Yeah, seems fine. I've split out the migration to the API to make this review clearer though, so I'm focusing on that one first

@SMillerDev SMillerDev force-pushed the feature/livecheck/github_release_strategy branch 2 times, most recently from a3c7a07 to c92f7af Compare May 2, 2023 13:03
@SMillerDev SMillerDev marked this pull request as ready for review May 2, 2023 13:16
@SMillerDev SMillerDev force-pushed the feature/livecheck/github_release_strategy branch from c92f7af to edf2f10 Compare May 15, 2023 19:47
@SMillerDev
Copy link
Member Author

Rebased and pushed. Ready for some reviews

@samford
Copy link
Member

samford commented May 16, 2023

I was thinking about rebasing this earlier today, so thanks for taking care of it. I've gone through the changes and they're generally what I expected. This basically inherits the logic that we already worked through in the recent GithubLatest PR, so I've simply pushed some commits to address areas that were overlooked or still needed to be updated (e.g., documentation comments, type signatures, etc.).

I also added tests for GithubRelease that cover all of the code outside of #find_versions. [One of my upcoming PRs will cover the parts of #find_versions that don't involve a network request, so I'll update that work to include this strategy.]


The only other remaining change that I would like to discuss is the strategy name. This strategy checks the releases API endpoint for a repository and that returns information for multiple recent releases.

With the API URL in mind (i.e., releases) and that the response JSON contains an array of release objects, I strongly think that this should be named GithubReleases instead. GithubRelease makes it sound like it would only return one release (e.g., like GithubLatest) and this may confuse users about its purpose and behavior.

If we can agree on this name change, I can go through this in the morning (EDT) to update all the instances of GithubRelease, github_release, etc.


Past that, I tested this across existing GithubLatest checks and those continue to work the same. I also did some manual testing of GithubReleases with a cask that it could be used for (replacing an existing livecheck block that manually iterates through releases using multiple requests) and it works as expected.

@MikeMcQuaid
Copy link
Member

I strongly think that this should be named GithubReleases instead

I'm fine either way. If @SMillerDev agrees: go for it.

@SMillerDev
Copy link
Member Author

I have no strong feelings about it so: sure, let's do it

@samford
Copy link
Member

samford commented May 16, 2023

I was just about to push a commit with the changes (I just finished testing). A little more is required than what your commit covers, so do you mind if I overwrite it (i.e., force push) or would you prefer a follow-up commit instead?

Edit: I went ahead and created a follow-up commit with the remaining changes. I also tweaked my previous commit that added tests to modify the require call (to address some errors), hence the force push.

SMillerDev and others added 3 commits May 16, 2023 14:18
`GithubLatest` was updated to use parts of `GithubRelease` and this
works fine within `brew livecheck` (since we `require` all the
strategies) but we need to explicitly `require` `GithubRelease` in
the `GithubLatest` test file now. Without this, we encounter errors
in parts of `GithubLatest` where `GithubRelease` is referenced.
samford and others added 7 commits May 16, 2023 14:18
The initial documentation comments contained some remaining text
referring to `GithubLatest` and hadn't been updated to incorporate
the recent changes to the aforementioned strategy. This also reworks
some of the language to better explain the strategy's function,
application, etc.
Keeping the logic for generating the API URL in a method makes it
testable, aligns with other strategies, and will help to enable some
future work.
The `#versions_from_content` method requires a regex and this will be
enforced by the type signature, so we don't have to check for the
presence of a regex when handling a `strategy` block.
This adds tests to cover all of the strategy outside of the
`#find_versions` method, which we don't currently test because it
involves a network request.
@samford samford force-pushed the feature/livecheck/github_release_strategy branch 2 times, most recently from 656547e to 8c91280 Compare May 16, 2023 19:19
@samford
Copy link
Member

samford commented May 16, 2023

The most recent commit updates the main brew livecheck documentation to account for the recent GithubLatest/GithubReleases changes. Unless I'm forgetting something, I think that should be the last of this. Feel free to take a look through the recent changes but, if nothing else requires changes, I'll approve this once CI is green.

Past that, one thing I forgot to mention is that we also have a Homebrew::GitHubReleases class (note the capital H), so we may need to be a little careful to avoid a name collision. I've standardized existing references on the GithubReleases name (no capital H) and I think we should continue using that by default (instead of the GitHubReleases alias), as it may naturally help to avoid a collision and potentially reduce confusion.

Edit: As soon as I approved this, of course I noticed that I missed .rubocop.yml (i.e., ripgrep was omitting it since it's a [hidden] dot file). Updated.

@samford samford force-pushed the feature/livecheck/github_release_strategy branch from 8c91280 to 9b6e8f8 Compare May 16, 2023 20:08
@SMillerDev SMillerDev merged commit 49b4a79 into Homebrew:master May 17, 2023
@SMillerDev SMillerDev deleted the feature/livecheck/github_release_strategy branch May 17, 2023 09:10
@github-actions github-actions bot added the outdated PR was locked due to age label Jun 17, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants