-
-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add github_release strategy #15260
feat: add github_release strategy #15260
Conversation
00e0492
to
c1b7c52
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before I dig into reviewing this, one fundamental issue here is the use of the GitHub API. We've specifically avoided using it in livecheck strategies/blocks for years because of the rate limit involved. We had a relatively small number of checks using the GitHub API back in the Homebrew/homebrew-livecheck days but the checks would periodically fail in relation to the rate limit.
If I'm remembering correctly, the unauthenticated rate limit is 60 requests/hour and authenticated is 5,000/hour. Simply using homebrew/core for comparison, we have ~3100 formulae using the Git
strategy in relation to a GitHub repository and ~275 formulae using GithubLatest
. The high priority of the GithubRelease
strategy here will cause it to be used for those ~3100 formulae that have been using Git
up to now, so we're looking at nearly ~3,400 GitHub API requests for a homebrew/core run.
Unless you're authenticated and you have enough rate limit remaining to be able to execute all those checks, you'll seemingly end up with a ton of errors instead of meaningful data. The changes in this PR would essentially require livecheck users to set a GitHub API token to use livecheck in a meaningful way (since the unauthenticated limit is painfully low). [It probably goes without saying but we would want to keep track of the rate limit within livecheck and skip checks once the limit is reached (until the reset time), so we wouldn't burden the API with failing requests.]
Another related issue is that we run brew livecheck
on core/cask CI as part of tests and we would need to make sure that GitHub API requests from livecheck don't impact the rate limit for our other API requests. The last thing we want is for livecheck to eat the rate limit and cause CI to fail.
That said, I'm currently working on doing comparative runs using the ~3,400 affected formulae to observe the changes (I'll go through casks after). I'm seeing errors when running related checks (unrelated to rate limit), so I'll come back with some data once I've got this working enough to run the checks without erroring.
Without it it will be hard to do anything with the results and it's a debug command, I don't see this as a bad thing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Across the ~3,100 related checks in homebrew/core (excluding the 276 GithubLatest
checks), this PR causes ~541 checks (~17%) that work properly with Git
to produce an error instead (e.g., abcm2ps
, acl2
, afio
, etc.). [2,356 checks (~76%) return the same version with Git
and GithubRelease
.]
- 497 of these failures are because the repository has no releases. We would have to add/modify
livecheck
blocks to explicitly usestrategy :git
, due toGithubRelease
's high priority [taking precedence overGit
]. - 10 of the errors are because the only available releases are old versions, unstable versions, etc., so we have to check the tags to properly identify versions.
- 30 of these errors are because the
GithubRelease
strategy is using the standard regex for tags like1.2.3
/1.2.3
as the default and these repositories use an incompatible tag format (e.g.,20190702
,2023-04-12
,r56
,something-131
,2020-12-R1
, etc.). TheGit
strategy's permissive default regex (/\D*(.+)/
) works in almost all of these 30 cases. [On at least one occassion, I've explained in depth why we use a permissive default regex inGit
instead of the standard regex (though it was a homebrew/core PR I've lost track of) and I still think it's the better default with respect to tag strings.]
197 checks (~6%) return a different newest upstream version when using GithubRelease
compared to Git
. 109 are a lower version than the Git
version and 88 are higher. I haven't gone through these to determine the cause of the differences but it's likely some mixture of:
- Checking
name
beforetag_name
(instead of the opposite) - Also checking the release
name
instead of onlytag_name
. [Using onlytag_name
would replicate the behavior of theGit
strategy and allow existing regexes to work as expected.] - The more-specific strategy regex, as mentioned above
- A tagged version not having a corresponding release. In some cases, the latest release version is what we want (i.e., the different version is more correct) and in other cases we have to check tag versions to get the correct version (i.e., the different version is incorrect).
- A small number of repositories inevitably releasing an update in between runs (i.e., these changes aren't meaningful)
In the current state, the changes in this PR would:
- Break hundreds of checks while only making a noticeable difference for a smaller number of checks
- Require us to add/modify hundreds of
livecheck
blocks to prevent/fix breakages (this would need to be done before merging)
- Require us to add/modify hundreds of
- Introduce rate limiting, which stands to be a pain for someone like me, as I do full tap runs on a regular basis (often back-to-back when evaluating livecheck changes like this). It's also not clear how we could reasonably avoid broken checks when we have enough related checks to exceed 5,000 API requests in a single homebrew/core or homebrew/cask tap run. We're currently at ~3,400 requests for homebrew/core (~1,200 for homebrew/cask) and this strategy would see the most growth as new formulae/casks are added, so it's not an idle concern.
- Dramatically slow down related checks (
GithubRelease
can take several seconds for each check, whereasGit
can run several checks in the same amount of time)- I need to add some temporary timing logic to livecheck and re-run the checks to determine exactly how much slower
GithubRelease
is compared toGit
- I need to add some temporary timing logic to livecheck and re-run the checks to determine exactly how much slower
- Increase the amount of fetched data involved with related checks (outside of a handful of repositories with an obscene number of tags), since the response includes a lot of release information that we don't usually need
- Similarly, I need to re-run the checks with my usual temporary logic that tracks the size of the response content to get a hard number (though those numbers don't factor in compression)
The current approach for these checks (Git
, unless GithubLatest
is required/appropriate) is working fine in the vast majority of cases, often without a livecheck
block. Git
is faster/lighter, doesn't involve rate limiting, etc., so I'm not convinced that GithubRelease
would be a better default. Checking releases instead of tags only appears to make a meaningful difference in a small number of these checks, so I'm against using GithubRelease
as a default strategy that's prioritized over Git
.
In general, I would prefer not to use the GitHub API when we can avoid it (as we've been doing for years). If we end up moving forward with GithubRelease
, I would rather see it manually applied only where it makes a difference (like GithubLatest
), rather than automatically applying it to ~4,600 of formulae/casks.
I agree with this but we should probably actively fail without a token set if this is the case (or at least if you're querying more than one formula/cask). |
@samford I appreciate your lengthy comments in PRs and issues but: this is an example of how they are sometimes misplaced. If you're having to explain the same thing multiple times about how/why the code behaves a certain way then:
Livechecks in general have a lot of documentation (which is great!) but sometimes they explain the obvious and avoid things like this that people are actually asking you about multiple times. Hope that's helpful feedback ❤️ |
This should not be something we're optimising for as it's so far from the typical user experience. Similarly, this should be something we use automated tests for rather than relying on a single maintainer to do it manually. |
This seems like a good suggestion. @SMillerDev any objections to this? Can we move forward with this for now? |
Yeah, seems fine. I've split out the migration to the API to make this review clearer though, so I'm focusing on that one first |
a3c7a07
to
c92f7af
Compare
c92f7af
to
edf2f10
Compare
Rebased and pushed. Ready for some reviews |
I was thinking about rebasing this earlier today, so thanks for taking care of it. I've gone through the changes and they're generally what I expected. This basically inherits the logic that we already worked through in the recent I also added tests for The only other remaining change that I would like to discuss is the strategy name. This strategy checks the With the API URL in mind (i.e., If we can agree on this name change, I can go through this in the morning (EDT) to update all the instances of Past that, I tested this across existing |
I'm fine either way. If @SMillerDev agrees: go for it. |
I have no strong feelings about it so: sure, let's do it |
I was just about to push a commit with the changes (I just finished testing). A little more is required than what your commit covers, so do you mind if I overwrite it (i.e., force push) or would you prefer a follow-up commit instead? Edit: I went ahead and created a follow-up commit with the remaining changes. I also tweaked my previous commit that added tests to modify the |
`GithubLatest` was updated to use parts of `GithubRelease` and this works fine within `brew livecheck` (since we `require` all the strategies) but we need to explicitly `require` `GithubRelease` in the `GithubLatest` test file now. Without this, we encounter errors in parts of `GithubLatest` where `GithubRelease` is referenced.
The initial documentation comments contained some remaining text referring to `GithubLatest` and hadn't been updated to incorporate the recent changes to the aforementioned strategy. This also reworks some of the language to better explain the strategy's function, application, etc.
Keeping the logic for generating the API URL in a method makes it testable, aligns with other strategies, and will help to enable some future work.
The `#versions_from_content` method requires a regex and this will be enforced by the type signature, so we don't have to check for the presence of a regex when handling a `strategy` block.
This adds tests to cover all of the strategy outside of the `#find_versions` method, which we don't currently test because it involves a network request.
656547e
to
8c91280
Compare
The most recent commit updates the main Past that, one thing I forgot to mention is that we also have a Edit: As soon as I approved this, of course I noticed that I missed |
8c91280
to
9b6e8f8
Compare
brew style
with your changes locally?brew typecheck
with your changes locally?brew tests
with your changes locally?This adds a middle way between the
git
strategy and thegithub_latest
strategy allowing us to check releases even if there are multiple release streams.