Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

livecheck: add user_agent #15350

Open
1 task done
razvanazamfirei opened this issue May 2, 2023 · 9 comments
Open
1 task done

livecheck: add user_agent #15350

razvanazamfirei opened this issue May 2, 2023 · 9 comments
Assignees
Labels
features New features in progress Maintainers are working on this

Comments

@razvanazamfirei
Copy link
Contributor

Verification

Provide a detailed description of the proposed feature

Allow for user_agent to be specified for livecheck curl requests. In essence, it would reuse the existing implementation from the url stanza and apply it to the livecheck url if needed.

What is the motivation for the feature?

Increasing number of casks are failing livecheck audits due to CDNs blocking the requests. This would add the flexibility to use user_agent if needed. See Homebrew/homebrew-cask#146350 Homebrew/homebrew-cask#146223 Homebrew/homebrew-cask#146330 for recent failures.

How will the feature be relevant to at least 90% of Homebrew users?

Maintainers will spend less time troubleshooting livechecks which will increase the time they spend on other user-facing projects.

What alternatives to the feature have been considered?

Not implementing it and just skipping CI when the issue comes up.

@razvanazamfirei razvanazamfirei added the features New features label May 2, 2023
@reitermarkus
Copy link
Member

I think this is likely due to #15099 still being open.

@reitermarkus
Copy link
Member

Nevermind, these are using :page_match, so doesn't seem related to the HEAD request issue.

Increasing number of casks are failing livecheck audits due to CDNs blocking the requests.

Does using a different user agent actually fix that though? Livecheck already uses two different user agents (Homebrew and fake Safari).

@samford
Copy link
Member

samford commented May 2, 2023

This is a planned feature that I already have implemented locally but it depends on the "options" feature and I need to create a PR for that first (that work is also done but there may be some discussion about the shape of it). I'm working my way towards those PRs but I'm currently working on the GitHub strategy PRs and I also need to wrap up that HeaderMatch PR that Markus mentioned.

@samford samford self-assigned this May 2, 2023
@MikeMcQuaid MikeMcQuaid added the in progress Maintainers are working on this label May 3, 2023
@daeho-ro
Copy link
Member

daeho-ro commented Apr 6, 2024

When I try to add the livecheck url for the cask hookmark, it seems that the livecheck command is working good. But the audit is failure, since the livecheck URL is not reachable by error code 403.

After digging an hour, finally noticed that the reason why those two commands are working in differ is the existence of the user-agent header.
https://github.com/Homebrew/brew/blob/master/Library/Homebrew/cask/audit.rb#L810

Is this feature development ongoing? or should I drop my changes?

@daeho-ro
Copy link
Member

daeho-ro commented Apr 6, 2024

Oh... two of the examples are hookmark and this is the problematic cask! I visited again this package haha.

@samford
Copy link
Member

samford commented Apr 6, 2024

Is this feature development ongoing? or should I drop my changes?

@daeho-ro Oddly enough, I was working on this again yesterday. I had a working implementation when I last left it but I'm reworking the changes to the livecheck block DSL before creating a PR. Depending on how I approach it, I may have to make some changes to Utils::Curl beforehand but I'm working on getting user agent support done soon (while also laying the groundwork for other needed curl-related features).

@MikeMcQuaid
Copy link
Member

Allow for user_agent to be specified for livecheck curl requests.
Livecheck already uses two different user agents (Homebrew and fake Safari).

The more I think about this: the more I think we shouldn't be changing/faking our user-agents at all, let alone let them be customised per-formula.

Ultimately our user agents being blocked is a response by the upstream project to our requests. We should respect that and do something different rather than just workaround it.

@samford
Copy link
Member

samford commented Jun 14, 2024

The more I think about this: the more I think we shouldn't be changing/faking our user-agents at all, let alone let them be customised per-formula.

Ultimately our user agents being blocked is a response by the upstream project to our requests. We should respect that and do something different rather than just workaround it.

I agree that we shouldn't use a different user agent if a server specifically blocks requests because HOMEBREW_USER_AGENT_CURL includes "Homebrew", as that's a clear signal about upstream's wishes. However, if we're simply blocked because our user agent isn't something common/typical (e.g., a browser) or shares some characteristics with bots (e.g., including "curl" text), that may not be as strong of a signal in terms of upstream's intentions.

This only affects casks at the moment, likely because commercial/closed-source software websites can be more aggressive about blocking atypical requests. We've had some livecheck blocks that work locally but fail on CI because the combination of the server's IP address and a non-browser user agent is enough to trigger server-side anti-bot measures (this can also happen locally if you use a VPN). In some of those cases, using a :browser user agent allows the check to work fine on CI.

livecheck has had logic to fall back to the :browser user agent if :default fails since mid-2021, so specifying the user agent in a livecheck block is simply about efficiency (avoiding unnecessary requests that we know will fail) and clarity. [My plan has always been to remove the automatic user agent retry logic in the Livecheck::Strategy methods once we can specify a user agent in livecheck blocks, as the former was just a stop-gap measure until the latter is available.]

In my current implementation, user agent support is simply part of being able to specify #curl_args arguments in a livecheck block, as that would allow us to handle other weird situations (though if that's too much, I could take a more limited approach). If we're against swapping user agents in general, we would have to remove existing user agent-switching logic throughout brew, remove packages as soon as they're unreachable with our user agent (there are currently 40 casks like this), and disallow new packages based on this criterion. As mentioned, some requests work locally but fail on CI, so it seems like a shame to drop working packages for that reason.

365d install counts for existing casks specifying a user_agent for the cask url are as follows:

adobe-acrobat-pro: 2479
adrive: 2943
attachecase: 89
chalk: 109
clickcharts: 37
coinomi-wallet: 53
dehelper: 57
disk-inventory-x: 3230
elgato-wave-link: 383
eudic: 1143
font-andagii: 18
font-constructium: 16
font-fairfax: 14
frhelper: 33
hookmark: 280
hopper-disassembler: 36
ithoughtsx: 85
jt-bridge: 8
jump: 228
keycue: 214
latexit: 1837
ludwig: 28
neteasemusic: 2649
outguess: 395
pages-data-merge: 55
paraview: 1854
photozoom-pro: 10
pikopixel: 52
popchar: 54
subsurface: 90
tableau: 1169
tableau-prep: 142
tableau-public: 265
tableau-reader: 52
todesk: 2029
typinator: 172
unite-phone: 5
yandex: 1483
youdaonote: 281

@MikeMcQuaid
Copy link
Member

However, if we're simply blocked because our user agent isn't something common/typical (e.g., a browser) or shares some characteristics with bots (e.g., including "curl" text), that may not be as strong of a signal in terms of upstream's intentions.

Agreed 👍🏻

remove packages as soon as they're unreachable with our user agent (there are currently 40 casks like this), and disallow new packages based on this criterion

I'd agree with this 👍🏻

As mentioned, some requests work locally but fail on CI, so it seems like a shame to drop working packages for that reason.

I could see us allowing these to remain if so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
features New features in progress Maintainers are working on this
Projects
None yet
Development

No branches or pull requests

8 participants
@MikeMcQuaid @reitermarkus @samford @razvanazamfirei @daeho-ro and others