
User-Agent sniffing broken by Chrome 100 and Firefox 100 #707

Closed
ato opened this issue Apr 11, 2022 · 5 comments

Comments

@ato
Contributor

ato commented Apr 11, 2022

Describe the bug

pywb sniffs the User-Agent in RewriterWithJSProxy.ua_allows_obj_proxy to decide whether it can use object proxies for JS rewriting. This check appears to have broken with Chrome 100 and Firefox 100, which now get the legacy (non-proxy) rewrite:

$ curl -s -H'User-Agent: Chrome/99' http://localhost:8080/test/20220411050413mp_/http://example.org/test.html | grep 123
foo.location = 123;
$ curl -s -H'User-Agent: Chrome/100' http://localhost:8080/test/20220411050413mp_/http://example.org/test.html | grep 123
foo.WB_wombat_location = 123;
$ curl -s -H'User-Agent: Firefox/99' http://localhost:8080/test/20220411050413mp_/http://example.org/test.html | grep 123
foo.location = 123;
$ curl -s -H'User-Agent: Firefox/100' http://localhost:8080/test/20220411050413mp_/http://example.org/test.html | grep 123
foo.WB_wombat_location = 123;

This causes many pages in the wild to redirect to a bogus URL on load, including:

https://www.climate200.com.au/
https://www.news.com.au/
https://www.abc.net.au/news/
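The issue does not say which step inside werkzeug's parsing or pywb's comparison fails on three-digit versions. As a hedged illustration of one common way version sniffing breaks at 100: comparing version strings lexicographically puts "100" before "49", while comparing integers does not. The minimums come from the "Chrome < 49, Firefox < 44" thresholds mentioned later in this thread; the helper names are hypothetical and this is not pywb's code.

```python
import re

# Hypothetical minimum major versions for object-proxy support,
# taken from the "Chrome < 49, Firefox < 44" thresholds in this thread.
MIN_VERSIONS = {"Chrome": "49", "Firefox": "44"}


def major_version(ua: str, family: str) -> str:
    """Return the digits after 'Family/' as a string, or '' if absent."""
    match = re.search(rf"{family}/(\d+)", ua)
    return match.group(1) if match else ""


def allows_proxy_string_compare(ua: str, family: str) -> bool:
    # Buggy: string comparison is character by character, so "100" < "49".
    return major_version(ua, family) >= MIN_VERSIONS[family]


def allows_proxy_int_compare(ua: str, family: str) -> bool:
    # Correct: numeric comparison, so 100 >= 49.
    version = major_version(ua, family)
    return version != "" and int(version) >= int(MIN_VERSIONS[family])


if __name__ == "__main__":
    for ua, family in [("Chrome/99", "Chrome"), ("Chrome/100", "Chrome"),
                       ("Firefox/99", "Firefox"), ("Firefox/100", "Firefox")]:
        print(ua, allows_proxy_string_compare(ua, family),
              allows_proxy_int_compare(ua, family))
```

With string comparison, the 99 user agents pass and the 100 user agents fail, which is the same split as in the curl output; with integer comparison, all four pass.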

Steps to reproduce the bug

cat >test.html <<EOF
  <script>
    let foo = {};
    foo.location = 123;
  </script>
  <h1>OK</h1>
EOF

warcit -o http://example.org/ test.html
wb-manager init test
wb-manager add test test.html.warc.gz
pywb &
google-chrome-stable http://localhost:8080/test/http://example.org/test.html

Expected output: OK
Actual output: URL Not Found http://example.org/123
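The bogus URL follows from the legacy rewrite seen in the curl output. As a hedged sketch: the legacy rewrite is modelled here as a toy regex (not pywb's actual rewriter) that renames every .location property, even on an ordinary object like foo. Assuming wombat's WB_wombat_location setter then treats the assigned value as a navigation target, the value 123 resolves relative to the page URL, giving the http://example.org/123 in the actual output.

```python
import re
from urllib.parse import urljoin


def toy_legacy_rewrite(js: str) -> str:
    # Toy stand-in for legacy (non-proxy) rewriting: rename any `.location`
    # property access, regardless of which object it belongs to.
    return re.sub(r"\.location\b", ".WB_wombat_location", js)


page_url = "http://example.org/test.html"
script = "foo.location = 123;"

rewritten = toy_legacy_rewrite(script)
print(rewritten)  # foo.WB_wombat_location = 123;

# Assumption: the setter navigates to the assigned value, resolved
# against the current page URL like any relative link.
print(urljoin(page_url, "123"))  # http://example.org/123
```

A proxy-based rewrite leaves foo.location alone, so only the real window/document location objects are intercepted; that is why the Chrome/99 output above is unchanged.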

Environment

Working: Linux Chrome v99.0.4844.84
Not working: Linux Chrome v100.0.4896.60
Not working: Linux Chrome v100.0.4896.75

Working: Linux Firefox 99.0
Not working: Linux Firefox Developer Edition 100.0b2

Additional Context

The user-agent parsing is done by werkzeug.useragents.UserAgent, which seems to be deprecated and removed in newer versions of werkzeug.

@ikreymer
Member

Thanks for reporting! I was considering keeping the old version of werkzeug, but given this issue, that may not be the best solution.

@ato
Contributor Author

ato commented Apr 11, 2022

The werkzeug developers suggest using https://github.com/ua-parser/uap-python instead (see pallets/werkzeug#2078).

I was also thinking of simply inverting the browser-sniffing check, so that object proxies are used by default and the non-proxy rewrites are used only on browsers known not to support them (MSIE, Chrome < 49, Firefox < 44, etc.).
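A minimal sketch of the inverted check, assuming the user agent has already been parsed into a browser family and an integer major version (both hypothetical inputs here): object proxies are the default, and the legacy rewrite is chosen only for browsers known to lack support. The thresholds are the ones listed in the comment; the function name and family strings are illustrative, not pywb's.

```python
from typing import Optional

# Browsers known NOT to support object proxies below these major versions,
# per the thresholds suggested in this thread.
LEGACY_BELOW = {"Chrome": 49, "Firefox": 44}
# Browsers that never get object proxies (MSIE).
NEVER_PROXY = {"IE"}


def allows_obj_proxy(family: Optional[str], major: Optional[int]) -> bool:
    """Hypothetical allow-by-default check: only known-old browsers get legacy rewriting."""
    if family in NEVER_PROXY:
        return False
    minimum = LEGACY_BELOW.get(family or "")
    if minimum is not None and major is not None and major < minimum:
        return False
    # Unknown browsers, unknown versions, and anything new (including 100+) use proxies.
    return True


if __name__ == "__main__":
    print(allows_obj_proxy("Chrome", 100))  # True
    print(allows_obj_proxy("Chrome", 48))   # False
    print(allows_obj_proxy("IE", 11))       # False
    print(allows_obj_proxy(None, None))     # True
```

Defaulting to True means that a parser which fails to recognise a future version string degrades to modern rewriting rather than to legacy rewriting, which is the failure mode reported in this issue.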

ikreymer added a commit that referenced this issue Apr 11, 2022
- don't use werkzeug, use ua_parser
- default to js proxy, unless determined to be an old browser, as per #707
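The commit switches parsing to ua_parser. In uap-python's classic API, user_agent_parser.Parse(ua_string) returns a dict whose "user_agent" entry has "family" and "major" keys, with "major" as a string (or None) and unrecognised browsers reported as family "Other". A hedged sketch of turning that result into inputs for an allow-by-default check; to stay runnable without the library installed, it uses hand-built dicts of that shape instead of calling the parser, and the helper name is hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def family_and_major(parsed: Dict[str, Any]) -> Tuple[Optional[str], Optional[int]]:
    """Extract (family, integer major version) from a Parse()-style result.

    Missing or non-numeric values become None so the caller can fall back
    to its default (object proxies) instead of guessing.
    """
    ua = parsed.get("user_agent") or {}
    family = ua.get("family")
    major = ua.get("major")
    try:
        major_int = int(major) if major is not None else None
    except ValueError:
        major_int = None
    # ua_parser reports unrecognised browsers as family "Other".
    if family == "Other":
        family = None
    return family, major_int


# Hand-built results mimicking Parse() output (not real parser runs).
chrome_100 = {"user_agent": {"family": "Chrome", "major": "100", "minor": "0", "patch": "4896"}}
unknown = {"user_agent": {"family": "Other", "major": None, "minor": None, "patch": None}}

print(family_and_major(chrome_100))  # ('Chrome', 100)
print(family_and_major(unknown))     # (None, None)
```

The key step is converting "major" to an int before any comparison, so version 100 sorts after 49 and 44.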
@ikreymer
Member

> werkzeug developers suggest using https://github.com/ua-parser/uap-python instead in pallets/werkzeug#2078
>
> Was also thinking about just inverting the browser sniffing check so that object proxies are used by default and the non-proxy rewrites are only used on browsers known to not support them (MSIE, Chrome < 49, Firefox < 44 etc).

Yep, was thinking the same thing, working on a fix.

ikreymer added a commit that referenced this issue Apr 11, 2022
…ency Update (2.6.6) (#708)

* js rewriting: default to modern js-proxy based rewriting; use legacy rewriting only if the browser is older than the minimum version, as suggested in #707
* user-agent detection: use ua_parser for user-agent detection instead of the obsolete werkzeug.useragents, which also did not support browsers >= 100
* tests: additional tests for rewriting with various user-agents, defaulting to new-style rewriting for unknown browsers
* dockerfile: Update Dockerfile to use py3.8
* tests: skip s3 tests dependent on commoncrawl data (for now, need better s3 tests).
* bump to 2.6.6, update CHANGES
@ikreymer
Member

Fixed in the 2.6.6 release!

@ato
Contributor Author

ato commented Apr 12, 2022

Retested our problem sites with pywb 2.6.6, and they are indeed fixed. Thanks very much. :-)
