User-Agent sniffing broken by Chrome 100 and Firefox 100 #707

ato · 2022-04-11T07:24:25Z

Describe the bug

Pywb sniffs the user-agent in RewriterWithJSProxy.ua_allows_obj_proxy to decide whether it can use object proxies for JS rewriting. This appears to have broken with Chrome 100 and Firefox 100:

$ curl -s -H'User-Agent: Chrome/99' http://localhost:8080/test/20220411050413mp_/http://example.org/test.html | grep 123
foo.location = 123;
$ curl -s -H'User-Agent: Chrome/100' http://localhost:8080/test/20220411050413mp_/http://example.org/test.html | grep 123
foo.WB_wombat_location = 123;
$ curl -s -H'User-Agent: Firefox/99' http://localhost:8080/test/20220411050413mp_/http://example.org/test.html | grep 123
foo.location = 123;
$ curl -s -H'User-Agent: Firefox/100' http://localhost:8080/test/20220411050413mp_/http://example.org/test.html | grep 123
foo.WB_wombat_location = 123;

This causes many pages in the wild to redirect to a bogus URL on load, including:

https://www.climate200.com.au/
https://www.news.com.au/
https://www.abc.net.au/news/

Steps to reproduce the bug

cat >test.html <<EOF
  <script>
    let foo = {};
    foo.location = 123;
  </script>
  <h1>OK</h1>
EOF

warcit -o http://example.org/ test.html
wb-manager init test
wb-manager add test test.html.warc.gz
pywb &
google-chrome-stable http://localhost:8080/test/http://example.org/test.html

Expected output: OK
Actual output: URL Not Found http://example.org/123

Environment

Working: Linux Chrome v99.0.4844.84
Not working: Linux Chrome v100.0.4896.60
Not working: Linux Chrome v100.0.4896.75

Working: Linux Firefox 99.0
Not working: Linux Firefox Developer Edition 100.0b2

Additional Context

The user agent parsing is done by werkzeug.useragents.UserAgent, which seems to be deprecated and removed in newer versions of werkzeug.
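
The thread does not pin down exactly why the check fails at version 100. One common failure mode, offered here only as an assumption and not confirmed for werkzeug or pywb, is comparing version numbers as strings, which sorts "100" before "49". A minimal sketch of that pitfall, with hypothetical helper names:

```python
def version_ok_lexicographic(version: str, minimum: str) -> bool:
    # Hypothetical buggy check: strings compare character by character,
    # so "100" sorts before "49" because "1" < "4".
    return version >= minimum


def version_ok_numeric(version: str, minimum: str) -> bool:
    # Compare the major version numbers as integers instead.
    return int(version.split(".")[0]) >= int(minimum.split(".")[0])


for v in ("99", "100"):
    print(v, version_ok_lexicographic(v, "49"), version_ok_numeric(v, "49"))
```

With the string comparison, version 99 passes a minimum of 49 but version 100 does not, which would match the Chrome/99 vs Chrome/100 behaviour reported here if the check worked this way.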


ikreymer · 2022-04-11T07:43:07Z

Thanks for reporting! Was considering keeping the old version of werkzeug, but maybe that's not the best solution given this issue.

ato · 2022-04-11T07:58:42Z

werkzeug developers suggest using https://github.com/ua-parser/uap-python instead in pallets/werkzeug#2078

Was also thinking about just inverting the browser sniffing check so that object proxies are used by default and the non-proxy rewrites are only used on browsers known to not support them (MSIE, Chrome < 49, Firefox < 44 etc).
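
A rough sketch of what the inverted check could look like, assuming a plain regex on the User-Agent string rather than any particular parsing library. The function name and table are hypothetical and this is not pywb's actual code; the version thresholds are the ones mentioned in this comment.

```python
import re

# Minimum major versions with object proxy support, as listed in the comment.
# (Hypothetical table; other browsers could be added the same way.)
MIN_PROXY_VERSION = {"Chrome": 49, "Firefox": 44}


def ua_allows_obj_proxy_sketch(ua_string):
    """Return False only for browsers known to lack object proxy support."""
    if not ua_string:
        # No User-Agent header: default to proxy-based rewriting.
        return True
    # MSIE identifies itself as "MSIE x" (up to IE 10) or "Trident/" (IE 11).
    if "MSIE " in ua_string or "Trident/" in ua_string:
        return False
    for family, minimum in MIN_PROXY_VERSION.items():
        match = re.search(family + r"/(\d+)", ua_string)
        # Compare major versions as integers, so 100 is correctly newer than 49.
        if match and int(match.group(1)) < minimum:
            return False
    # Unknown or new browsers fall through to the modern default.
    return True
```

The point of the design is that an unrecognised or future User-Agent gets proxy-based rewriting, so a parsing gap degrades only old browsers, not new ones.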


ikreymer · 2022-04-11T17:00:42Z

> werkzeug developers suggest using https://github.com/ua-parser/uap-python instead in pallets/werkzeug#2078
>
> Was also thinking about just inverting the browser sniffing check so that object proxies are used by default and the non-proxy rewrites are only used on browsers known to not support them (MSIE, Chrome < 49, Firefox < 44 etc).

Yep, was thinking the same thing, working on a fix.

…ency Update (2.6.6) (#708)
* js rewriting: default to modern js-proxy based rewriting, use legacy rewriting only if browsers are older than minimum, as suggested in #707
* user-agent detection: use ua_parser for user-agent detection instead of obsolete werkzeug.useragent, which also did not support browsers >=100
* tests: additional tests for rewriting with various user-agents, defaulting to new-style rewriting for unknown browsers
* dockerfile: update Dockerfile to use py3.8
* tests: skip s3 tests dependent on commoncrawl data (for now, need better s3 tests)
* bump to 2.6.6, update CHANGES

ikreymer · 2022-04-11T22:56:12Z

Fixed in the 2.6.6 release!

ato · 2022-04-12T03:50:21Z

Retested our problem sites with pywb 2.6.6 and indeed fixed. Thanks very much. :-)

ikreymer added a commit that referenced this issue Apr 11, 2022

user-agent parsing fix (fixes #707):

abbfa48

- don't use werkzeug, use ua_parser
- default to js proxy, unless determined to be an old browser, as per #707

ikreymer mentioned this issue Apr 11, 2022

Page reload loop with Chrome v100+ hypothesis/viahtml#332

Closed

ikreymer mentioned this issue Apr 11, 2022

User-Agent Detection Fix + New-Style rewriting on by default + Dependency Update (2.6.6) #708

Merged


ato closed this as completed Apr 12, 2022

crarugal mentioned this issue Apr 14, 2022

Deploy 2.6.7, fixing handling of "&" characters in queries ukwa/ukwa-pywb#84

Closed
