This repository has been archived by the owner on Nov 6, 2023. It is now read-only.

Proposal: Consider removing https -> https/http rewrite rules? #1327

Closed

semenko opened this issue Apr 1, 2015 · 26 comments

semenko (Contributor) commented Apr 1, 2015

We have a small subset of rules that rewrite HTTPS sites to other HTTPS/HTTP sites. A rough grepping shows ~63 rules like this.

$ grep -F "from=\"^https" default.rulesets  | wc -l
63

I propose removing these rules, and the code needed to handle them.

Benefits:

  • HTTPSe still uses nonzero resources and can slow down browsing on underpowered systems (e.g. Chromebooks). We could stop intercepting (blocking) all HTTPS requests in Chrome, which would be a significant performance win.
  • Reduce ruleset fragility -- rewriting broken pages to fix incorrect cert names seems really fragile
  • Reduce code complexity -- We can remove some of the downgrade="1" rules and eliminate the downgrade whitelist.
  • Remove rules that probably decrease security -- I imagine some of these downgrades are ancient and no longer needed (e.g. the GoogleMaps whitelist entry that downgrades https://maps.googleapis.com/map?). Users also probably don't expect us to send them from https to http.

Downsides:

  • We may increase breakage on sites that aren't very HTTPS-friendly, but many of those are already broken for other reasons (mixed content), e.g. https://fellowsblog.kiva.org

Thoughts?

semenko (Contributor, Author) commented Apr 1, 2015

FWIW, here's a rough list of files with downgrade/https matching rules.

Aeriagames.xml
Apple.xml
Austin_Fit_Magazine.com.xml
CERN.xml
CafePress.xml
Caltech.xml
Cashback.co.uk.xml
Cheezburger.xml
Comic-Con-Intl.xml
Contextly.xml
Cru.org.xml
Dafont.com.xml
DailyDot.xml
EFF.xml
Elder_Scrolls_Online.com.xml
Epic-Systems.xml
EvoTronix.xml
Fasthosts.xml
Fourmilab.xml
GO.com.xml
Ghostery_metrics_optout.xml
GoogleMaps.xml
ImageShack.us.xml
InkFrog.com.xml
Intel.xml
Intelliworks_Chat.com.xml
Jabber.ru.xml
Kat_de_Paris.xml
Kiva.org.xml
Lenovo.xml
Levo_League.com-problematic.xml
Link-Plus-Catalog.xml
Linuxaria.com.xml
Marketwatch.com.xml
Minijob-zentrale.de.xml
MobyGames.xml
MoveOn.xml
MovieTickets.com.xml
Mutterschaftsgeld.de.xml
NYTimes.xml
National-Association-of-Theatre-Owners.xml
National_Geographic.com-problematic.xml
National_Geographic.xml
Necuhb.xml
NewsBlur.xml
Nextag-self-signed.xml
Ntop.org.xml
Oak-Ridge-National-Laboratory.xml
Pair-Networks.xml
Perfect_Audience.xml
Qz.com.xml
S-msn.com.xml
SPajIT.xml
Sandbag.xml
US-Dept-of-Veterans-Affairs.xml
University-of-Alaska-Anchorage.xml
Visa.xml
Xxxbunker.com.xml
Yandex.net.xml
Yimg.com.xml
ZeniMax-Media.xml
marktplaats.nl.xml
yWorks.com.xml

jsha (Member) commented Apr 1, 2015

The proposal is to disable all rulesets that require downgrades, treating the sites as too broken for us to fix currently? I am in favor. @2d1, opinions?

Blockers:

  • The EFF rule rewrites one URL on our site to another in order to detect whether HTTPS Everywhere is installed and give an appropriate promo. We need to figure out another way to do this - open to ideas!
  • The GoogleMaps rule needs to be updated to remove the downgrades, assuming they can be removed. Blocker to this: @StevenRoddis recently made a bunch of updates to the Google rulesets, which I rolled back for the 5.0 launch. We need to roll those forward again, along with fixes to the bugs that caused the rollback. Then we can base any GoogleMaps changes off the latest version.

Other than that, I don't think any of these rules are so vital that we can't remove them.

semenko (Contributor, Author) commented Apr 2, 2015

Hm, @2d1 I think better user experience could lead to wider adoption, which is of value too.

Perhaps I'll implement a "lean" (opt-in) mode in Chrome, which drops the HTTPS listeners.

Here's a wild idea: what if this significantly increased the extension's security by letting us ultimately remove the <all_urls> permission request in Chrome (which grants access to http/https/ftp/file/chrome-extension URLs) and request just http://*/* instead?

This would require careful thought (we couldn't regain the ability to read https:// data without a one-time scary warning prompt from Chrome) -- but would significantly reduce the potentially dangerous capabilities of HTTPSe itself. We could never -- even if someone wanted to do so maliciously -- downgrade HTTPS to HTTP (or otherwise read/write/modify HTTPS data).

jsha (Member) commented Apr 2, 2015

I agree that it would be nice to make a security guarantee that our extension can never rewrite content that's already on HTTPS. Note that the downgrade rules may not capture all such examples: It's possible for a rule to rewrite from HTTPS to a different HTTPS URL. @semenko could you do a quick estimate of how many such rules there are?

@2d1: I hear you about coverage vs speed. But the list of downgrade rules is quite small, relatively speaking. And I think a number of them may no longer be necessary or may be able to be modified.

Also keep in mind that since Chrome and Firefox implemented MCB (mixed content blocking), we can't actually downgrade JS or CSS, which are the most likely page-breaking subresources. So we're mostly interested in rewriting top-level navigation events where there are incorrect links to an https version that doesn't exist. @semenko, perhaps we could modify your proposal to only allow downgrades of top-level navigation? Or would that still require blocking all requests?
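
For illustration, a rough sketch of what "only allow downgrades of top-level navigation" could look like in the Chrome webRequest handler; applyRulesets and isDowngrade are hypothetical placeholders for the extension's existing rewrite logic, not its real function names:

// Hypothetical sketch: only apply https -> http downgrades to top-level
// navigations. details.type is "main_frame" for the document itself and
// "script", "stylesheet", etc. for subresources.
function onBeforeRequestSketch(details) {
  var rewritten = applyRulesets(details.url);            // placeholder for existing rewrite logic
  if (!rewritten) { return {}; }
  if (isDowngrade(details.url, rewritten) && details.type !== "main_frame") {
    return {};                                           // never downgrade subresources
  }
  return { redirectUrl: rewritten };
}
chrome.webRequest.onBeforeRequest.addListener(
  onBeforeRequestSketch, { urls: ["https://*/*", "http://*/*"] }, ["blocking"]);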

@diracdeltas: I'm looking at the downgrade in the Apple rule,

<rule from="^https://trailers\.apple\.com/" to="http://trailers.apple.com/" downgrade="1" />  

Do you remember writing this one? Can you tell me more about it?

fuzzyroddis (Contributor) commented:

Could it possibly be a crossdomain.xml problem?

https://trailers.apple.com
HTTP/1.1 200 OK
Server: Apache
Content-Type: text/html; charset=UTF-8
Cache-Control: max-age=844
Expires: Thu, 02 Apr 2015 23:51:35 GMT
Date: Thu, 02 Apr 2015 23:37:31 GMT
Connection: keep-alive
Set-Cookie: edgelong=-122.4195
Set-Cookie: edgelat=37.7795
Set-Cookie: edgestate=CA
Set-Cookie: edgecity=SANFRANCISCO
Set-Cookie: edgezip=94102

http://trailers.apple.com
HTTP/1.1 200 OK
Server: Apache
Content-Type: text/html; charset=UTF-8
Cache-Control: max-age=447
Expires: Thu, 02 Apr 2015 23:45:13 GMT
Date: Thu, 02 Apr 2015 23:37:46 GMT
Connection: keep-alive
Set-Cookie: edgelong=-122.4195
Set-Cookie: edgelat=37.7795
Set-Cookie: edgestate=CA
Set-Cookie: edgecity=SANFRANCISCO
Set-Cookie: edgezip=94102

They look the same. If it is a crossdomain.xml problem, see the exclusion rule here: https://github.com/EFForg/https-everywhere/blob/691be3b4bcb6540258ba895a037866ab48a87cca/src/chrome/content/rules/Cloudfront.xml

jsha (Member) commented Apr 3, 2015

@StevenRoddis I don't think it's a crossdomain problem. The HTTPS version of trailers.apple.com simply doesn't work. The question is: Would it be sufficient to simply have no rule for that subdomain? Presumably not, or the ruleset would be written that way. But I'm not sure who would have linked to the HTTPS version.

jsha (Member) commented Apr 3, 2015

FYI, if you include sideways HTTPS rewrites, i.e. rewriting from one HTTPS URL to another, we have 140 relevant rulesets:

$ grep -l 'from="^https' *.xml
Aeriagames.xml
Apple.xml
Austin_Fit_Magazine.com.xml
BBC.xml
CafePress.xml
Caltech.xml
Cashback.co.uk.xml
Cato-Institute.xml
CERN.xml
Cheezburger.xml
Comic-Con-Intl.xml
Contextly.xml
Cru.org.xml
Dafont.com.xml
DailyDot.xml
default.rulesets
EFF.xml
Elder_Scrolls_Online.com.xml
Epic-Systems.xml
Epson.xml
EvoTronix.xml
Fasthosts.xml
Fourmilab.xml
Ghostery_metrics_optout.xml
GO.com.xml
GoogleMaps.xml
ImageShack.us.xml
InkFrog.com.xml
Intelliworks_Chat.com.xml
Intel.xml
Jabber.ru.xml
Kat_de_Paris.xml
Kiva.org.xml
Lenovo.xml
Levo_League.com-problematic.xml
Link-Plus-Catalog.xml
Linuxaria.com.xml
LiveJournal.xml
Marketwatch.com.xml
marktplaats.nl.xml
Minijob-zentrale.de.xml
MobyGames.xml
MoveOn.xml
MovieTickets.com.xml
Mutterschaftsgeld.de.xml
National-Association-of-Theatre-Owners.xml
National_Geographic.com-problematic.xml
National_Geographic.xml
Necuhb.xml
NewsBlur.xml
Nextag-self-signed.xml
Ntop.org.xml
NYTimes.xml
Oak-Ridge-National-Laboratory.xml
Pair-Networks.xml
Perfect_Audience.xml
Qz.com.xml
Sandbag.xml
S-msn.com.xml
SPajIT.xml
University-of-Alaska-Anchorage.xml
US-Dept-of-Veterans-Affairs.xml
Visa.xml
Xxxbunker.com.xml
Yandex.net.xml
Yandex.xml
Yimg.com.xml
yWorks.com.xml
ZeniMax-Media.xml

jsha (Member) commented Apr 3, 2015

I think the next step here should be to see if we can quantify the performance impact of checking HTTPS URLs.

We could also look at short-circuiting HTTPS URLs that couldn't possibly match. To do that, at ruleset loading time we would check whether a ruleset includes downgrade or sideways URLs, and make that an attribute of the in-memory RuleSet object. When an HTTPS URL matches a ruleset without the downgrade-or-sideways attribute, we can bail on processing without evaluating any regexes.
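
A rough sketch of that short-circuit, assuming a hypothetical applicableRulesets() host lookup and illustrative field names on the in-memory RuleSet objects:

// Hypothetical sketch of the short-circuit described above.
// At ruleset load time, record whether any rule can rewrite an https:// URL
// (downgrade rules or "sideways" https -> https rules).
function markHttpsRelevance(ruleset) {
  ruleset.canRewriteHttps = ruleset.rules.some(function (rule) {
    return rule.downgrade || rule.from.indexOf("^https") === 0;
  });
}

// At request time, bail before evaluating any regexes.
function rewriteUrl(url) {
  var isHttps = url.indexOf("https:") === 0;
  var candidates = applicableRulesets(url);          // placeholder host-indexed lookup
  for (var i = 0; i < candidates.length; i++) {
    if (isHttps && !candidates[i].canRewriteHttps) {
      continue;                                      // skip: no regex work needed
    }
    var rewritten = candidates[i].apply(url);        // placeholder per-ruleset rewrite
    if (rewritten) { return rewritten; }
  }
  return null;
}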

semenko (Contributor, Author) commented Apr 4, 2015

Regardless of performance thoughts, the downgrading of sites from https->http strikes me as really strange (and not something our users are expecting).

@jsha unfortunately, earlier/short-circuited rules won't help a ton. We spend much of our time now in context-switching overhead (from webRequest C -> handlers for blocking JavaScript -> C).

You can get details via:

  1. Go to chrome://extensions/
  2. Click: Inspect views: background page
  3. Select the "Profiles" tab and start a CPU profile -- click stop after you've browsed some sites.

The resulting profile is heavy in internal bindings (extensions::) instead of stuff we control.

For a head-to-head comparison, get a good testing machine (I use a new-ish Chromebook, which is still pretty underpowered) and try:

  1. Go to chrome://extensions/
  2. Click: Inspect views: background page
  3. In the console, enter:
chrome.webRequest.onBeforeRequest.removeListener(onBeforeRequest)     // HTTPSe is now broken
// chrome.webRequest.onBeforeRequest.hasListeners()  // Optional, should return false
chrome.webRequest.onBeforeRequest.addListener(onBeforeRequest, {urls: ["http://*/*"]}, ["blocking"]);

To revert:

chrome.webRequest.onBeforeRequest.removeListener(onBeforeRequest)
chrome.webRequest.onBeforeRequest.addListener(onBeforeRequest, {urls: ["https://*/*", "http://*/*"]}, ["blocking"]);

jsha (Member) commented Apr 6, 2015

Perhaps when we do addListener, instead of listening on all HTTPS URLs, we could listen only on the ~140 hosts for which we have rewrite rules that can apply to HTTPS?
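
A rough sketch of that idea; getHttpsRelevantTargetHosts is a hypothetical helper that would collect target hosts from rulesets containing downgrade or sideways rules:

// Hypothetical sketch: listen on all HTTP URLs, but only on the specific
// hosts whose rulesets can rewrite HTTPS URLs.
var httpsHosts = getHttpsRelevantTargetHosts();   // e.g. ["trailers.apple.com", "*.aeriastatic.com"]
var urlFilters = ["http://*/*"];
httpsHosts.forEach(function (host) {
  urlFilters.push("https://" + host + "/*");      // each entry must be a valid Chrome match pattern
});
chrome.webRequest.onBeforeRequest.addListener(
  onBeforeRequest, { urls: urlFilters }, ["blocking"]);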

semenko (Contributor, Author) commented Apr 6, 2015

Ah -- good call! I tried this once with just HTTP, but it has some undocumented limit (IIRC ~1000?).

I'll give it a whirl.

semenko self-assigned this Apr 30, 2015

semenko (Contributor, Author) commented Apr 30, 2015

Ok, I gave this a try, listening only on the target hosts of the https rules. Unfortunately, some wildcard usage in weird places makes this hard.

e.g. for Aeriagames.xml:

<target host="www.aeriagames.com" />
<target host="aeriagames.com" />
<target host="*.aeriastatic.com" />
....
<rule from="^https?://s\.aeriastatic\.com/"
        to="https://c.aeriastatic.com/" />

Here, we'd listen on https://www.aeriagames.com, https://aeriagames.com, https://*.aeriastatic.com/ -- works great!

However, for a few sites, we're using * in weird places.

Good examples: the VA hospitals (where we have a * in the middle of a URL, like subdomain.*.example.com), and Google, where we use a * as the TLD (which is really weird), like some.site.google.*.

I'll look into this a bit more.
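
To make the wildcard issue concrete, here's a rough sketch (hypothetical, not extension code) of converting a <target host="..."> value into a Chrome match pattern, returning null for hosts Chrome can't express:

// Hypothetical sketch: Chrome match patterns only allow a single leading
// "*." wildcard in the host, so middle and right-hand wildcards can't be
// translated directly.
function targetHostToMatchPattern(host) {
  if (host === "*") {
    return "https://*/*";
  }
  if (host.indexOf("*") === -1) {
    return "https://" + host + "/*";              // plain host: straightforward
  }
  if (host.indexOf("*.") === 0 && host.indexOf("*", 1) === -1) {
    return "https://" + host + "/*";              // left-hand wildcard: allowed as-is
  }
  return null;                                    // e.g. "subdomain.*.example.com" or "maps.google.*"
}

// targetHostToMatchPattern("*.aeriastatic.com") -> "https://*.aeriastatic.com/*"
// targetHostToMatchPattern("maps.google.*")     -> null (needs expansion or a rule fix)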

jsha (Member) commented Jul 1, 2015

I'm in favor of removing middle-wildcarding for target hosts, and fixing up the relevant rules. I'm guessing for most of them we can enumerate the labels that go in the middle?

I assume Chrome also disallows right-hand wildcarding? Those may be more of a hassle to get rid of, since we have more rules depending on them, and the number of hostnames they expand to is larger. However, at least the set of possible hostnames they can expand to is better-defined.

One non-intuitive thing about wildcarding: Left-hand wildcards can be arbitrarily deep, but right (and I think middle) wildcards can only go one label deep.

semenko (Contributor, Author) commented Jul 1, 2015

I think this has gotten better with the awesome new ruleset updates. For example, the GoogleMaps.xml ruleset (which has some rough maps.google.* wildcards) had its relevant HTTPS target rules removed (the downgrade was converted to an exclusion). https://github.com/EFForg/https-everywhere/blob/master/src/chrome/content/rules/GoogleMaps.xml

I assume Chrome also disallows right-hand wildcarding?

Yep. Per the bottom of https://developer.chrome.com/extensions/match_patterns:

'*' in the host can be followed only by a '.' or '/'

If '*' is in the host, it must be the first character

jsha (Member) commented Feb 1, 2016

Have you taken another look at this recently? I see the downgrade whitelist has gotten a lot smaller.

semenko (Contributor, Author) commented Feb 1, 2016

Briefly -- it's definitely worth doing.

The right-sided wildcards are still an issue -- and I think we should aim to remove them. With new TLDs like .sucks, .google, or .mit, right-sided wildcards seem pretty broken.

jsha (Member) commented Feb 1, 2016

Yeah, I think it would be useful to run a script that does lookups for all the ICANN public suffixes to expand the right-side wildcards to the things they actually resolve to.

However, if that affects hostnames that have downgrades, we'll wind up with an explosion of hostnames to the point that it might break this proposal. We should check to see if e.g. the Google Maps downgrade can be removed now.
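
For what it's worth, a rough Node.js sketch of such an expansion script, assuming a local suffixes.txt containing the ICANN section of the Public Suffix List (this is not an actual project script):

// Hypothetical sketch: expand a right-hand wildcard target such as
// "maps.google.*" into the hostnames that actually resolve.
const dns = require("dns").promises;
const fs = require("fs");

async function expandRightWildcard(host) {
  const prefix = host.replace(/\.\*$/, "");       // "maps.google.*" -> "maps.google"
  const suffixes = fs.readFileSync("suffixes.txt", "utf8")
    .split("\n")
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line && line.indexOf("//") !== 0; });
  const resolved = [];
  for (const suffix of suffixes) {
    const candidate = prefix + "." + suffix;
    try {
      await dns.resolve(candidate);               // throws if the name doesn't resolve
      resolved.push(candidate);
    } catch (e) {
      // not registered or no A record; skip it
    }
  }
  return resolved;
}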

stevenatwork (Contributor) commented:

@jsha @fuzzyroddis (me)

Relevant Comment:

<!-- covered in exclusion rule, there should be no need to downgrade -->
    <!--rule from="^https://maps\.googleapis\.com/map(?=files/lib/map_\d+_\d+\.swf|sapi/publicapi\?file=flashapi)"
        to="http://maps.googleapis.com/map" downgrade="1" /-->

stevenatwork (Contributor) commented:

I might be a little tired this morning but I can't understand the logic in this downgrade https://github.com/EFForg/https-everywhere/blob/master/src/chrome/content/rules/Stack-Exchange.xml#L126

<!-- deeper domains cause certificate mismatch errors - there is also a downgrade rule below -->

Examples of subdomains that match:

  • qa.sockets.stackexchange.com (websockets and wss)
  • meta.english.stackexchange.com (https->http)
  • blog.gaming.stackexchange.com (https->http)

I'm trying to find an example of a resource included by an HTTPS page that causes MCB.

stevenatwork (Contributor) commented:

https://github.com/stevenatwork/https-everywhere/blob/master/src/chrome/content/rules/NewsBlur.xml

<rule from="^https://popular\.global\.newsblur\.com/" to="http://popular.global.newsblur.com/" downgrade="1" /> is still required as https://homepage.newsblur.com/ links to it via //popular.global.newsblur.com however there is a mismatch on popular.global.newsblur.com due to wildcard cert.

Problem is on this line: https://github.com/samuelclay/NewsBlur/blob/b8a82544bb81f8d1b54576cadcbcb4d8e8ac4b1f/templates/social/social_page.xhtml#L124

patryk added a commit to patryk/https-everywhere that referenced this issue Apr 30, 2016
J0WI pushed a commit that referenced this issue Aug 21, 2016
…) (#6242)

* new readthedocs.io domain

* Remove HTTP/HTTPS redirects in line with #1327

* Fix for regex in readthedocs.io

* [Read the Docs] Add tests, other minor updates.

* Add tests

* Expand <securecookie> coverage

* Fix missing `$` in a <securecookie>

* Move some tests around

* Update comments

* [Read the Docs] Fix the comment right this time

jsha (Member) commented Jun 13, 2017

Thanks to @ivysrono's awesome work in #8833, we no longer have any downgrade rules, and this is now possible. We should also remove the code that implements downgrade rules in the extension, and change the checker to reject downgrade rules outright (instead of consulting the downgrade whitelist).
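
A minimal sketch of the kind of checker change meant here (hypothetical; the project's actual validator may differ):

// Hypothetical sketch: instead of checking downgrade rules against a
// whitelist, reject any ruleset that still contains one.
function checkNoDowngrades(rulesetDoc, filename) {
  var rules = rulesetDoc.getElementsByTagName("rule");
  for (var i = 0; i < rules.length; i++) {
    if (rules[i].getAttribute("downgrade")) {
      throw new Error(filename + ": downgrade rules are no longer supported");
    }
  }
}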

gloomy-ghost (Collaborator) commented:

I am closing this, as support for downgrade rules has been removed in #10396.

Bisaloo (Collaborator) commented Jul 8, 2017

For reference, the work to remove https -> https rewrites is being continued in #10843.
