Double-keyed HTTP cache #904
Comments
/cc @jkarlin for Chrome's experimental implementation. |
Safari is partitioning service workers so that a service worker of an iframe B in a top-level page A uses the partition of top-level page A. |
Very interested. We'd like to mitigate security issues such as x-site search. Chrome's experimenting with double keying by (top-frame origin, url) as well as triple-keying (top-frame origin, initiator origin, url). Triple keying protects caches of frames within a page from each other. The performance hit from double-keying seems reasonable at first glance, but it's important that we address x-origin prefetch in a multi-keyed cache world. We don't yet have data on triple-keying. |
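The practical difference between the two keying schemes is which frames can see each other's cache entries. Below is a minimal Python sketch that models only that lookup behavior. The class names and example origins are hypothetical, and a real network-stack cache key carries more fields than this.

```python
from typing import NamedTuple, Optional


class CacheKey(NamedTuple):
    top_frame_origin: str
    initiator_origin: Optional[str]  # always None under double-keying
    url: str


class KeyedHttpCache:
    """Toy HTTP cache keyed by (top-frame origin, [initiator origin,] URL)."""

    def __init__(self, triple_keyed: bool = False) -> None:
        self.triple_keyed = triple_keyed
        self._entries: dict = {}

    def _key(self, top_frame_origin: str, initiator_origin: str, url: str) -> CacheKey:
        # Double-keying drops the initiator, so every frame under one
        # top-level page shares entries; triple-keying keeps it.
        initiator = initiator_origin if self.triple_keyed else None
        return CacheKey(top_frame_origin, initiator, url)

    def store(self, top_frame_origin: str, initiator_origin: str, url: str, body: bytes) -> None:
        self._entries[self._key(top_frame_origin, initiator_origin, url)] = body

    def lookup(self, top_frame_origin: str, initiator_origin: str, url: str) -> Optional[bytes]:
        return self._entries.get(self._key(top_frame_origin, initiator_origin, url))


LIB = "https://cdn.example/lib.js"
for triple in (False, True):
    cache = KeyedHttpCache(triple_keyed=triple)
    # Frame b.example inside top-level page a.example caches the library.
    cache.store("https://a.example", "https://b.example", LIB, b"...")
    # Can sibling frame c.example in the same top-level page reuse it?
    print(triple, cache.lookup("https://a.example", "https://c.example", LIB) is not None)
# prints:
# False True
# True False
```

Under both schemes, a different top-level page (say `https://d.example`) misses. Triple-keying additionally isolates sibling frames within a single page from each other.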
This issue should also consider the differences between a partitioned HTTP cache and partitioned origin storage. I believe WebKit's default partitioning includes both HTTP and origin storage. I don't think the Chrome experiment includes any origin storage partitioning, though. |
+1 It would be nice to try to factor out cache key computation to support not only this, but things like variants, etc. |
Mozilla is interested in partitioning other bits as well, but for this issue I'd like to focus solely on the HTTP cache. Some of the infrastructure we might be able to reuse for the other bits mentioned above, but I don't think there's any strong reason to couple them from the start. |
WebKit also partitions LocalStorage on eTLD+1 and used to partition cookies until a couple of months ago (now those third-party cookies are simply blocked instead). Partitioned LocalStorage is also not persisted, which makes it a slightly odd kind of SessionStorage. I think eTLD+1 makes a lot of sense for partitioning unless we’re seeing (or expect to see) attacks that would be fixed by origin partitioning. However, as Youenn said, we’d be willing to harmonize with other implementers for consistency. |
I'm a proponent of origin as it's the security boundary for most aspects of the browser and easier to reason about. /cc @sleevi |
Since cache attacks are not very involved, it seems rather risky not to partition by origin: it would mean that a compromise of any example.com subdomain could be used to attack sensitive.example.com. |
Another question is how we deal with x-origin prefetch. I know of two options to make x-origin prefetch still work:
/cc @yoavweiss @kinu |
Great, let's leave the prefetch discussion in #881 then. We're still doing the work to compare performance of double vs triple keying the network stack. Sorry for taking so long. Note that we're planning on using this key for the entire network stack (memory cache, disk cache, socket pools, etc). |
In terms of a spec for the double-keyed cache, I'd appreciate input on a good place to specify it, possibly the WHATWG Fetch standard. |
@shivanigithub if you want to help with this that'd be great! My thinking here involved changes to the Fetch and HTML standards. In particular:
Hope that helps! |
@annevk I suspect we'd also need to update connection pools to extend that concept. Or were you thinking of doing that separately? I wasn't sure whether #904 (comment) extended to those changes as well. I ask because I'm wondering if it makes sense if, similar to how "connection" is defined as an aggregate of both
Incrementalism also works, I just wanted to make sure that was your goal. |
@sleevi I'd like changes to connections to be a separate change, but it does make sense to me to iterate toward that. Would you mind opening an issue on that and elaborate a bit on the thinking behind it there? I understood there to be an issue with sites being able to reach the global connection limit, but that would not necessarily disappear with a first-party origin key on connection pools I think. |
@annevk Thanks for the inputs. Looking into these. |
@annevk Regarding the spec inputs, am I correct in understanding that the proposed "first-party origin" field on the environment settings object is an output field populated by the browser to indicate the key being used, and not an input to the browser? |
I'm not entirely sure what you mean, but yes, the browser (i.e., user agent) is responsible for setting it. |
A few clarifications for spec changes: [Already existing text] [New text added following the above] |
I think at the very least we should define top-level origin from first principles and use that as the key as all browsers plan to align on that. Allowing additional keys seems reasonable. I also think we should be more explicit and update the various lookup points to pass in the appropriate keys. |
+1 to this. It seems that, from an infrastructure perspective, declaring that the client has multiple caches, similar to how we do for connection pools (which would presumably be extended in #917), would work. That is, a given request has an associated This would allow the rest of the infrastructure to naturally work, by conceptually stating there are multiple caches (similar to how Service Workers do with the Cache object). An implementation would be able to implement this using a single logical disk store by double-keying/triple-keying, which should be indistinguishable from the spec. |
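The claim that "multiple caches" in the spec and a single double- or triple-keyed disk store are indistinguishable can be checked with a small model. In this Python sketch, "partition key" is a hypothetical name for whatever value the request is associated with (that name is elided above), and both classes are illustrative rather than any browser's actual design. The sketch replays the same operations against both models and compares what their lookups return.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Optional


class PerKeyCaches:
    """Spec-style model: the client has one conceptual HTTP cache per partition key."""

    def __init__(self) -> None:
        self._caches = defaultdict(dict)  # partition key -> {url: response}

    def cache_for(self, partition_key: Hashable) -> dict:
        return self._caches[partition_key]


class SingleStore:
    """Implementation-style model: one store keyed by (partition key, url)."""

    def __init__(self) -> None:
        self._store: dict = {}

    def put(self, partition_key: Hashable, url: str, response: str) -> None:
        self._store[(partition_key, url)] = response

    def get(self, partition_key: Hashable, url: str) -> Optional[str]:
        return self._store.get((partition_key, url))


def observe(ops):
    """Replay ("put" | "get", key, url[, response]) ops on both models and
    return the results each model's lookups produced."""
    spec, impl = PerKeyCaches(), SingleStore()
    spec_seen, impl_seen = [], []
    for op, key, url, *rest in ops:
        if op == "put":
            spec.cache_for(key)[url] = rest[0]
            impl.put(key, url, rest[0])
        else:
            spec_seen.append(spec.cache_for(key).get(url))
            impl_seen.append(impl.get(key, url))
    return spec_seen, impl_seen


ops = [
    ("put", "https://a.example", "https://cdn.example/lib.js", "v1"),
    ("get", "https://a.example", "https://cdn.example/lib.js"),  # same partition: hit
    ("get", "https://b.example", "https://cdn.example/lib.js"),  # other partition: miss
]
spec_seen, impl_seen = observe(ops)
assert spec_seen == impl_seen == ["v1", None]
```

Because every lookup in either model is scoped by the same key, no sequence of puts and gets can tell the two apart. This is what leaves implementations free to choose their storage layout.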
Thanks for the inputs! Makes sense to focus on updating the whatwg fetch spec for this change and the cache issue can take care of cleaning up the relevant cache key definitions in the IETF RFC. |
Created a pull request for the spec change: #943 |
The HTML spec change to define top-level origin is in progress at: whatwg/html#4966 |
I wanted to loop back to the earlier discussion on origin vs eTLD+1 for partitioning the cache. We've come across sites where frames are significantly impacted by triple-keying with origin but not eTLD+1. As such, we intend to proceed with scheme+eTLD+1 instead of origin and (like site isolation) hope to transition to origin at a later point. |
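The trade-off between keying on origin and keying on scheme+eTLD+1 (raised earlier with the sensitive.example.com example) becomes concrete once both keys are computed side by side. The Python sketch below uses a tiny hard-coded stand-in for the Public Suffix List. That list is an assumption for illustration only: real code must use the full list, including its wildcard and exception rules.

```python
from urllib.parse import urlsplit

# Toy stand-in for the Public Suffix List (assumption; not the real list).
PUBLIC_SUFFIXES = {"com", "org", "co.uk"}


def registrable_domain(host: str):
    """Return eTLD+1 for host using the toy suffix list, or None if host is a suffix."""
    labels = host.split(".")
    # Scan from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                return None  # the host itself is a public suffix
            return ".".join(labels[i - 1:])
    return host  # no known suffix: fall back to the full host (assumption)


def origin(url: str):
    parts = urlsplit(url)
    # port is None when the URL uses the scheme's default port
    return (parts.scheme, parts.hostname, parts.port)


def site(url: str):
    parts = urlsplit(url)
    return (parts.scheme, registrable_domain(parts.hostname))


a = "https://sensitive.example.com/account"
b = "https://evil.example.com/"
print(origin(a) == origin(b))  # False: different origins
print(site(a) == site(b))      # True: same scheme + eTLD+1
```

With a site key, frames from sensitive.example.com and evil.example.com fall into the same partition. That is the risk described earlier, accepted here in exchange for fewer site breakages.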
This will provide the foundation for whatwg/fetch#943 and other related changes discussed in whatwg/fetch#904.
#943 is almost ready to land. Now would be a good time to speak up if you need more time to review or some such. Otherwise it'll go in tomorrow or Thursday I suspect. The initial version uses the top-level site (not origin after all, mind) as the additional HTTP cache key. |
@annevk @mozfreddyb, to address efficiency concerns, consider implementing local CDN emulation, similar to this Firefox/Chrome add-on, which is probably the best example and currently covers the largest number of CDNs: https://codeberg.org/nobody/LocalCDN "A web browser extension that emulates Content Delivery Networks to improve your online privacy. It intercepts traffic, finds supported resources locally, and injects them into the environment." re: https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/6KKXv1PqPZ0/discussion
Obviously security is a huge concern, and I completely understand and appreciate the work being done here. But I'd want to make sure that an important performance story on the web isn't accidentally destroyed in the process. If this proposal does continue to move forward, I'd at least want an opt-in proposal discussed, either via the existing Cache-Control header, a new header, or some other mechanism. I do not believe that either of the two concerns outlined above was particularly serious: we're talking about a small number of CDN-related cookies, and in practice the "detect if a user has visited a specific site" attack surface would be negligible (and again, opt-in). I'm happy to contribute or get involved if time and effort are a blocker here. Thanks again,
The next step would be automated local CDN emulation, kept in its own separate cache, using a detection mechanism that determines whether the same resource has been accessed across multiple websites. This would be similar to Privacy Badger: "If an advertiser seems to be tracking you across multiple [three] websites without your permission, Privacy Badger automatically blocks that advertiser from loading any more content in your browser." |
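The detection step proposed above could look roughly like the following Python sketch. The class and method names are hypothetical. The threshold of three distinct sites is borrowed from the Privacy Badger description quoted above, not from any browser. The sketch also assumes that the caller has already computed a top-level site for each request.

```python
from collections import defaultdict


class CrossSiteResourceTracker:
    """Hypothetical detector: flags a resource URL once it has been requested
    from at least `threshold` distinct top-level sites."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self._sites_by_url = defaultdict(set)  # url -> set of top-level sites

    def record(self, top_level_site: str, url: str) -> None:
        # A set ensures repeat visits from the same site count only once.
        self._sites_by_url[url].add(top_level_site)

    def is_widely_shared(self, url: str) -> bool:
        return len(self._sites_by_url.get(url, ())) >= self.threshold


tracker = CrossSiteResourceTracker()
jquery = "https://cdn.example/jquery.min.js"
for s in ("https://a.example", "https://b.example", "https://a.example"):
    tracker.record(s, jquery)
print(tracker.is_widely_shared(jquery))  # False: only two distinct sites so far
tracker.record("https://c.example", jquery)
print(tracker.is_widely_shared(jquery))  # True: third distinct site
```

Only resources flagged this way would be candidates for the separate emulated-CDN cache. Everything else would stay in the partitioned HTTP cache.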
* Remove third-party caching misconception: with [double-keyed cache](whatwg/fetch#904) enabled in all major browsers, cached resources will not be shared across websites. * Update content/analytics/web-analytics/understanding-web-analytics/data-origin-and-collection.md Co-authored-by: marciocloudflare
The idea here is that the "browser's address bar origin" is an additional key for its HTTP cache, to prevent certain classes of attacks (such as cross-site search, or detecting whether a user has visited a specific site).
Safari ships a variant of this (uses registrable domain, not origin), but seems willing to adjust to origin. Other implementers are interested in shipping and are at various stages of experimentation.
This will require making all accesses of "the HTTP cache" more contextual, by accessing the HTTP cache of X whereby X is some defined origin. (Other ideas welcome, @mnot?)
I'm not sure where to store the defined origin. We could do a browsing context ancestor walk, which might be okay since I think all fetches require a fully active document, but it would be nice to have that confirmed.
(I'm also assuming that auxiliary browsing contexts are not special here and behave like other top-level browsing contexts for the purposes of this.)
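The ancestor walk floated above, together with the assumption that auxiliary browsing contexts behave like any other top-level context, can be sketched in a few lines of Python. This `BrowsingContext` class is a hypothetical simplification, not the HTML standard's definition. Its `opener` link is modeled only to show that the walk deliberately ignores it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class BrowsingContext:
    origin: str
    # Set for nested (iframe) contexts; None for top-level contexts.
    parent: Optional[BrowsingContext] = None
    # Set for auxiliary contexts such as popups; intentionally not followed,
    # so an auxiliary context is treated as its own top-level context.
    opener: Optional[BrowsingContext] = None


def top_level_origin(ctx: BrowsingContext) -> str:
    """Walk parent links up to the top-level browsing context and return its origin."""
    while ctx.parent is not None:
        ctx = ctx.parent
    return ctx.origin


top = BrowsingContext("https://a.example")
frame = BrowsingContext("https://b.example", parent=top)
nested = BrowsingContext("https://c.example", parent=frame)
popup = BrowsingContext("https://d.example", opener=nested)
assert top_level_origin(nested) == "https://a.example"
assert top_level_origin(popup) == "https://d.example"
```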
cc @youennf @whatwg/security