Redact origin according to policy #77
/cc @estark37 @jeisinger
Looks reasonable to me, FWIW. It introduces some additional complexity to the
<h3 id="ancestor-origin">
Determine the Ancestor Origin available to a Location
</h3>
Given a <a>Location</a> <var>location</var> and a <a>browsing context</a> <var>context</var>
Why does this use a browsing context and a location as input? Wouldn't a document or global be better and sufficient?
In general I prefer the factoring where the algorithm for crawling the tree belongs in HTML, and Referrer Policy only has a "censor an origin according to a referrer policy" (origin, referrer policy) -> string algorithm. The rest should be HTML's responsibility.
But addressing @jochen's concern first makes sense.
Hmm, I guess it needs more inputs than that, since apparently it checks the browsing context being TLS-protected, and location's relevant Document's origin. That's a lot of complexity that I'm not sure is warranted, but I'm not too familiar with the details here of why you might want to apply strict-origin-when-cross-origin in scenarios where documentOrigin and locationOrigin are different.
If that complexity is desired, it's going to be really fun to write test cases for. I guess you'll grab a location object, then navigate the iframe to a different origin. It seems better to just throw a SecurityError in that case IMO, like is already done with the entry settings object check (ugh, entry settings object).
@annevk, at step 5 in the algorithm to produce an ancestorOrigins value (https://html.spec.whatwg.org/multipage/browsers.html#concept-location-ancestor-origins-array) a Location and a browsing context are the defined inputs, and the value is "the Unicode serialization of current's active document's origin", so I continued to use that.
@domenic, it's not enough to just have (origin, referrer policy) as inputs because the referrer policy algorithms send differential information depending on whether the target is same-origin or not.
@hillbrad but it seems for any kind of shielding you only need a document (current's active document) as input to an algorithm that Referrer Policy might define and return a value for.
I think I understand what's going on better here. OK, so I would suggest taking as input (startingDocument, browsingContext) and outputting an origin. I think you should be able to get everything from that.
Why do you need a browsing context? A document holds both the origin and referrer policy, no?
It seems simplest to have this behave as much like the rest of referrer policy as possible, including respecting the secure context boundary in a similar manner.
The existing algorithms in Referrer Policy use an environment settings object, Request's client.
The existing algorithm in HTML uses a browsing context.
It does look like the necessary pieces of information are available in a Document, as well. Although I think for consistency we would need to use Document's Location's URL, not just origin, to get proper secure context testing for blob: and filesystem: URLs.
Otherwise I honestly don't know what the reasons are to prefer Document vs. browsing context vs. environment settings object since the same relevant input data seems to be reachable from each.
So far, we tried to push back against using the referrer policy for things other than the referrer header. Can you explain why we should deviate from this here? |
<ol>
<li>
If <var>context</var> is <a>TLS-protected</a> <em>and</em>
<var>locationOrigin</var> is not an <a><em>a priori</em> authenticated
You can't apply algorithms that accept URLs to origins...
Good point. That's copied from section 8.3 earlier in the document, will need to fix there, too. #78
@jeisinger referrer policy seems like a very logical place for this to me. Referrer policy is about controlling the default leakage of information about the current document origin or location as part of loading subresources, navigating and making network requests. location.ancestorOrigins is another place in which this information can leak as part of a subresource load, and which Mozilla has expressed concern is a privacy leak. Applying the intent expressed by a referrer policy to this other, very similar, cross-origin information leak seems logical to me. |
<li>
If <var>context</var> is <a>TLS-protected</a> <em>and</em>
<var>locationOrigin</var> is not an <a><em>a priori</em> authenticated
URL</a>, return <code>"null"</code>.
This algorithm sometimes returns strings, and sometimes returns origins; that's not good.
My feeling is that there are many ways that there can be default leakage of information about the current document origin or location, and I don't know if we can enumerate all of them, and I think we'll be in a very confusing situation if Referrer Policy controls some but not all of them. The example I think of is HPKP violation reports. Suppose a pinned site makes a subresource request for "/foo.jpg" which triggers an HPKP violation report. The report contains the hostname of the current document; do we consider that default leakage of information about the current domain as part of loading a subresource?
@hillbrad, thinking some more, maybe what you mean by "default" is that the site would have to opt in to the leakage by using HPKP, whereas with
@estark37 I'm not confident interpreting whether you're in favor or against this change when you say "we'll be in a very confusing situation if Referrer Policy controls some but not all of them". Are you saying we should very narrowly only control referrer here, because we haven't enumerated every other leakage of this sort, or that we should have one place to control all of them where author control is desirable? Assuming we wanted to be able to control cross-origin ancestorOrigins leakage, where else would we put it? Is writing another spec and creating another set of flags to set a good approach vs. doing it here? |
@estark37 @jeisinger if mention of the ancestorOrigins motivation was removed and we just provided a callable algorithm here that takes (location, origin, policy) and returns (origin or "null") according to policy, would that be less concerning? |
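As a rough illustration of what such a callable algorithm might look like, here is a minimal TypeScript sketch. It is not spec text and not taken from the PR: the names `redactOrigin` and `RedactionInput` are hypothetical, the two booleans stand in for the "TLS-protected" and "a priori authenticated" checks, and the mapping assumes that an origin is redacted exactly when the policy would send no referrer at all to that target.

```typescript
// Hypothetical sketch; every name here is an assumption made for illustration.
type ReferrerPolicy =
  | "no-referrer"
  | "no-referrer-when-downgrade"
  | "same-origin"
  | "origin"
  | "strict-origin"
  | "origin-when-cross-origin"
  | "strict-origin-when-cross-origin"
  | "unsafe-url";

interface RedactionInput {
  ancestorOrigin: string;          // serialized origin of the ancestor document
  viewerOrigin: string;            // serialized origin of the document reading ancestorOrigins
  policy: ReferrerPolicy;          // the ancestor's default referrer policy
  ancestorIsTlsProtected: boolean; // stand-in for "TLS-protected"
  viewerIsTrustworthy: boolean;    // stand-in for "a priori authenticated"
}

// Returns the serialized ancestor origin, or "null" when the policy would
// not have sent any referrer to the viewer.
function redactOrigin(input: RedactionInput): string {
  const sameOrigin = input.ancestorOrigin === input.viewerOrigin;
  const downgrade = input.ancestorIsTlsProtected && !input.viewerIsTrustworthy;

  switch (input.policy) {
    case "no-referrer":
      return "null";
    case "same-origin":
      return sameOrigin ? input.ancestorOrigin : "null";
    case "no-referrer-when-downgrade":
    case "strict-origin":
      return downgrade ? "null" : input.ancestorOrigin;
    case "strict-origin-when-cross-origin":
      return sameOrigin || !downgrade ? input.ancestorOrigin : "null";
    case "origin":
    case "origin-when-cross-origin":
    case "unsafe-url":
      return input.ancestorOrigin;
  }
}
```

Note that the result depends on both origins and on the secure-context state, which is why (origin, referrer policy) alone was considered insufficient earlier in the thread.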
@hillbrad I was arguing against, on the basis that I don't know if we can enumerate all sources of leakage and I think it would be confusing to developers to control some but not all of them. However I'm conscious that many smart people (including @mikewest) disagree with that reasoning, and I also don't really have any better ideas about how to control |
I would like to wait for the discussion on the html pull request to settle, and, specifically, for @bzbarsky to state whether FF would implement ancestorOrigins with that restriction. I think it's unfortunate that we'd force a site to choose between privacy protection and attribution - you might want "origin-when-cross-origin" for outgoing links, and "never" to disable ancestorOrigins. That's IMO the main reason why we shouldn't use referrer policy for anything other than the referrer. On the other hand, ancestorOrigins is very valuable for fraud prevention, so if adding this to referrer policy will be enough for Firefox to ship ancestorOrigins, I'd be willing to make this compromise.
The right answer there would seem like treating "outgoing links" and "subframe loads" differently in terms of referrer policy and using the subframe part to do the ancestorOrigins sanitization. Note that to some extent we have this already: |
@bzbarsky that doesn't, however, answer my question of whether FF would ship ancestorOrigins with this addition to the referrer policy spec?
I would attempt to ship ancestorOrigins in Firefox if it were designed such that pages that reasonably expect that their origin doesn't leak to cross-origin subframes (not necessarily direct) in fact do not leak their origin to such subframes. I can't guarantee that it would ship, obviously; that depends on the responses to an intent to ship, the actual code review, etc... |
My own inclination is to specify the smallest possible change to existing behavior that will satisfy Mozilla's objections, to increase the likelihood that the change will be promptly and compatibly adopted by a large number of browsers and with minimal disruption to existing applications consuming this data.
I expect that there are a number of sites relying on ancestorOrigins today that look like:
top -> sandboxed iframe -> 3rd party iframe (ad)
where "ad" expects to be able to see (and top intends it to see) the origin of "top". This means that a change to ancestorOrigins which, on encountering the first null while walking towards top, either stops evaluation or continues to give the true embedding depth but returns "null" for ancestors after the first "null", is probably a breaking change.
I still think the simplest way to satisfy the requirements of "pages that reasonably expect that their origin doesn't leak" is to infer that expectation from their default referrer policy.
I say this because I think that the use cases for controlling ancestorOrigins are actually somewhat limited, and it is likely that a resource that is concerned about leaking its origin is in a mode of general paranoia (like a resource using a capability security model) and would set such a policy. My imagination is failing me somewhat on use cases that require independently controlling those expectations on a load-by-load basis.
@bzbarsky do you have specific use cases in mind for loading a cross-origin iframe while censoring its view of the ancestorOrigin stack, which could help elucidate the design requirements?
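To make the compatibility concern concrete, here is a small TypeScript sketch of the top -> sandboxed iframe -> ad chain. It is only an illustration: `FrameNode`, `ancestorOrigins`, `stopAtFirstNull`, and `nullAfterFirstNull` are hypothetical names, the example origins are made up, and the two variant functions are just one reading of the two behaviors described above. The sandboxed frame is modeled with the serialized opaque origin "null".

```typescript
// Hypothetical model of a frame tree; not an API from any browser or spec.
interface FrameNode {
  origin: string;            // serialized origin; an opaque origin serializes to "null"
  parent: FrameNode | null;  // null for the top-level frame
}

// Current behavior: list ancestor origins from the parent up to the top.
function ancestorOrigins(frame: FrameNode): string[] {
  const out: string[] = [];
  for (let cur = frame.parent; cur !== null; cur = cur.parent) {
    out.push(cur.origin);
  }
  return out;
}

// Variant A (one reading): stop the walk at the first "null" entry.
function stopAtFirstNull(origins: string[]): string[] {
  const i = origins.indexOf("null");
  return i === -1 ? origins : origins.slice(0, i + 1);
}

// Variant B: keep the true depth, but report "null" for everything
// after the first "null" entry.
function nullAfterFirstNull(origins: string[]): string[] {
  const i = origins.indexOf("null");
  return i === -1 ? origins : origins.map((o, idx) => (idx > i ? "null" : o));
}

const topFrame: FrameNode = { origin: "https://publisher.example", parent: null };
const sandboxFrame: FrameNode = { origin: "null", parent: topFrame };
const adFrame: FrameNode = { origin: "https://ads.example", parent: sandboxFrame };

console.log(ancestorOrigins(adFrame));                     // ["null", "https://publisher.example"]
console.log(stopAtFirstNull(ancestorOrigins(adFrame)));    // ["null"]
console.log(nullAfterFirstNull(ancestorOrigins(adFrame))); // ["null", "null"]
```

Under either variant the ad frame loses sight of the top-level origin, which is the breakage being described.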
I don't believe anyone is proposing such a change; it doesn't make sense to do that.
If we didn't have per-element policies, that would be true. But we do, and a page that consistently sets them on all its cross-origin stuff should have such a reasonable expectation.... So should Put another way, certainly a page with a default "no-referrer" policy and no per-element policies has such an expectation. But that's not the only way to have such an expectation.
Any time you're loading some site you don't trust in that iframe, really. As a simple concrete example, say I want to embed a Google docs spreadsheet and resulting graph in my page to let people see what happens to the graph as you adjust some parameters. But that doesn't mean I want Google knowing what site is embedding this spreadsheet, because that's none of their damn business. |
Fundamentally, I think you're coming at this from a "default should be to expose everything, but I guess we should have something to satisfy the tin-foil-hat types, and it's OK if we make that something pretty narrow" perspective. On the other hand, I'm coming at this from a "default should be to not expose anything to the pervasive track-everything-about-everyone ecosystem we have on the web right now, but I understand that there are real fraud use cases that we need to address, so I'm looking for the minimal thing we can expose that allows the maximal set of use cases to exist without exposing information" perspective. Please forgive me if my characterization of your point of view is inaccurate. |
And note that I'm sympathetic to the "let's create as little work as possible for the non-Mozilla implementors here" consideration. Heck, I'd like to create as little work as possible for myself. ;) I agree that doing anything other than considering the default referrer policy becomes really complicated, not least because navigations can happen through means other than frame "src" attributes and such navigations carry along their own referrers. This is why I haven't come up with a concrete proposal there so far. If I were designing this from the ground up, I would make exposing your origin to ancestorOrigins an explicit opt-in on your iframe elements, much like what we have with allowfullscreen. That would address the fraud use cases, I believe: people who want to expose the information would opt in for those specific iframes, and people trying to avoid fraud could assume anyone not opting in is not to be trusted. The obvious problem here is that this would involve modifying various iframes involved, so can't just be rolled out by the innermost thing in the ad chain; it needs cooperation from the things above it. So it would have made rollout a bit more complicated for the ad tech ecosystem... |
The default of exposing everything is just the way it has already worked for years in every browser but Firefox. That's not my personal preference with regard to privacy; green-field, my preferences would be much closer to yours @bzbarsky, but no offense taken. I just find it helpful to understand specific motivating use cases if we're going to make potentially breaking changes to existing behavior in order to Pareto optimize.
What do you think frame {3} should see about its grandparent {1} in the following arrangement?
{1, a.com} default referrer policy -> loads {2, b.com} with noreferrer attribute on the iframe tag -> loads {3, a.com}
What if {3} is from c.com?
You mean this is shipping in IE/Edge? I thought this was Blink and maybe WebKit (due to abarth putting it in both). |
In my testing, neither IE11 nor Edge 14 support I accept that there is existing deployment here, which is why I'm not seriously proposing the explicit opt-in, because I think the inertia there would be too much.
Just to make sure I understand your example, is the noreferrer attribute on the iframe tag that is loading b.com, or that is inside b.com? Here is a conceivable concrete proposal for what ancestorOrigins could return:
This proposal is making the following assumption: If a page is being sandboxed without "allow-same-origin", it is not trusted by its loader. Therefore, the loader should not rely on it sanitizing origins and should do it itself at that boundary if it wants to. Effects of this proposal on the cases we have considered so far:
My own inclination is to specify the smallest possible change to existing I expect that there are a number of sites relying on ancestorOrigins today top -> sandboxed iframe -> 3rd party iframe (ad) Where "ad" expects to be able to see (and top intends it to see) the origin I still think the simplest way to satisfy the requirements of "pages that I say this because I think that the use cases for controlling @bz do you have specific use cases in mind for loading a cross-origin On Thu, Oct 20, 2016 at 6:57 AM Boris Zbarsky [email protected]
@hillbrad copy/paste or email reply mistake? |
Given that this is still controversial, I'd rather push this to a future iteration on the referrer policy instead of blocking further progress on the current WD. |
Apologies that I missed the progress on this thread deep in my inbox. @bzbarsky it reads like your proposal doesn't actually require consulting referrer policy states at all, which means we could close this issue and specify that algorithm directly in HTML with the definition of ancestorOrigins, and not block Referrer Policy any longer on this issue? |
My proposal is affected by referrer policy insofar as it affects what referrers things are loaded with. But yes, it doesn't obviously need any hooks from the referrer policy spec. I'm not sure why this issue was a referrer policy issue to start with, honestly, since it's not like referrer policy is what's defining ancestorOrigins... That said, I'd love someone other than me giving my proposal a once-over. ;) |
WebAppSec WG seems to agree that the proposal in #77 (comment) is reasonable. Closing this PR; will propose that in HTML. Thanks!
Define an algorithm to redact origin to "null" in the location.ancestorOrigins array if the referrer policy for the context would not send a referrer to that location.
See: whatwg/html#1918