
Redact origin according to policy #77

Closed


hillbrad
Contributor

Define an algorithm to redact origin to "null" in the location.ancestorOrigins array if the referrer policy for the context would not send a referrer to that location.

See: whatwg/html#1918

@mikewest
Member

/cc @estark37 @jeisinger

@mikewest
Member

Looks reasonable to me, FWIW. It introduces some additional complexity to the ancestorOrigins calculation, but if this would be enough for Firefox folks to be comfortable implementing, that complexity is completely worthwhile.

<h3 id="ancestor-origin">
Determine the Ancestor Origin available to a Location
</h3>
Given a <a>Location</a> <var>location</var> and a <a>browsing context</a> <var>context</var>
Member

Why does this use a browsing context and a location as input? Wouldn't a document or global be better and sufficient?

Collaborator

In general I prefer the factoring where the algorithm for crawling the tree belongs in HTML, and Referrer Policy only has a "censor an origin according to a referrer policy" (origin, referrer policy) -> string algorithm. The rest should be HTML's responsibility.

But addressing @jochen's concern first makes sense.

Collaborator

domenic commented Oct 18, 2016

Hmm, I guess it needs more inputs than that, since apparently it checks the browsing context being TLS-protected, and location's relevant Document's origin. That's a lot of complexity that I'm not sure is warranted, but I'm not too familiar with the details here of why you might want to apply strict-origin-when-cross-origin in scenarios where documentOrigin and locationOrigin are different.

If that complexity is desired, it's going to be really fun to write test cases for. I guess you'll grab a location object, then navigate the iframe to a different origin. It seems better to just throw a SecurityError in that case IMO, like is already done with the entry settings object check (ugh, entry settings object).

Contributor Author

@annevk, at step 5 in the algorithm to produce an ancestorOrigins value (https://html.spec.whatwg.org/multipage/browsers.html#concept-location-ancestor-origins-array) a Location and a browsing context are the defined inputs, and the value is "the Unicode serialization of current's active document's origin", so I continued to use that.

@domenic, it's not enough to just have (origin, referrer policy) as inputs because the referrer policy algorithms send differential information depending on whether the target is same-origin or not.

Member

@hillbrad but it seems for any kind of shielding you only need a document (current's active document) as input to an algorithm that Referrer Policy might define and return a value for.

Collaborator

I think I understand what's going on better here. OK, so I would suggest taking as input (startingDocument, browsingContext) and outputting an origin. I think you should be able to get everything from that.

Member

Why do you need a browsing context? A document holds both the origin and referrer policy, no?

Contributor Author

It seems simplest to have this behave as much like the rest of referrer policy as possible, including respecting the secure context boundary in a similar manner.

The existing algorithms in Referrer Policy use an environment settings object, Request's client.

The existing algorithm in HTML uses a browsing context.

It does look like the necessary pieces of information are available in a Document as well, although I think for consistency we would need to use the Document's Location's URL, not just its origin, to get proper secure context testing for blob: and filesystem: URLs.

Otherwise I honestly don't know what the reasons are to prefer Document vs. browsing context vs. environment settings object since the same relevant input data seems to be reachable from each.

@jeisinger
Member

So far, we tried to push back against using the referrer policy for things other than the referrer header. Can you explain why we should deviate from this here?

<ol>
<li>
If <var>context</var> is <a>TLS-protected</a> <em>and</em>
<var>locationOrigin</var> is not an <a><em>a priori</em> authenticated
Collaborator

You can't apply algorithms that accept URLs to origins...

Contributor Author

Good point. That's copied from section 8.3 earlier in the document, will need to fix there, too. #78

@hillbrad
Contributor Author

@jeisinger referrer policy seems like a very logical place for this to me. Referrer policy is about controlling the default leakage of information about the current document origin or location as part of loading subresources, navigating and making network requests. location.ancestorOrigins is another place in which this information can leak as part of a subresource load, and which Mozilla has expressed concern is a privacy leak. Applying the intent expressed by a referrer policy to this other, very similar, cross-origin information leak seems logical to me.

<li>
If <var>context</var> is <a>TLS-protected</a> <em>and</em>
<var>locationOrigin</var> is not an <a><em>a priori</em> authenticated
URL</a>, return <code>"null"</code>.
Collaborator

This algorithm sometimes returns strings, and sometimes returns origins; that's not good.

@estark37
Collaborator

referrer policy seems like a very logical place for this to me. Referrer policy is about controlling the default leakage of information about the current document origin or location as part of loading subresources, navigating and making network requests. location.ancestorOrigins is another place in which this information can leak as part of a subresource load, and which Mozilla has expressed concern is a privacy leak. Applying the intent expressed by a referrer policy to this other, very similar, cross-origin information leak seems logical to me.

My feeling is that there are many ways that there can be default leakage of information about the current document origin or location, and I don't know if we can enumerate all of them, and I think we'll be in a very confusing situation if Referrer Policy controls some but not all of them.

The example I think of is HPKP violation reports. Suppose a pinned site makes a subresource request for "/foo.jpg" which triggers an HPKP violation report. The report contains the hostname of the current document; do we consider that default leakage of information about the current domain as part of loading a subresource?

@estark37
Collaborator

@hillbrad, thinking some more, maybe what you mean by "default" is that the site would have to opt in to the leakage by using HPKP, whereas with ancestorOrigins, the site doesn't have to opt in to the leakage in any way (except by framing another site)?

@hillbrad
Contributor Author

@estark37 I'm not confident interpreting whether you're in favor or against this change when you say "we'll be in a very confusing situation if Referrer Policy controls some but not all of them". Are you saying we should very narrowly only control referrer here, because we haven't enumerated every other leakage of this sort, or that we should have one place to control all of them where author control is desirable?

Assuming we wanted to be able to control cross-origin ancestorOrigins leakage, where else would we put it? Is writing another spec and creating another set of flags to set a good approach vs. doing it here?

@hillbrad
Contributor Author

@estark37 @jeisinger if mention of the ancestorOrigins motivation was removed and we just provided a callable algorithm here that takes (location, origin, policy) and returns (origin or "null") according to policy, would that be less concerning?

@estark37
Collaborator

@hillbrad I was arguing against, on the basis that I don't know if we can enumerate all sources of leakage and I think it would be confusing to developers to control some but not all of them. However I'm conscious that many smart people (including @mikewest) disagree with that reasoning, and I also don't really have any better ideas about how to control ancestorOrigins leakage without introducing a whole new mechanism/spec. So I guess I'm open to this change.

@hillbrad
Contributor Author

Latest update always returns a string, makes no explicit reference to ancestorOrigins, uses a URL and the potentially trustworthy URL test. Still using a browsing context, but I am very open to changing that if @annevk and @domenic agree on a more appropriate choice.
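A minimal Python sketch of the kind of helper described here, to show why it needs both the document's URL and the target URL (the same-origin and downgrade checks) and why it can always return a string. Everything in it is an assumption for illustration: the names `origin_of`, `is_potentially_trustworthy` and `redact_origin` are hypothetical, the trustworthiness test is a simplified stand-in for the Secure Contexts check (only https/wss plus localhost/127.0.0.1), and the policy branches paraphrase when Referrer Policy sends no referrer at all rather than quoting the spec.

```python
from urllib.parse import urlsplit

# Hypothetical helpers; a sketch, not spec text.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}
POLICIES = {
    "no-referrer", "no-referrer-when-downgrade", "same-origin", "origin",
    "strict-origin", "origin-when-cross-origin",
    "strict-origin-when-cross-origin", "unsafe-url",
}


def origin_of(url: str) -> str:
    """Serialize a URL's origin. blob: and filesystem: URLs use their inner
    URL's origin; every other scheme is treated as opaque ("null")."""
    parts = urlsplit(url)
    if parts.scheme in ("blob", "filesystem"):
        return origin_of(parts.path)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        return "null"
    origin = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None and parts.port != DEFAULT_PORTS[parts.scheme]:
        origin += f":{parts.port}"
    return origin


def is_potentially_trustworthy(url: str) -> bool:
    """Simplified stand-in for the Secure Contexts test, applied to the
    URL's origin (so blob: URLs inherit their inner URL's answer)."""
    origin = origin_of(url)
    if origin == "null":
        return False
    parts = urlsplit(origin)
    return parts.scheme in ("https", "wss") or parts.hostname in ("localhost", "127.0.0.1")


def redact_origin(document_url: str, target_url: str, policy: str) -> str:
    """Return document_url's serialized origin, or "null" when `policy`
    would send no referrer at all from document_url to target_url.
    Always returns a string, never an origin object."""
    if policy not in POLICIES:
        # An empty policy is assumed to be resolved to the default by the caller.
        raise ValueError(f"unknown referrer policy: {policy!r}")
    origin = origin_of(document_url)
    same_origin = origin != "null" and origin == origin_of(target_url)
    downgrade = (is_potentially_trustworthy(document_url)
                 and not is_potentially_trustworthy(target_url))
    if policy == "no-referrer":
        sends = False
    elif policy == "same-origin":
        sends = same_origin
    elif policy in ("no-referrer-when-downgrade", "strict-origin"):
        sends = not downgrade
    elif policy == "strict-origin-when-cross-origin":
        sends = same_origin or not downgrade
    else:  # "origin", "origin-when-cross-origin", "unsafe-url" always send one
        sends = True
    return origin if sends else "null"
```

For example, `redact_origin("https://a.com/x", "https://b.com/y", "same-origin")` gives `"null"` while the same call with a same-origin target gives `"https://a.com"`, which is the differential behaviour that an (origin, policy)-only signature cannot express.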

@jeisinger
Member

I would like to wait for the discussion on the html pull request to settle, and, specifically for @bzbarsky to state whether FF would implement ancestorOrigins with that restriction.

I think it's unfortunate that we'd force a site to choose between privacy protection and attribution: you might want "origin-when-cross-origin" for outgoing links, and "never" to disable ancestorOrigins. That's IMO the main reason why we shouldn't use referrer policy for anything other than the referrer.

On the other hand, ancestorOrigins is very valuable for fraud prevention, so if adding this to referrer policy will be enough for Firefox to ship ancestorOrigins, I'd be willing to make this compromise.

@bzbarsky

you might want "origin-when-cross-origin" for outgoing links, and "never" to disable ancestorOrigins.

The right answer there would seem like treating "outgoing links" and "subframe loads" differently in terms of referrer policy and using the subframe part to do the ancestorOrigins sanitization.

Note that to some extent we have this already: <iframe referrerpolicy="no-referrer">. Of course then the ancestorOrigins spec would need to consider the actual referrer policy used for the specific frame load, not just the document-wide policy....

@jeisinger
Member

@bzbarsky that doesn't, however, answer my question: would FF ship ancestorOrigins with this addition to the referrer policy spec?

@bzbarsky

I would attempt to ship ancestorOrigins in Firefox if it were designed such that pages that reasonably expect that their origin doesn't leak to cross-origin subframes (not necessarily direct) in fact do not leak their origin to such subframes.

I can't guarantee that it would ship, obviously; that depends on the responses to an intent to ship, the actual code review, etc...

@hillbrad
Contributor Author

My own inclination is to specify the smallest possible change to existing behavior that will satisfy Mozilla's objections, to increase the likelihood that the change will be promptly and compatibly adopted by a large number of browsers and with minimal disruption to existing applications consuming this data.

I expect that there are a number of sites relying on ancestorOrigins today that look like:

top -> sandboxed iframe -> 3rd party iframe (ad)

Where "ad" expects to be able to see (and top intends it to see) the origin of "top". This means a change to ancestorOrigins which, on encountering the first null while walking towards top, stops evaluation, or, continues to give the true embedding depth but returns "null" for ancestors after the first "null", is probably a breaking change.
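To make the compatibility concern concrete, a small Python sketch comparing what the ad frame sees today with the two alternative behaviours just described. The origin `https://top.example` and the function names are hypothetical; the sketch assumes the sandboxed iframe lacks allow-same-origin, so its own entry already serializes as "null", and that "stops evaluation" keeps the first "null" entry itself.

```python
from typing import List

# Hypothetical example: what the ad frame sees today in
#   top (https://top.example) -> sandboxed iframe (opaque) -> ad
# Entries are nearest ancestor first, as in location.ancestorOrigins.
TODAY = ["null", "https://top.example"]


def stop_at_first_null(origins: List[str]) -> List[str]:
    """Alternative 1: stop walking towards top at the first "null"
    (this sketch keeps that "null" entry itself)."""
    out: List[str] = []
    for origin in origins:
        out.append(origin)
        if origin == "null":
            break
    return out


def null_after_first_null(origins: List[str]) -> List[str]:
    """Alternative 2: keep the true embedding depth, but report "null"
    for the first "null" and every ancestor above it."""
    out: List[str] = []
    seen_null = False
    for origin in origins:
        seen_null = seen_null or origin == "null"
        out.append("null" if seen_null else origin)
    return out


print(stop_at_first_null(TODAY))     # ['null']
print(null_after_first_null(TODAY))  # ['null', 'null']
```

Under either alternative the ad no longer sees top's origin, which is the breakage described above.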

I still think the simplest way to satisfy the requirements of "pages that reasonably expect that their origin doesn't leak" is to infer that expectation from their default referrer policy.

I say this because I think that the use cases for controlling ancestorOrigins are actually somewhat limited, and it is likely that a resource that is concerned about leaking its origin is in a mode of general paranoia (like a resource using a capability security model) and would set such a policy. My imagination is failing me somewhat on use cases that require independently controlling those expectations on a load-by-load basis.

@bzbarsky do you have specific use cases in mind for loading a cross-origin iframe while censoring its view of the ancestorOrigin stack, which could help elucidate the design requirements?

@bzbarsky

This means a change to ancestorOrigins which, on encountering the first null while walking towards top, stops evaluation, or, continues to give the true embedding depth but returns "null" for ancestors after the first "null", is probably a breaking change.

I don't believe anyone is proposing such a change; it doesn't make sense to do that.

I still think the simplest way to satisfy the requirements of "pages that reasonably expect that their origin doesn't leak" is to infer that expectation from their default referrer policy.

If we didn't have per-element policies, that would be true. But we do, and a page that consistently sets them on all its cross-origin stuff should have such a reasonable expectation... So should a page with a looser default policy that only loads same-origin things which themselves have "no-referrer" policies...

Put another way, certainly a page with a default "no-referrer" policy and no per-element policies has such an expectation. But that's not the only way to have such an expectation.

do you have specific use cases in mind for loading a cross-origin iframe while censoring its view of the ancestorOrigin stack

Any time you're loading some site you don't trust in that iframe, really.

As a simple concrete example, say I want to embed a Google docs spreadsheet and resulting graph in my page to let people see what happens to the graph as you adjust some parameters. But that doesn't mean I want Google knowing what site is embedding this spreadsheet, because that's none of their damn business.

@bzbarsky

Fundamentally, I think you're coming at this from a "default should be to expose everything, but I guess we should have something to satisfy the tin-foil-hat types, and it's OK if we make that something pretty narrow" perspective. On the other hand, I'm coming at this from a "default should be to not expose anything to the pervasive track-everything-about-everyone ecosystem we have on the web right now, but I understand that there are real fraud use cases that we need to address, so I'm looking for the minimal thing we can expose that allows the maximal set of use cases to exist without exposing information" perspective.

Please forgive me if my characterization of your point of view is inaccurate.

@bzbarsky

And note that I'm sympathetic to the "let's create as little work as possible for the non-Mozilla implementors here" consideration. Heck, I'd like to create as little work as possible for myself. ;)

I agree that doing anything other than considering the default referrer policy becomes really complicated, not least because navigations can happen through means other than frame "src" attributes and such navigations carry along their own referrers. This is why I haven't come up with a concrete proposal there so far.

If I were designing this from the ground up, I would make exposing your origin to ancestorOrigins an explicit opt-in on your iframe elements, much like what we have with allowfullscreen. That would address the fraud use cases, I believe: people who want to expose the information would opt in for those specific iframes, and people trying to avoid fraud could assume anyone not opting in is not to be trusted. The obvious problem here is that this would involve modifying various iframes involved, so can't just be rolled out by the innermost thing in the ad chain; it needs cooperation from the things above it. So it would have made rollout a bit more complicated for the ad tech ecosystem...

@hillbrad
Contributor Author

hillbrad commented Oct 21, 2016

The default exposing everything is just the way it already has worked for years in every browser but Firefox. That's not my personal preference with regard to privacy; green-field my preferences would be much closer to yours @bzbarsky, but no offense taken. I just find it helpful to understand specific motivating use cases if we're going to make potentially breaking changes to existing behavior in order to Pareto optimize.

What do you think frame {3} should see about its grandparent {1} in the following arrangement?

{1, a.com} default referrer policy -> loads {2, b.com} with noreferrer attribute on the iframe tag -> loads {3, a.com}

What if {3} is from c.com?

@annevk
Member

annevk commented Oct 21, 2016

The default exposing everything is just the way it already has worked for years in every browser but Firefox.

You mean this is shipping in IE/Edge? I thought this was Blink and maybe WebKit (due to abarth putting it in both).

@bzbarsky

in every browser but Firefox

In my testing, neither IE11 nor Edge 14 support ancestorOrigins. Am I missing something? As far as I can tell, we're really talking about a "webkit-only" feature (in quotes, because I'm not sure whether it predates Blink forking or not; the upshot is the same).

I accept that there is existing deployment here, which is why I'm not seriously proposing the explicit opt-in, because I think the inertia there would be too much.

What do you think frame {3} should see about its grandparent {1} in the following arrangement?

Just to make sure I understand your example, is the noreferrer attribute on the iframe tag that is loading b.com, or that is inside b.com?

Here is a conceivable concrete proposal for what ancestorOrigins could return:

  1. Make a list of (origin-of-parent, referrer-it-was-loaded-with) pairs for yourself and your ancestors, excluding the topmost page. Note that documents already have a concept of referrer-they-were-loaded-with associated with them.
  2. For every entry in the list, if "origin-of-parent" doesn't match the origin of "referrer-it-was-loaded-with" (which would be true by default any time the referrer is not present; need to wordsmith this a bit to make it clear), replace the "origin-of-parent" part of that pair with "null", and do the same for each pair going up the tree until one is found whose "origin-of-parent" is different.

This proposal is making the following assumption: If a page is being sandboxed without "allow-same-origin", it is not trusted by its loader. Therefore, the loader should not rely on it sanitizing origins and should do it itself at that boundary if it wants to.

Effects of this proposal on the cases we have considered so far:

  • top -> sandboxed iframe -> 3rd party iframe (ad), assuming sandbox is without allow-same-origin: if no referrer controls were used, ad sees unsanitized origins. If sandboxed subframe load is no-referrer, or has a referrer which is not same-origin with top, then top would be sanitized out. The origin of the sandboxed iframe itself is already "null", so there is nothing to sanitize there no matter what else is going on. There is a failure mode here that I'm not sure what to do with: if top loads sandboxed iframe from top's origin, but sandboxed without allow-same-origin, and then the sandboxed iframe navigates itself, then its referrer origin would match origin-of-parent so suddenly the parent's origin would be exposed even if it loaded the sandboxed iframe no-referrer. I'm not sure what to do about that. But in practice, if the sandboxed iframe is malicious, it can already phone out its own location, and since the leakage only happens because this location's origin matches the top page's origin, we're not leaking any new information, somehow.
  • {1, a.com} default referrer policy -> loads {2, b.com} with noreferrer attribute on the iframe tag that's loading b.com -> loads {3, a.com}: {3} sees "null" for the grandparent origin, "b.com" for the parent origin. Same if the innermost load is c.com.
  • {1, a.com} default referrer policy -> loads {2, b.com} with noreferrer attribute on the iframe tag inside {2} -> loads {3, a.com}: {3} sees "a.com" for the grandparent origin, "null" for the parent origin. Same if the innermost load is c.com.
  • {1, a.com} default referrer policy -> loads {2, a.com} with noreferrer attribute on the iframe tag inside {2} -> loads {3, c.com}: {3} sees "null" for the parent and grandparent origins; {2} sees an unsanitized parent origin.
  • {1, a.com} default referrer policy -> loads {2, a.com} with noreferrer attribute on the iframe tag inside {2} -> loads {3, a.com}: {3} sees "null" for the parent and grandparent origins; {2} sees an unsanitized parent origin.
  • {1, a.com} default referrer policy -> loads {2, b.com} with default referrer policy -> loads {3, b.com} with noreferrer attribute on the iframe tag inside {3} -> loads {4, a.com}: {4} sees "null" for its parent and grandparent but "a.com" for its great-grandparent. This does not depend on whether {4} is a.com, b.com, or c.com.
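The two steps can be modelled roughly as follows, which makes it easy to check the cases above. This Python sketch is an illustration under stated assumptions, not spec text: `sanitized_ancestor_origins` and `Entry` are hypothetical names, origins are plain serialized strings, `None` stands for an opaque origin-of-parent or for "no referrer", the referrer's origin is the origin of the referrer URL (for a sandboxed document, its URL's origin rather than its opaque document origin), and entries run from the current document upwards, nearest ancestor first.

```python
from typing import List, Optional, Tuple

# One entry per document, from the current document up to (but excluding)
# the top-level document: (origin of that document's parent, origin of the
# referrer that document was loaded with). None means an opaque parent
# origin, or that no referrer was sent.
Entry = Tuple[Optional[str], Optional[str]]


def sanitized_ancestor_origins(entries: List[Entry]) -> List[str]:
    """Hypothetical model of the two-step proposal; returns the list the
    current document would see, nearest ancestor first."""
    result = ["null" if parent is None else parent for parent, _ in entries]
    for i, (parent, referrer_origin) in enumerate(entries):
        # Step 2's test: a missing referrer never matches, and an opaque
        # parent origin matches nothing.
        if parent is not None and parent == referrer_origin:
            continue
        # Redact this entry and each consecutive entry further up the tree
        # that has the same origin-of-parent.
        j = i
        while j < len(entries) and entries[j][0] == parent:
            result[j] = "null"
            j += 1
    return result


# {1, a.com} -> {2, b.com} -> {3}, noreferrer on the iframe tag inside {2}:
print(sanitized_ancestor_origins([("https://b.com", None),
                                  ("https://a.com", "https://a.com")]))
# ['null', 'https://a.com']
```

The sandbox failure mode shows up in this model too: if the sandboxed frame navigated itself from top's origin, the ad's entries are `[(None, "https://a.com"), ("https://a.com", "https://a.com")]`, and top's origin comes through even though the original frame load was no-referrer.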

@hillbrad
Contributor Author

My own inclination is to specify the smallest possible change to existing behavior that will satisfy Mozilla's objections, to increase the likelihood that the change will be promptly and compatibly adopted by a large number of browsers and with minimal disruption to existing applications consuming this data.

I expect that there are a number of sites relying on ancestorOrigins today that look like:

top -> sandboxed iframe -> 3rd party iframe (ad)

Where "ad" expects to be able to see (and top intends it to see) the origin of "top". This means a change to ancestorOrigins which, on encountering the first null while walking towards top, stops evaluation, or, continues to give the true embedding depth but returns "null" for ancestors after the first "null", is probably a breaking change.

I still think the simplest way to satisfy the requirements of "pages that reasonably expect that their origin doesn't leak" is to infer that expectation from their default referrer policy.

I say this because I think that the use cases for controlling ancestorOrigins are actually somewhat limited, and it is likely that a resource that is concerned about leaking its origin is in a mode of "general paranoia" (like a resource using a capability security model) and would set such a policy, and that there are not significant use cases that require independently controlling those expectations on a load-by-load basis.

@bz do you have specific use cases in mind for loading a cross-origin iframe while censoring its view of the ancestorOrigin stack, which could help elucidate the design requirements?

On Thu, Oct 20, 2016 at 6:57 AM Boris Zbarsky wrote:

I would attempt to ship ancestorOrigins in Firefox if it were designed such that pages that reasonably expect that their origin doesn't leak to cross-origin subframes (not necessarily direct) in fact do not leak their origin to such subframes.

I can't guarantee that it would ship, obviously; that depends on the responses to an intent to ship, the actual code review, etc...



@bzbarsky

@hillbrad copy/paste or email reply mistake?

@jeisinger
Member

Given that this is still controversial, I'd rather push this to a future iteration on the referrer policy instead of blocking further progress on the current WD.

@hillbrad
Contributor Author

Apologies that I missed the progress on this thread deep in my inbox.

@bzbarsky it reads like your proposal doesn't actually require consulting referrer policy states at all, which means we could close this issue and specify that algorithm directly in HTML with the definition of ancestorOrigins, and not block Referrer Policy any longer on this issue?

@bzbarsky

My proposal is affected by referrer policy insofar as it affects what referrers things are loaded with. But yes, it doesn't obviously need any hooks from the referrer policy spec. I'm not sure why this issue was a referrer policy issue to start with, honestly, since it's not like referrer policy is what's defining ancestorOrigins...

That said, I'd love someone other than me giving my proposal a once-over. ;)

@hillbrad
Contributor Author

WebAppSec WG seems to agree that the proposal in #77 (comment) is reasonable. Closing this PR and will propose that in HTML. Thanks!


7 participants