Redact origin according to policy #77

hillbrad · 2016-10-18T00:54:09Z

Define an algorithm to redact origin to "null" in the location.ancestorOrigins array if the referrer policy for the context would not send a referrer to that location.

See: whatwg/html#1918

mikewest · 2016-10-18T08:26:49Z

/cc @estark37 @jeisinger

mikewest · 2016-10-18T08:29:00Z

Looks reasonable to me, FWIW. It introduces some additional complexity to the ancestorOrigins calculation, but if this would be enough for Firefox folks to be comfortable implementing, that complexity is completely worthwhile.

annevk · 2016-10-18T08:43:12Z

index.src.html

+  <h3 id="ancestor-origin">
+    Determine the Ancestor Origin availble to a Location
+  </h3>
+  Given a <a>Location</a> <var>location</var> and a <a>browsing context</a> <var>context</var>


Why does this use a browsing context and a location as input? Wouldn't a document or global be better and sufficient?

In general I prefer the factoring where the algorithm for crawling the tree belongs in HTML, and Referrer Policy only has a "censor an origin according to a referrer policy" (origin, referrer policy) -> string algorithm. The rest should be HTML's responsibility.

But addressing @jochen's concern first makes sense.

Hmm, I guess it needs more inputs than that, since apparently it checks the browsing context being TLS-protected, and location's relevant Document's origin. That's a lot of complexity that I'm not sure is warranted, but I'm not too familiar with the details here of why you might want to apply strict-origin-when-cross-origin in scenarios where documentOrigin and locationOrigin are different.

If that complexity is desired, it's going to be really fun to write test cases for. I guess you'll grab a location object, then navigate the iframe to a different origin. It seems better to just throw a SecurityError in that case IMO, like is already done with the entry settings object check (ugh, entry settings object).

@annevk, at step 5 in the algorithm to produce an ancestorOrigins value (https://html.spec.whatwg.org/multipage/browsers.html#concept-location-ancestor-origins-array) a Location and a browsing context are the defined inputs, and the value is "the Unicode serialization of current's active document's origin", so I continued to use that.

@domenic, it's not enough to just have (origin, referrer policy) as inputs because the referrer policy algorithms send differential information depending on whether the target is same-origin or not.

@hillbrad but it seems for any kind of shielding you only need a document (current's active document) as input to an algorithm that Referrer Policy might define and return a value for.

I think I understand what's going on better here. OK, so I would suggest taking as input (startingDocument, browsingContext) and outputting an origin. I think you should be able to get everything from that.

Why do you need a browsing context? A document holds both the origin and referrer policy, no?

It seems simplest to have this behave as much like the rest of referrer policy as possible, including respecting the secure context boundary in a similar manner.

The existing algorithms in Referrer Policy use an environment settings object, Request's client.

The existing algorithm in HTML uses a browsing context.

It does look like the necessary pieces of information are available in a Document, as well. Although I think for consistency we would need to use Document's Location's URL, not just origin, to get proper secure context testing for blob: and filesystem: URLs.

Otherwise I honestly don't know what the reasons are to prefer Document vs. browsing context vs. environment settings object since the same relevant input data seems to be reachable from each.

jeisinger · 2016-10-18T09:16:31Z

So far, we tried to push back against using the referrer policy for things other than the referrer header. Can you explain why we should deviate from this here?

domenic · 2016-10-18T16:54:30Z

index.src.html

+          <ol>
+            <li>
+              If <var>context</var> is <a>TLS-protected</a> <em>and</em>
+              <var>locationOrigin</var> is not an <a><em>a priori</em> authenticated


You can't apply algorithms that accept URLs to origins...

Good point. That's copied from section 8.3 earlier in the document, will need to fix there, too. #78

hillbrad · 2016-10-18T17:00:10Z

@jeisinger referrer policy seems like a very logical place for this to me. Referrer policy is about controlling the default leakage of information about the current document origin or location as part of loading subresources, navigating and making network requests. location.ancestorOrigins is another place in which this information can leak as part of a subresource load, and which Mozilla has expressed concern is a privacy leak. Applying the intent expressed by a referrer policy to this other, very similar, cross-origin information leak seems logical to me.

domenic · 2016-10-18T17:15:08Z

index.src.html

+            <li>
+              If <var>context</var> is <a>TLS-protected</a> <em>and</em>
+              <var>locationOrigin</var> is not an <a><em>a priori</em> authenticated
+              URL</a>, return <code>"null"</code>.


This algorithm sometimes returns strings, and sometimes returns origins; that's not good.

estark37 · 2016-10-18T17:49:18Z

referrer policy seems like a very logical place for this to me. Referrer policy is about controlling the default leakage of information about the current document origin or location as part of loading subresources, navigating and making network requests. location.ancestorOrigins is another place in which this information can leak as part of a subresource load, and which Mozilla has expressed concern is a privacy leak. Applying the intent expressed by a referrer policy to this other, very similar, cross-origin information leak seems logical to me.

My feeling is that there are many ways that there can be default leakage of information about the current document origin or location, and I don't know if we can enumerate all of them, and I think we'll be in a very confusing situation if Referrer Policy controls some but not all of them.

The example I think of is HPKP violation reports. Suppose a pinned site makes a subresource request for "/foo.jpg" which triggers an HPKP violation report. The report contains the hostname of the current document; do we consider that default leakage of information about the current domain as part of loading a subresource?

estark37 · 2016-10-18T17:55:46Z

@hillbrad, thinking some more, maybe what you mean by "default" is that the site would have to opt in to the leakage by using HPKP, whereas with ancestorOrigins, the site doesn't have to opt in to the leakage in any way (except by framing another site)?

hillbrad · 2016-10-18T17:58:42Z

@estark37 I'm not confident interpreting whether you're in favor or against this change when you say "we'll be in a very confusing situation if Referrer Policy controls some but not all of them". Are you saying we should very narrowly only control referrer here, because we haven't enumerated every other leakage of this sort, or that we should have one place to control all of them where author control is desirable?

Assuming we wanted to be able to control cross-origin ancestorOrigins leakage, where else would we put it? Is writing another spec and creating another set of flags to set a good approach vs. doing it here?

hillbrad · 2016-10-18T18:04:42Z

@estark37 @jeisinger if mention of the ancestorOrigins motivation was removed and we just provided a callable algorithm here that takes (location, origin, policy) and returns (origin or "null") according to policy, would that be less concerning?

estark37 · 2016-10-18T18:14:31Z

@hillbrad I was arguing against, on the basis that I don't know if we can enumerate all sources of leakage and I think it would be confusing to developers to control some but not all of them. However I'm conscious that many smart people (including @mikewest) disagree with that reasoning, and I also don't really have any better ideas about how to control ancestorOrigins leakage without introducing a whole new mechanism/spec. So I guess I'm open to this change.

hillbrad · 2016-10-18T22:13:31Z

Latest update always returns a string, makes no explicit reference to ancestorOrigins, uses a URL and the potentially trustworthy URL test. Still using a browsing context, but I am very open to changing that if @annevk and @domenic agree on a more appropriate choice.
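For readers following along, here is a minimal sketch of the shape described above: a function that always returns a string and decides between the serialized origin and "null" from a referrer policy plus two URLs. It is not the text of the PR. The names `redactOrigin` and `isPotentiallyTrustworthy` are hypothetical, the trustworthiness check is a deliberately simplified stand-in for the "potentially trustworthy URL" test, and `URL.origin` gives the ASCII rather than the Unicode serialization.

```typescript
// Referrer policy tokens as defined by the Referrer Policy spec.
type ReferrerPolicy =
  | "no-referrer"
  | "no-referrer-when-downgrade"
  | "same-origin"
  | "origin"
  | "strict-origin"
  | "origin-when-cross-origin"
  | "strict-origin-when-cross-origin"
  | "unsafe-url";

// ASSUMPTION: simplified stand-in for the "potentially trustworthy URL" test.
// Only https:, wss:, and two loopback hosts count here; the real test covers more.
function isPotentiallyTrustworthy(url: URL): boolean {
  if (url.protocol === "https:" || url.protocol === "wss:") return true;
  return url.hostname === "localhost" || url.hostname === "127.0.0.1";
}

// Hypothetical helper: returns the origin of `source` as a string, or "null"
// when `policy` would send no referrer at all from `source` to `target`.
function redactOrigin(policy: ReferrerPolicy, source: URL, target: URL): string {
  const sameOrigin = source.origin === target.origin;
  const downgrade =
    isPotentiallyTrustworthy(source) && !isPotentiallyTrustworthy(target);

  switch (policy) {
    case "no-referrer":
      return "null";
    case "same-origin":
      // Nothing is sent cross-origin, so the result depends on the target.
      return sameOrigin ? source.origin : "null";
    case "no-referrer-when-downgrade":
    case "strict-origin":
    case "strict-origin-when-cross-origin":
      // These policies send nothing on a secure-to-insecure transition.
      return downgrade ? "null" : source.origin;
    case "origin":
    case "origin-when-cross-origin":
    case "unsafe-url":
      // At least the origin is always sent.
      return source.origin;
  }
}

const embedder = new URL("https://a.example/page");
const embedded = new URL("http://b.example/frame");
console.log(redactOrigin("strict-origin-when-cross-origin", embedder, embedded)); // "null"
console.log(redactOrigin("origin", embedder, embedded)); // "https://a.example"
```

The "same-origin" and "strict-*" branches illustrate why (origin, referrer policy) alone is not enough input: the answer depends on the target as well.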

jeisinger · 2016-10-20T08:54:12Z

I would like to wait for the discussion on the html pull request to settle, and, specifically for @bzbarsky to state whether FF would implement ancestorOrigins with that restriction.

I think it's unfortunate that we'd force a site to choose between privacy protection and attribution - you might want "origin-when-cross-origin" for outgoing links, and "never" to disable ancestorOrigins. That's IMO the main reason why we shouldn't use referrer policy for things other than the referrer.

On the other hand, ancestorOrigins is very valuable for fraud prevention, so if adding this to referrer policy will be enough for Firefox to ship ancestorOrigins, I'd be willing to make this compromise.

bzbarsky · 2016-10-20T13:29:27Z

you might want "origin-when-cross-origin" for outgoing links, and "never" to disable ancestorOrigins.

The right answer there would seem like treating "outgoing links" and "subframe loads" differently in terms of referrer policy and using the subframe part to do the ancestorOrigins sanitization.

Note that to some extent we have this already: <iframe referrerpolicy="no-referrer">. Of course then the ancestorOrigins spec would need to consider the actual referrer policy used for the specific frame load, not just the document-wide policy....

jeisinger · 2016-10-20T13:53:43Z

@bzbarsky that doesn't, however, answer my question of whether FF would ship ancestorOrigins with this addition to the referrer policy spec?

bzbarsky · 2016-10-20T13:57:18Z

I would attempt to ship ancestorOrigins in Firefox if it were designed such that pages that reasonably expect that their origin doesn't leak to cross-origin subframes (not necessarily direct) in fact do not leak their origin to such subframes.

I can't guarantee that it would ship, obviously; that depends on the responses to an intent to ship, the actual code review, etc...

hillbrad · 2016-10-21T15:50:24Z

My own inclination is to specify the smallest possible change to existing behavior that will satisfy Mozilla's objections, to increase the likelihood that the change will be promptly and compatibly adopted by a large number of browsers and with minimal disruption to existing applications consuming this data.

I expect that there are a number of sites relying on ancestorOrigins today that look like:

top -> sandboxed iframe -> 3rd party iframe (ad)

Where "ad" expects to be able to see (and top intends it to see) the origin of "top". This means a change to ancestorOrigins which, on encountering the first null while walking towards top, stops evaluation, or, continues to give the true embedding depth but returns "null" for ancestors after the first "null", is probably a breaking change.

I still think the simplest way to satisfy the requirements of "pages that reasonably expect that their origin doesn't leak" is to infer that expectation from their default referrer policy.

I say this because I think that the use cases for controlling ancestorOrigins are actually somewhat limited, and it is likely that a resource that is concerned about leaking its origin is in a mode of general paranoia (like a resource using a capability security model) and would set such a policy. My imagination is failing me somewhat on use cases that require independently controlling those expectations on a load-by-load basis.

@bzbarsky do you have specific use cases in mind for loading a cross-origin iframe while censoring its view of the ancestorOrigin stack, which could help elucidate the design requirements?

bzbarsky · 2016-10-21T15:58:39Z

This means a change to ancestorOrigins which, on encountering the first null while walking towards top, stops evaluation, or, continues to give the true embedding depth but returns "null" for ancestors after the first "null", is probably a breaking change.

I don't believe anyone is proposing such a change; it doesn't make sense to do that.

I still think the simplest way to satisfy the requirements of "pages that reasonably expect that their origin doesn't leak" is to infer that expectation from their default referrer policy.

If we didn't have per-element policies, that would be true. But we do, and a page that consistently sets them on all its cross-origin stuff should have such a reasonable expectation.... So should a page with a looser default policy that only loads same-origin things which themselves have "no-referrer" policies...

Put another way, certainly a page with a default "no-referrer" policy and no per-element policies has such an expectation. But that's not the only way to have such an expectation.

do you have specific use cases in mind for loading a cross-origin iframe while censoring its view of the ancestorOrigin stack

Any time you're loading some site you don't trust in that iframe, really.

As a simple concrete example, say I want to embed a Google docs spreadsheet and resulting graph in my page to let people see what happens to the graph as you adjust some parameters. But that doesn't mean I want Google knowing what site is embedding this spreadsheet, because that's none of their damn business.

bzbarsky · 2016-10-21T16:01:59Z

Fundamentally, I think you're coming at this from a "default should be to expose everything, but I guess we should have something to satisfy the tin-foil-hat types, and it's OK if we make that something pretty narrow" perspective. On the other hand, I'm coming at this from a "default should be to not expose anything to the pervasive track-everything-about-everyone ecosystem we have on the web right now, but I understand that there are real fraud use cases that we need to address, so I'm looking for the minimal thing we can expose that allows the maximal set of use cases to exist without exposing information" perspective.

Please forgive me if my characterization of your point of view is inaccurate.

bzbarsky · 2016-10-21T16:12:15Z

And note that I'm sympathetic to the "let's create as little work as possible for the non-Mozilla implementors here" consideration. Heck, I'd like to create as little work as possible for myself. ;)

I agree that doing anything other than considering the default referrer policy becomes really complicated, not least because navigations can happen through means other than frame "src" attributes and such navigations carry along their own referrers. This is why I haven't come up with a concrete proposal there so far.

If I were designing this from the ground up, I would make exposing your origin to ancestorOrigins an explicit opt-in on your iframe elements, much like what we have with allowfullscreen. That would address the fraud use cases, I believe: people who want to expose the information would opt in for those specific iframes, and people trying to avoid fraud could assume anyone not opting in is not to be trusted. The obvious problem here is that this would involve modifying various iframes involved, so can't just be rolled out by the innermost thing in the ad chain; it needs cooperation from the things above it. So it would have made rollout a bit more complicated for the ad tech ecosystem...

hillbrad · 2016-10-21T16:15:39Z

The default exposing everything is just the way it already has worked for years in every browser but Firefox. That's not my personal preference with regard to privacy; green-field my preferences would be much closer to yours @bzbarsky, but no offense taken. I just find it helpful to understand specific motivating use cases if we're going to make potentially breaking changes to existing behavior in order to Pareto optimize.

What do you think frame {3} should see about its grandparent {1} in the following arrangement?

{1, a.com} default referrer policy -> loads {2, b.com} with noreferrer attribute on the iframe tag -> loads {3, a.com}

What if {3} is from c.com?

annevk · 2016-10-21T16:23:36Z

The default exposing everything is just the way it already has worked for years in every browser but Firefox.

You mean this is shipping in IE/Edge? I thought this was Blink and maybe WebKit (due to abarth putting it in both).

bzbarsky · 2016-10-21T17:23:01Z

in every browser but Firefox

In my testing, neither IE11 nor Edge 14 support ancestorOrigins. Am I missing something? As far as I can tell, we're really talking about a "webkit-only" feature (in quotes, because I'm not sure whether it predates Blink forking or not; the upshot is the same).

I accept that there is existing deployment here, which is why I'm not seriously proposing the explicit opt-in, because I think the inertia there would be too much.

What do you think frame {3} should see about its grandparent {1} in the following arrangement?

Just to make sure I understand your example, is the noreferrer attribute on the iframe tag that is loading b.com, or that is inside b.com?

Here is a conceivable concrete proposal for what ancestorOrigins could return:

Make a list of (origin-of-parent, referrer-it-was-loaded-with) pairs for yourself and your ancestors, excluding the topmost page. Note that documents already have a concept of referrer-they-were-loaded-with associated with them.
For every entry in the list, if "origin-of-parent" doesn't match origin of "referrer-it-was-loaded-with" (which would be true by default any time the referrer is not present; need to wordsmith this a bit to make it clear), replace the "origin-of-parent" part in that pair, and any pairs going up the tree until a different "origin-of-parent" is found with "null".

This proposal is making the following assumption: If a page is being sandboxed without "allow-same-origin", it is not trusted by its loader. Therefore, the loader should not rely on it sanitizing origins and should do it itself at that boundary if it wants to.

Effects of this proposal on the cases we have considered so far:

top -> sandboxed iframe -> 3rd party iframe (ad), assuming sandbox is without allow-same-origin: if no referrer controls were used, ad sees unsanitized origins. If sandboxed subframe load is no-referrer, or has a referrer which is not same-origin with top, then top would be sanitized out. The origin of the sandboxed iframe itself is already "null", so there is nothing to sanitize there no matter what else is going on. There is a failure mode here that I'm not sure what to do with: if top loads sandboxed iframe from top's origin, but sandboxed without allow-same-origin, and then the sandboxed iframe navigates itself, then its referrer origin would match origin-of-parent so suddenly the parent's origin would be exposed even if it loaded the sandboxed iframe no-referrer. I'm not sure what to do about that. But in practice, if the sandboxed iframe is malicious, it can already phone out its own location, and since the leakage only happens because this location's origin matches the top page's origin, we're not leaking any new information, somehow.
{1, a.com} default referrer policy -> loads {2, b.com} with noreferrer attribute on the iframe tag that's loading b.com -> loads {3, a.com}: {3} sees "null" for the grandparent origin, "b.com" for the parent origin. Same if the innermost load is c.com.
{1, a.com} default referrer policy -> loads {2, b.com} with noreferrer attribute on the iframe tag inside {2} -> loads {3, a.com}: {3} sees "a.com" for the grandparent origin, "null" for the parent origin. Same if the innermost load is c.com.
{1, a.com} default referrer policy -> loads {2, a.com} with noreferrer attribute on the iframe tag inside {2} -> loads {3, c.com}: {3} sees "null" for the parent and grandparent origins; {2} sees an unsanitized parent origin.
{1, a.com} default referrer policy -> loads {2, a.com} with noreferrer attribute on the iframe tag inside {2} -> loads {3, a.com}: {3} sees "null" for the parent and grandparent origins; {2} sees an unsanitized parent origin.
{1, a.com} default referrer policy -> loads {2, b.com} with default referrer policy -> loads {3, b.com} with noreferrer attribute on the iframe tag inside {3} -> loads {4, a.com}: {4} sees "null" for its parent and grandparent but "a.com" for its great-grandparent. This does not depend on whether {4} is a.com, b.com, or c.com.
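One possible reading of the proposal above, written out as a sketch so the listed cases can be checked mechanically. Everything here is an assumption for illustration: the `FrameInfo` shape, the `redactedAncestorOrigins` name, and the use of `null` to mean "loaded without a referrer". Sandboxing and opaque origins are not modelled.

```typescript
// Hypothetical model of one document in the frame chain.
interface FrameInfo {
  origin: string;                // origin of the document in this frame
  referrerOrigin: string | null; // origin of the referrer it was loaded with, or null if none
}

// chain[0] is the topmost page; the last entry is the frame reading ancestorOrigins.
// Returns the redacted list, nearest parent first, as ancestorOrigins would order it.
function redactedAncestorOrigins(chain: FrameInfo[]): string[] {
  // parentOrigins[i] is the "origin-of-parent" half of the pair for chain[i + 1].
  const parentOrigins = chain.slice(0, -1).map((f) => f.origin);
  const redacted = new Array<boolean>(parentOrigins.length).fill(false);

  for (let i = 0; i < parentOrigins.length; i++) {
    const child = chain[i + 1];
    if (child.referrerOrigin !== parentOrigins[i]) {
      // Mismatch: redact this pair, then keep walking up the tree while the
      // parent origin stays the same, stopping at the first different one.
      for (let j = i; j >= 0 && parentOrigins[j] === parentOrigins[i]; j--) {
        redacted[j] = true;
      }
    }
  }

  return parentOrigins.map((o, i) => (redacted[i] ? "null" : o)).reverse();
}

const A = "https://a.com";
const B = "https://b.com";

// {1, a.com} -> {2, b.com} loaded with no referrer -> {3, a.com}
console.log(redactedAncestorOrigins([
  { origin: A, referrerOrigin: null },
  { origin: B, referrerOrigin: null },
  { origin: A, referrerOrigin: B },
])); // [ "https://b.com", "null" ]

// {1, a.com} -> {2, b.com} -> {3, b.com} -> {4, a.com} loaded with no referrer
console.log(redactedAncestorOrigins([
  { origin: A, referrerOrigin: null },
  { origin: B, referrerOrigin: A },
  { origin: B, referrerOrigin: B },
  { origin: A, referrerOrigin: null },
])); // [ "null", "null", "https://a.com" ]
```

Under this reading the comparison always uses the unredacted parent origins, which is what lets the last case keep "a.com" for the great-grandparent while nulling out both "b.com" entries.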

hillbrad · 2016-10-25T18:14:04Z

My own inclination is to specify the smallest possible change to existing
behavior that will satisfy Mozilla's objections, to increase the likelihood
that the change will be promptly and compatibly adopted by a large number
of browsers and with minimal disruption to existing applications consuming
this data.

I expect that there are a number of sites relying on ancestorOrigins today
that look like:

top -> sandboxed iframe -> 3rd party iframe (ad)

Where "ad" expects to be able to see (and top intends it to see) the origin
of "top". This means a change to ancestorOrigins which, on encountering
the first null while walking towards top, stops evaluation, or, continues
to give the true embedding depth but returns "null" for ancestors after the
first "null", is probably a breaking change.

I still think the simplest way to satisfy the requirements of "pages that
reasonably expect that their origin doesn't leak" is to infer that
expectation from their default referrer policy.

I say this because I think that the use cases for controlling
ancestorOrigins are actually somewhat limited, and it is likely that a
resource that is concerned about leaking its origin is in a mode of "general
paranoia" (like a resource using a capability security model) and would set
such a policy, and that there are not significant use cases that require
independently controlling those expectations on a load-by-load basis.

@bz do you have specific use cases in mind for loading a cross-origin
iframe while censoring its view of the ancestorOrigin stack, which could
help elucidate the design requirements?

On Thu, Oct 20, 2016 at 6:57 AM Boris Zbarsky [email protected]
wrote:

I would attempt to ship ancestorOrigins in Firefox if it were designed
such that pages that reasonably expect that their origin doesn't leak to
cross-origin subframes (not necessarily direct) in fact do not leak their
origin to such subframes.

I can't guarantee that it would ship, obviously; that depends on the
responses to an intent to ship, the actual code review, etc...

bzbarsky · 2016-10-25T18:17:20Z

@hillbrad copy/paste or email reply mistake?

jeisinger · 2016-11-02T16:05:47Z

Given that this is still controversial, I'd rather push this to a future iteration on the referrer policy instead of blocking further progress on the current WD.

hillbrad · 2016-12-21T06:20:31Z

Apologies that I missed the progress on this thread deep in my inbox.

@bzbarsky it reads like your proposal doesn't actually require consulting referrer policy states at all, which means we could close this issue and specify that algorithm directly in HTML with the definition of ancestorOrigins, and not block Referrer Policy any longer on this issue?

bzbarsky · 2016-12-21T07:40:19Z

My proposal is affected by referrer policy insofar as it affects what referrers things are loaded with. But yes, it doesn't obviously need any hooks from the referrer policy spec. I'm not sure why this issue was a referrer policy issue to start with, honestly, since it's not like referrer policy is what's defining ancestorOrigins...

That said, I'd love someone other than me giving my proposal a once-over. ;)

hillbrad · 2016-12-21T17:20:37Z

WebAppSec WG seems to agree that the proposal in #77 (comment) is reasonable. Closing this PR and will propose that in HTML. Thanks!

hillbrad added 3 commits October 17, 2016 17:43

algorithm for determining ancestorOrigin for a Location

b3aec4d

algorithm for determining ancestorOrigin for a Location

415b7db

fix merge conflicts

70d978c

hillbrad mentioned this pull request Oct 18, 2016

redact location.ancestorOrigins according to Referrer Policy whatwg/html#1918

Open

mikewest assigned mikewest, jeisinger and estark37 and unassigned mikewest Oct 18, 2016

annevk reviewed Oct 18, 2016

domenic reviewed Oct 18, 2016

unicode serialize origins so return value is uniformly a string

8773d6e

hillbrad added 3 commits October 18, 2016 14:22

remove ancestor origins refs

780c31b

use potentially trustworthy URLs

c716dee

exterminate all tabs from document

0544f17

hillbrad closed this Dec 21, 2016

bzbarsky mentioned this pull request Mar 30, 2017

Redact ancestorOrigins using "the document's referrer" whatwg/html#2480

Open

annevk mentioned this pull request Apr 6, 2017

ancestorOrigins web-platform-tests/wpt#5402

Open
