Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new section clarifying the requirements for data urls #1582

Merged
merged 5 commits into from
Mar 26, 2021

Conversation

mattgarrish
Copy link
Member

@mattgarrish mattgarrish commented Mar 22, 2021

The issue appears a little more nuanced than we've discussed so far - top-level navigation is being restricted, but you can still link to data URLs within an iframe, for example. I've tried to craft the new section to also make this distinction.

It's also possible to use data URLs in the package document along with the various content formats, so I put the new section under publication resources. It's either forbidden or pointless to use them in the manifest, but you could, for example, embed a resource in a link element.

I've also noted they are subject to the foreign resource restrictions, so that limits what you can use and where.

The one complexity is how reading systems restrict top-level navigation to data URLs. I don't know that this is something we can, or should, try to be too specific about. While it's possible you might disable hyperlinks with a data scheme, that wouldn't stop scripts from opening such content. I was reading some of the bug trackers for this and it sounds like browsers may only prevent the resource from loading in the spawned window, not restrict the event that triggered it.

As a result, I've only given a very general requirement to prevent data URLs from loading, but open to alternative wording if someone knows a better way to phrase this.

Fixes #1564


Preview | Diff

Copy link
Member

@iherman iherman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot judge whether the text you have put for RS-s is enough or something more specific should be said; deferring to the RS experts on this... But, from where I stand, really +1 to what you propose!

epub33/core/index.html Show resolved Hide resolved
@@ -448,7 +448,8 @@ <h3>Terminology</h3>
in outbound hyperlinks that resolve to <a>Remote Resources</a> (e.g., referenced from the
<code>href</code> attribute of an [[!HTML]] <a
href="https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-a-element"
><code>a</code></a> element).</p>
><code>a</code></a> element). Resources represented by <a href="#sec-data-urls">data
URLs</a> are also not Publication Resources.</p>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to allow data URLs as a mechanism to bypass the fallback chain when specifying non-core media types. So an img element referencing a file that is not supported would need to provide a fallback, but a data URL would not. And if the aren't Publication Resources, what are they? They aren't foreign resources, since those are defined to be publication resources. I am a little worried about introducing a new type of entity. Is it possible to leave them as publication resources, but remove the requirement to be listed in the manifest in 2.2.2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ew, ya, that's not what I was intending at all. It was only about an exclusion from the manifest. I'll fix.

a <a>Top-Level Content Document</a> or open a new browser instance (e.g., they cannot be used in
manifest <code>item</code> elements that are referenced from the <a>spine</a>, in [[HTML]] or
[[SVG]] <code>a</code> elements (except when inside an <code>iframe</code>), or in calls to
[[ECMASCRIPT]] <code>window.open</code> or <code>document.open</code>).</p>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list is exhaustive enough that it may need clarification that it is not exhaustive (in addition the "e.g." at the start). Actually might be better to introduce a new sentence here, since you have a parenthetical in the parenthetical.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nearly exhaustive in that I can only see that the area element is missing from it. I'll add that case in and add a note that it's still subject to change as html, svg, etc. evolve.

[[SVG]] <code>a</code> elements (except when inside an <code>iframe</code>), or in calls to
[[ECMASCRIPT]] <code>window.open</code> or <code>document.open</code>).</p>

<p>This restriction on their use is to prevent security issues (i.e., phishing scams) and also to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we need to list any specific security issues here, since the terms can be vague.

@@ -172,6 +172,13 @@ <h2>EPUB Publications</h2>
href="https://www.w3.org/TR/epub-33/#sec-resource-locations">Resource Locations</a>
[[!EPUB-33]].</p>
</li>
<li>
<p id="confreq-rs-data-urls">It MUST prevent attemps at top-level navigation to data URLs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a more precise term for this? Or is top-level navigation defined somewhere we can reference?

Copy link
Member Author

@mattgarrish mattgarrish Mar 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we consider any top-level content document a top-level browsing context?

If so, we don't need to separate the concepts and can use only the HTML term.

I wasn't sure if they're considered equal all the time because SVG doesn't get into this level of detail. But if we consider all EPUB rendering in terms of html user agent rendering then it works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a change to this effect with the latest commit. Worst case we keep working at massaging this text, I guess... 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we consider any top-level content document a top-level browsing context?

I worry about that, because in implementations the spine item might not be a top-level browsing context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because in implementations the spine item might not be a top-level browsing context

But aren't most (all) reading systems using an html core to render content in the spine?

The problem, of course, is that we don't require this to be true, but it's probably true from a practical perspective. Maybe we could note the assumption and that an RS that doesn't use a browser core still needs to restrict access?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the following in the latest commit: "If a Reading System does not use a top-level browsing context for Top-level Content Documents, it MUST also prevent data URLs from opening as though they are Top-level Content Documents."

That's the best I can come up with since we lack any alternative name for the rendering context.

@@ -172,6 +172,13 @@ <h2>EPUB Publications</h2>
href="https://www.w3.org/TR/epub-33/#sec-resource-locations">Resource Locations</a>
[[!EPUB-33]].</p>
</li>
<li>
<p id="confreq-rs-data-urls">It MUST prevent attemps at top-level navigation to data URLs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this overly restrictive? If a book has an img element with a data url, and a reading system has a "open in new window" option for images, would that be disallowed for the data url case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw something similar to this in one of the bug trackers cited in the issue. They were going to exempt it from the open option in context menus, so no harm in doing the same here.

- reestablish data URLs as publication resources
- use top-level browsing context for defining the prohibited action
- add area to list of prohibited usage contexts
- exempt reading system UI from opening restriction
Copy link
Collaborator

@bduga bduga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good!

href="https://html.spec.whatwg.org/multipage/browsers.html#top-level-browsing-context"
>top-level browsing contexts</a> [[HTML]])</p>
>top-level browsing contexts</a> [[HTML]], except when initiated through the user
interface of the Reading System (e.g., a context menu).</p>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this language is tricksy :( Isn't clicking on something to make something happen part of the user interface of the RS? Not sure how to best phrase this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't clicking on something to make something happen part of the user interface

Ya, it took a few tries to even come up with this. I wouldn't consider clicking in the content the same as the clicking a part of the user interface, but it is awkward.

Would "affordance" improve it any: "except when initiated through a Reading System affordance such as a context menu"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this in the latest commit.

change "reading system ui" to affordances for allowing data urls to be opened
@iherman
Copy link
Member

iherman commented Mar 26, 2021

The issue was discussed in a meeting on 2021-03-26

List of resolutions:

View the transcript

2. Data URLs PR

See github pull request #1582.

See github issue #1564.

Matt Garrish: this is from what we discussed last week's call
… where do we allow data URLs, etc.
… browsers are starting to disallow navigation to data URLs (to circumvent security issues)
… so last week we said we would follow that path, but still allow embedding content via data URLs
… plus we were going to say that we wouldn't allow data URLs as spine docs
… how to phrase this was kind of complicated
… the language uses "top level browsing"
… but some RSes don't use a browser based model
… so we say that in those cases, they still have to treat the data URLs similar to the way that a browser would

Brady Duga: its hard to get the language right so that we don't disallow things by mistake, or allow things by mistake
… e.g. when an RS wants to provide the ability to let user pop out content embedded via data URL in a separate window
… wonder if there is another group working on specifying how this should work?
… or are we on the bleeding edge here? Would rather not be the ones who set this standard

Matt Garrish: this very much seems to be raised and discussed in the browser realm
… potentially a part of the HTML spec
… but nothing I can point to specifically

Ivan Herman: this PR does a lot that is not contentious, i think
… i propose to merge PR as it is, subject to the nitty gritty about the exact terminology
… but then open an issue saying that this terminology still has to be checked with TAG and HTML community
… and link to our language
… that way we specifically flag this for TAG input
… the new issue would be more focused
… and in the meantime we capture everything else in the PR

Matt Garrish: in that light, do we maybe want to keep the authoring side requirements, but leave out the RS expectations side?

Ivan Herman: pref trying to do something now, even with it being incomplete, raise a separate issue about the terminology, link that issue to the text and ask for external help, e.g., from the TAG.

Brady Duga: would refer this to TAG, yes

Ivan Herman: also, let's mark issues as explicitly "referred to TAG for input"

Dave Cramer: i'm okay with merging for now, on understanding that this is not final final

Proposed resolution: Merge PR 1582, open issue for TAG review (Wendy Reid)

Ivan Herman: +1

Wendy Reid: +1

Matt Garrish: +1

Charles LaPierre: +1

Matthew Chan: +1

Brady Duga: +`

Brady Duga: +1

Bill Kasdorf: +1

Resolution #2: Merge PR 1582, open issue for TAG review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Are data URLs always disallowed?
4 participants