
External parsed entities in the internal DTD subset of an SVG content document #1355

Closed
murata2makoto opened this issue Oct 25, 2020 · 7 comments
Labels: EPUB33 (Issues addressed in the EPUB 3.3 revision), Spec-EPUB3 (The issue affects the core EPUB 3.3 Recommendation), Topic-XML (The issue affects XML processing)

Comments

@murata2makoto
Contributor

murata2makoto commented Oct 25, 2020

In my understanding, EPUB 3.2 does not allow this XML document as an SVG content document.

Here `desc` is an external parsed entity declared in the internal DTD subset. The content of
desc.ent is "<desc></desc>", possibly preceded by a text declaration (the external-entity counterpart of the XML declaration).

Do people agree that this is not an SVG content document, as specified in EPUB 3.0, 3.0.1, and
3.2? The proposed resolution (see #1354) appears to allow this as an SVG content document. Is this intentional?

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg [
<!ENTITY desc SYSTEM "desc.ent">]>
<svg xmlns="http://www.w3.org/2000/svg" width="100%" height="100%">
    &desc;
    <g alignment-baseline="baseline"></g>
</svg>
```
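To make the construct concrete: a parser that *does* resolve `desc` must open desc.ent and splice its replacement text into `<svg>` as a child element. The following is a minimal sketch using Python's standard-library expat binding, where the dict `ARCHIVE` is a hypothetical in-memory stand-in for files inside the EPUB container; it is not how any particular reading system works.

```python
import xml.parsers.expat

# Hypothetical stand-in for files in the EPUB container. The entity starts
# with a text declaration, which external parsed entities may carry.
ARCHIVE = {"desc.ent": '<?xml encoding="UTF-8"?><desc></desc>'}

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg [
<!ENTITY desc SYSTEM "desc.ent">]>
<svg xmlns="http://www.w3.org/2000/svg" width="100%" height="100%">
    &desc;
    <g alignment-baseline="baseline"></g>
</svg>"""


def element_names(xml_text: str) -> list[str]:
    """Return element names in document order, resolving external entities."""
    names: list[str] = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        names.append(name)

    def on_external_ref(context, base, system_id, public_id):
        # Resolving means reading another resource; here it comes from ARCHIVE,
        # but a naive resolver would open a file or even a network URL.
        child = parser.ExternalEntityParserCreate(context)
        child.StartElementHandler = on_start
        child.Parse(ARCHIVE[system_id], True)
        return 1  # tell expat the entity was handled

    parser.StartElementHandler = on_start
    parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(xml_text, True)
    return names


print(element_names(SAMPLE))  # ['svg', 'desc', 'g']
```

The resulting tree only contains `desc` because the parser fetched a second resource, which is exactly the dependency under discussion.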
@llemeurfr

Given the solution the WG proposed in #1338 (comment), I agree with Makoto that the sample above conforms to that proposal. BUT it causes problems for offline reading systems (especially if "desc.ent" is replaced by a full URL) and can enable XML external entity (XXE) attacks, so it should not be conforming under the EPUB spec.
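One way a consumer can avoid both the offline and the XXE problem is to never dereference external entities at all. Below is a hedged Python sketch with expat (the function name `parse_without_fetching` and the example.com URL are hypothetical): the `ExternalEntityRefHandler` is invoked in place of any fetch, records the system identifier, and reports success so parsing continues with the entity left unexpanded.

```python
import xml.parsers.expat


def parse_without_fetching(xml_text: str) -> list[str]:
    """Parse xml_text and return the system IDs of external entities NOT fetched."""
    blocked: list[str] = []
    parser = xml.parsers.expat.ParserCreate()
    # Never read the external DTD subset or external parameter entities.
    parser.SetParamEntityParsing(xml.parsers.expat.XML_PARAM_ENTITY_PARSING_NEVER)

    def on_external_ref(context, base, system_id, public_id):
        # Called for each reference to an external general entity such as &desc;.
        # Record it and report success without opening any file or URL.
        blocked.append(system_id)
        return 1

    parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(xml_text, True)
    return blocked


DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg [
<!ENTITY desc SYSTEM "https://example.com/desc.ent">]>
<svg xmlns="http://www.w3.org/2000/svg">&desc;</svg>"""

print(parse_without_fetching(DOC))  # ['https://example.com/desc.ent']
```

The trade-off is that a document relying on the entity silently loses that content, which is one reason to forbid the construct in content documents rather than leave it to each reading system.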

@Doktorchen

Another example for an XHTML use case (why only SVG?).
In poetry, lyrics, recipes, bills, schedules, and non-linear texts, as well as in formulas (MathML, terms), content is repeated, which is a use case for simple entities.

Of course, the DTD file ('meins.dtd') must not contain any URI/IRI references to external subsets.

Presumably such a construct would expose any user agent, viewer, or reading system that uses an HTML5 tag-soup parser instead of the required XML parser: the tag-soup parser will presumably run into trouble resolving the entities.

@Doktorchen

Oh, the GitHub parser seems to corrupt the example, and the editing toolbar buttons don't seem to work in my browser. Are there no accessible (unscripted) techniques available on GitHub? ;o)

@Doktorchen

Hopefully with entities it works better:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html SYSTEM "meins.dtd">
<html xml:lang="de"
      xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>&t1;</title>
</head>
<body>
<section>
<h1 class="&c1; &c2;">&t1;</h1>
&p1;
&p2;
</section>
<section>
<h1 class="&c3; &c2;">&t1;</h1>
&p2;
&p2;
</section>
</body>
</html>
```

meins.dtd is a file in the EPUB archive with simple content like this:

```xml
<!-- "Hello world!" -->
<!ENTITY t1 "Hallo Welt!">
<!-- "A paragraph represents a self-contained line of thought." -->
<!ENTITY p1 "<p>Ein Absatz repräsentiert einen abgeschlossenen Gedankengang.</p>">
<!-- "A document can, of course, contain many paragraphs." -->
<!ENTITY p2 "<p>Ein Dokument kann natürlich viele Absätze enthalten.</p>">
<!ENTITY c1 "meins-oben">
<!ENTITY c2 "meins-beispiel">
<!ENTITY c3 "meins-unten">
```
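The portability concern applies even to real XML parsers, not only to tag-soup parsers. A sketch in Python, assuming the standard-library `xml.etree.ElementTree` is representative of an XML parser that does not load external DTDs: with the declarations in meins.dtd, `&t1;` is an undefined entity and parsing fails. Moving the same declarations into the internal subset as internal (literal) entities, an alternative that assumes the restriction targets only external entities, lets the parser expand them without reading any other file. The documents are trimmed versions of the example above.

```python
from typing import Optional
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

BODY = """<html xml:lang="de" xmlns="http://www.w3.org/1999/xhtml">
<head><title>&t1;</title></head>
<body><section><h1>&t1;</h1>&p2;&p2;</section></body>
</html>"""

# Variant A: entities declared only in meins.dtd, which is never opened.
EXTERNAL = '<!DOCTYPE html SYSTEM "meins.dtd">\n' + BODY

# Variant B: the same declarations as internal entities in the internal subset.
INTERNAL = """<!DOCTYPE html [
<!ENTITY t1 "Hallo Welt!">
<!ENTITY p2 "<p>Ein Dokument kann natürlich viele Absätze enthalten.</p>">
]>
""" + BODY


def parse_error(text: str) -> Optional[str]:
    """Return the ParseError message, or None if the document parses."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


print(parse_error(EXTERNAL))  # an "undefined entity ..." message with line/column
root = ET.fromstring(INTERNAL)
print(root.find(f"{XHTML}head/{XHTML}title").text)  # Hallo Welt!
print(len(root.findall(f".//{XHTML}p")))            # 2
```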

@murata2makoto
Contributor Author

@Doktorchen

I'm trying to solve one issue at a time. Please start another thread for your question, since it is not about SVG.

@mattgarrish mattgarrish added Topic-ContentDocs The issue affects EPUB content documents Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation and removed Topic-ContentDocs The issue affects EPUB content documents labels Oct 30, 2020
@iherman
Member

iherman commented Nov 6, 2020

This issue was discussed in a meeting.

  • RESOLVED: Merge PR #1368 to address outstanding DTD issues, and close GH issues 1369-1373
Transcript:
Wendy Reid: we had resolutions at the F2F, and further discussions on GitHub
… and came to a happy place
Matt Garrish: #1368
Matt Garrish: where we ended up was…
… we put in an allowance for a specific set of external identifiers that we have put in an appendix
… we have SVG and MathML that are allowed to be used in content docs or in separate files
… and we made a restriction against external entities in the internal DTD subset
… so it prevents some security issues but eases authoring
… so we’ll no longer force people to remove SVG DTDs from tool-generated files
… I’m hoping this is it :)
Ivan Herman: tech comment
… in fact, the changes are such that
… makes possible something that I’m not sure we really use
… I can define as part of an internal entity something that won’t go out to the network
… I’m not sure if this feature is in use
… formal comment
… there was a formal resolution on the previous version; this PR slightly changes that
… can we get a formal resolution to merge, and also close a bunch of issues which were examples of the problem?
Proposed resolution: Merge PR #1368 to address outstanding DTD issues, and close GH issues 1369-1373 (Wendy Reid)
Garth Conboy: +1
Matt Garrish: +1
Ivan Herman: +1
Charles LaPierre: +1
Matthew Chan: +1
Wendy Reid: +1
Brady Duga: +1
George Kerscher: +1
Laura Brady: +1
Bill Kasdorf: +1
Ben Schroeter: +1
Resolution #1: Merge PR #1368 to address outstanding DTD issues, and close GH issues 1369-1373
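Matt Garrish's summary describes two rules: a DOCTYPE may only use external identifiers from an allowlist in an appendix, and the internal DTD subset may not declare external entities. A hedged Python sketch of a checker for both rules follows. The function name `dtd_problems` is hypothetical, and the two allowlist entries (the well-known SVG 1.1 and MathML 2.0 identifiers) are placeholders, not the appendix's actual contents.

```python
import xml.parsers.expat

# Illustrative (public ID, system ID) pairs; NOT the list from the EPUB appendix.
ALLOWED_EXTERNAL_IDS = {
    ("-//W3C//DTD SVG 1.1//EN",
     "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"),
    ("-//W3C//DTD MathML 2.0//EN",
     "http://www.w3.org/Math/DTD/mathml2/mathml2.dtd"),
}


def dtd_problems(xml_text: str) -> list[str]:
    """Return human-readable problems with the document's DTD usage."""
    problems: list[str] = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # A DOCTYPE without an external identifier needs no allowlist check.
        if system_id is None and public_id is None:
            return
        if (public_id, system_id) not in ALLOWED_EXTERNAL_IDS:
            problems.append(
                f"external identifier not allowed: {public_id!r} {system_id!r}")

    def on_entity_decl(name, is_param, value, base, system_id, public_id, notation):
        # The external DTD subset is never loaded (expat's default), so every
        # declaration seen here comes from the internal subset.
        if system_id is not None:
            problems.append(f"external entity {name!r} in internal subset")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return problems
```

Usage: the sample from the top of this issue yields `["external entity 'desc' in internal subset"]`, while an SVG 1.1 DOCTYPE with no internal subset yields an empty list.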

@iherman
Member

iherman commented Nov 6, 2020

@wareid I believe this issue should be closed, too

@dauwhe dauwhe closed this as completed Nov 6, 2020
@mattgarrish mattgarrish added EPUB33 Issues addressed in the EPUB 3.3 revision and removed EPUB33 Issues addressed in the EPUB 3.3 revision labels Nov 9, 2020
@mattgarrish mattgarrish added the EPUB33 Issues addressed in the EPUB 3.3 revision label Sep 14, 2022
@mattgarrish mattgarrish added the Topic-XML The issue affects XML processing label Oct 4, 2022

6 participants