Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in v4.1.1: failed to detect absolute file path in manifest.item.href #1252

Closed
unext-wendong opened this issue May 25, 2021 · 5 comments
Labels
spec: EPUB 3.3 Impacting the support of EPUB 3.3 status: completed Work completed, can be closed type: spec The issue is related to a Specification update
Milestone

Comments

@unext-wendong
Copy link

Sample file:
sample.epub.zip

Problem description:
Since v4.1.1, for manifest items with an absolute file path as href in the ODF file, the tool fails to detect it as an error, while it does in v4.1.0 (and earlier versions, e.g. v3.0.1).

Example item description:

<item id="p-cov_h1" href="/xhtml/cov_h1.xhtml" media-type="application/xhtml+xml" />

Output from v4.1.1

Validating using EPUB version 3.0.1 rules.
No errors or warnings detected.
Messages: 0 fatal / 0 errors / 0 warnings / 0 info

EPUBCheck completed

Output from v4.1.0

Validating using EPUB version 3.0.1 rules.
ERROR(RSC-001): ./sample.epub(-1,-1): File 'OEBPS//xhtml/cov_h1.xhtml' could not be found.
ERROR(RSC-001): ./sample.epub(-1,-1): File 'OEBPS//xhtml/cov_h2.xhtml' could not be found.
WARNING(OPF-003): ./sample.epub(-1,-1): Item 'OEBPS/xhtml/cov_h1.xhtml' exists in the EPUB, but is not declared in the OPF manifest.
WARNING(OPF-003): ./sample.epub(-1,-1): Item 'OEBPS/xhtml/cov_h2.xhtml' exists in the EPUB, but is not declared in the OPF manifest.

Check finished with errors
Messages: 0 fatal / 2 errors / 2 warnings / 0 info

epubcheck completed
@unext-wendong
Copy link
Author

The directory structure in the above sample epub file is as follows:

├── META-INF
│   └── container.xml
├── OEBPS
│   ├── content.opf
│   ├── toc.xhtml
│   └── xhtml
│       ├── cov_h1.xhtml
│       └── cov_h2.xhtml
└── mimetype

The absolute file path in the problem href is /xhtml/cov_h1.xhtml which is clearly not correct.

@rdeltour
Copy link
Member

Thanks for the report @unext-wendong, that regression is definitely something to look into!

I'm not quite sure how to deal with it however.

Technically, /xhtml/cov_h1.xhtml is a relative URL (it has no scheme part). It is allowed in EPUB, and must be resolved relative to the URL of the Package Document.

That resolution is under-specified however: it depends on the URL of the Package Document, which is defined by the user agent (the reading system, or in this case, EPUBCheck).

That said, both the passing result of v4.2.x and the error message in v4.1.0 look erroneous to me. If the author uses a URL starting with /, I would assume the intent is to resolve that relative to the root of the container.

@mattgarrish what do you think?
After re-reading the specs (from 3.0.1 to the latest draft of 3.3, for the 100th time?), I believe this resolution is under-specified (results will vary between reading systems).

@rdeltour rdeltour added status: in discussion The issue is being discussed by the development team type: false-negative This issue is about invalid content being incorrectly accepted labels May 25, 2021
@mattgarrish
Copy link
Member

If the author uses a URL starting with /, I would assume the intent is to resolve that relative to the root of the container.

Ya, a path starting with a slash is a root-relative path, but I'm not sure if the root is the root of the abstract container or the location of the package document.

It seems like it's possible it could change depending on whether the content gets unzipped or not (i.e., would the root shift if only the content directory gets served).

I seem to remember this being problematic for web publications, too, because the manifest didn't have to correspond to the root of a (sub)domain and so root-relative paths could reference outside the content of the publication.

My gut instinct is we should ban root-relative paths, but it's definitely something for the WG to weigh in on.

@rdeltour
Copy link
Member

Ya, a path starting with a slash is a root-relative path, but I'm not sure if the root is the root of the abstract container or the location of the package document.

It seems like it's possible it could change depending on whether the content gets unzipped or not (i.e., would the root shift if only the content directory gets served).

In fact, my concern is not only for root-relative path, but any path really. I'm thinking we should reopen w3c/epub-specs#1374 . I'll follow up there.

@rdeltour rdeltour added type: spec The issue is related to a Specification update and removed type: false-negative This issue is about invalid content being incorrectly accepted labels May 25, 2021
@mattgarrish
Copy link
Member

Path absolute URL string are not recommended in 3.3, so I'm going to add the 3.3 label to this issue rather than start another.

See w3c/epub-specs#1681

@mattgarrish mattgarrish added the spec: EPUB 3.3 Impacting the support of EPUB 3.3 label Nov 15, 2021
@rdeltour rdeltour added this to the v5.0.0-beta-1 milestone Jan 23, 2022
@rdeltour rdeltour added status: in progress The issue is being implemented by the development team and removed status: in discussion The issue is being discussed by the development team labels Nov 16, 2022
@rdeltour rdeltour added status: completed Work completed, can be closed and removed status: in progress The issue is being implemented by the development team labels Dec 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec: EPUB 3.3 Impacting the support of EPUB 3.3 status: completed Work completed, can be closed type: spec The issue is related to a Specification update
Projects
None yet
Development

No branches or pull requests

3 participants