Clarification needed on official specs about the nature of the ePub publications and of the ePub Reader Systems in regard to root filesystems - proposal included. #1910
Comments
@P5music, I agree that it is not said very explicitly in the documents, but the overwhelming conceptual model for an EPUB instance is that it is a "Website in a Box". The overview document makes this point:
The very fact that the EPUB content is defined in terms of Web Standards makes this fairly clear, too. I would agree that we may want to emphasize this conceptual model better in the documents (cc @dauwhe @mattgarrish). However, if we accept this conceptual model, then the analogy to file system structures becomes much less relevant: reading systems should think in terms of websites with some very significant specificities, and not in terms of file systems. The discussions in #1898, #1888, and #1374 all aim at making some of the technical details more precise. How a Reading System implements that conceptual model is not for the Standard to define. It is up to the RS implementers to decide whether it works by unzipping the content into a file system or whether it uses streaming; the behavior, in terms of, say, relative URLs (which is where the discussion started), should be identical. What this WG has to provide is test cases that help RSs achieve interoperability in this respect, not a specification of implementation patterns. It is up to us (and any help is welcome from within and outside the WG!) to produce the right test cases and see whether the RSs can implement the model based on the specification.
@P5music, if I understand correctly your proposal boils down to:
I see two points to discuss there:
Should we make "exceeding relative URLs" non-conforming?
In the #1898 proposal, these URLs are conforming, but are informatively discouraged in a note. That said, I can see how making them non-conforming could be an additional safeguard. Specifically, if they are reported as EPUBCheck errors or warnings, authors will be more forcefully discouraged from using them than with the current informative note. It should be possible to craft some normative definition of "exceeding relative URLs" and explicitly forbid them in the spec. Would that solve (part of) your concerns?
Would it be sufficient to solve the problem stated in #1888?
Unfortunately, not. If we don't say anything about how the root URL is defined, i.e. if the RS implementors can choose whatever they want as the container root URL, then it creates interoperability and unpredictability issues. If you don't understand why, I would ask you to read carefully the problem stated in the opening comment of #1888 (especially the section "The current way to obtain the URL of the Package Document is flawed", which exposes a particularly problematic solution).
So, once we agree that forbidding exceeding URLs alone does not solve all of the original issue, do you have an alternative proposal to solve it? I'm aware that #1898 is restrictive for RS implementors. There may be another way to fix the original problem in a less restrictive manner. Ideas welcome!
Thank you for the response. I can contribute this:
The "Website in a Box" metaphor is useful, but it goes too far: it is bigger than ePub3. I think that only the filesystem part of the website was intended when the analogy was created. As I said, if exceeding URLs are forbidden, that will work for websites, filesystems, and zipped ePub readers alike. Websites are more complex than ePub3 publications, and it is bad for ePub publications to be compared to them. So my proposal is to forbid exceeding URLs.
What is the ePub3 publication?
I also pondered some further questions:
-When displaying a local ePub, the browser and the WebView have an internal representation of URLs that is going to pop up every now and then as a possible issue; they have a file path like
-The spine or manifest references are parsed by the RS, but the XHTML references (images, CSS) should be left to the internal handling of the browser and the WebView.
I hope this is useful.
What do you mean by "drop the spooky URL problem"?
I just mean that According to my thinking, the ePub3 publication is just a file, an archive, representing a book to be read. Regards
I have the impression that this issue has been overtaken by events and that the discussion has moved elsewhere. Propose closing...
Clarification needed on official specs about the nature of the ePub publications and of the ePub Reader Systems in regard to root filesystems.
Summary:
This issue is about some needed clarifications in the wake of the #1898 proposal.
About:
-the nature of ePub publications and ePub readers
-the root of ePub publications
-the base URL of ePub publications
-the mount point on filesystems vs websites "root"
-the BASE tag for relative URLs and the meaning of the /file.ext syntax in references
-the need for design and source changes in some ePub readers
-the possible high-level data exchange between the WebKit module and high-level applications (like encoding images with base64 encoding and so on).
-HTML injection and base URL
Please read carefully, no offence is intended to anyone.
In computer systems there is the root filesystem at the / mount point that is special.
If you run a command like
cd ../../../../../.. and so on, you will end up at the root even if you exceed the available number of levels.
For example if you are at root level
cd ../../../../../.. is going to point to the root itself.
You can mount other filesystems at some mount point.
If you mount a filesystem at a certain depth level, that point is part of the main filesystem now.
So in that case commands with an exceeding number of ../ will not stop at the mount point but will continue to go up.
For example, take a filesystem mounted at
/home/pc/mainpath/mainfolder/secondarypath
that contains the file
/home/pc/mainpath/mainfolder/secondarypath/secondlevelfolder/file.xhtml
If I issue this command
cd /home/pc/mainpath/mainfolder/secondarypath/secondlevelfolder
I am now at secondlevelfolder of the mounted filesystem.
cd ..
I am now in secondarypath
cd ..
I am now in mainfolder, I am higher than the mount point
cd ..
I am now in mainpath
cd ../../..
I am now at root
cd ..
I am still at root.
Now
cd /home/pc/mainpath
I am at mainpath
cd ../../..
takes me to root
but also
cd ../../../..
would take me to root.
Now cd /home/pc/mainpath/mainfolder/secondarypath/secondlevelfolder
cd ../../../..
does not take me to the mount point but goes up.
So the mount point is not root, and the (strange) rule that clamps an exceeding ../ sequence cannot be counted on there.
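The walk above can be reproduced without touching a real filesystem. This is only a sketch: it uses Python's `posixpath.normpath`, which collapses `..` purely lexically (it does not follow symlinks or know about real mounts), and the `cd` helper and `MOUNT_POINT` constant are names invented here for illustration.

```python
import posixpath

# Path treated as the mount point in the example above (illustrative constant).
MOUNT_POINT = "/home/pc/mainpath/mainfolder/secondarypath"


def cd(cwd: str, rel: str) -> str:
    """Lexically emulate `cd rel` issued from the directory `cwd`."""
    return posixpath.normpath(posixpath.join(cwd, rel))


# At the real root, extra ../ segments are silently dropped.
print(cd("/home/pc/mainpath", "../../.."))     # /
print(cd("/home/pc/mainpath", "../../../.."))  # /

# Starting inside the mounted tree, ../ walks straight past the mount point.
start = MOUNT_POINT + "/secondlevelfolder"
print(cd(start, "../../../.."))                # /home/pc
```

The last result is above `MOUNT_POINT`, which is the behaviour described in the walk: nothing stops the sequence at the mount point.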
Let's consider what happens in websites:
https://github.com/w3c/../P5music
is the same URL as
https://github.com/w3c/../../../../../../../../P5music
So webservers do the same at root level.
If you create a localhost on your PC, the same rule applies: the webserver will respond to external requests by applying that rule to the "root" of the website hosted on localhost.
So we are talking about responding to requests, be it a request to a filesystem from a terminal or an external request to a webserver.
You can even send a request from your browser to the localhost webserver on your PC, and that rule applies: you will not be allowed to exceed the "root" of the website that localhost hosts.
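The same clamping can be observed with a standard URL resolver. The sketch below uses Python's `urllib.parse.urljoin` (RFC 3986 semantics, Python 3.5 or later) and assumes `https://github.com/w3c/` as the base URL against which the two relative references are resolved; note that here the clamping happens in the resolver, before any request is sent.

```python
from urllib.parse import urljoin

# Assumed base URL for this illustration.
BASE = "https://github.com/w3c/"

short = urljoin(BASE, "../P5music")
long = urljoin(BASE, "../../../../../../../../P5music")

print(short)  # https://github.com/P5music
print(long)   # https://github.com/P5music
```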
We can now ask what a Reader System for ePubs is:
-Is it a host?
-Does it respond to external requests?
-If the client is on a device (like a tablet) and the server is on another device (like a remote computer), is the client reader a host?
Does the client have to parse the URLs, or is it the server that parses the URLs and responds to the client?
-If the client unzips the ePub on the local filesystem, is it a host? Does it respond to external requests? Does it have to parse the ePub URLs and apply the "exceeding ../" rule?
-Same question for readers that read from the zip archive in R.A.M. or from disk.
According to the new proposal #1898, any reader, be it a client or an application, has to parse the URLs to apply the above-mentioned rule, and the ePub is like a root filesystem, an unmounted filesystem.
Is it possible to add the following to the specifications?
What exact nature is intended for ePub publications and ePub readers, including the following cases:
-client application (reader) and server (webservice)
-client application that reads a zipped ePub in R.A.M. or from disk
-client application that unzips the ePub on the filesystem
and any other cases that may exist that I am not aware of.
What exact nature is intended for ePubs:
-are they unmountable filesystems? (are they real root filesystems?)
Please take explicitly into account that the rule of exceeding ../ is naturally enforced only at root.
When a WebView requests a resource from the filesystem with a relative URL, and the URL is exceeding, it has to be parsed to prevent leaking (going up above the ePub "root").
The WebView can have a baseUrl; by default it is the folder where the loaded XHTML file resides.
Indeed, if you inject HTML code without loading it from a file on the filesystem, it is like "floating". If the XHTML file is loaded, instead, the baseUrl is automatically set to the folder the XHTML comes from.
If the baseUrl is explicitly provided, all relative URLs are calculated from that explicit point.
But this is not a root, so leaking is possible. This leads to enforcing the exceeding ../ rule at code level, with a major source code change.
It is like filtering exceeding URLs, because the filesystem would not filter them: it would not stop the exceeding ../ from going up.
This is technically feasible, but it is also possible that some readers skip the parsing task just to avoid major design changes in how the WebView reads resources from the filesystem.
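To make the "filtering at code level" concrete, here is a rough sketch of the kind of check a reader that unzips the ePub might run before handing a path to the WebView. Everything in it is hypothetical: `resolve_inside`, the directory names, and the decision to return `None` for a leaking reference are illustration choices, not anything taken from #1898 or from a real reader.

```python
import posixpath
from typing import Optional


def resolve_inside(root_dir: str, page_dir: str, ref: str) -> Optional[str]:
    """Resolve `ref` against `page_dir`; return None if it leaves `root_dir`.

    `root_dir` is the folder the ePub was unzipped into and `page_dir` is the
    folder of the XHTML file being displayed (both hypothetical paths).
    """
    root = posixpath.normpath(root_dir)
    target = posixpath.normpath(posixpath.join(page_dir, ref))
    prefix = root.rstrip("/") + "/"
    if target == root or target.startswith(prefix):
        return target
    return None  # the reference climbs above the ePub "root"


ROOT = "/home/pc/books/unzipped"
PAGE_DIR = ROOT + "/OEBPS/text"

print(resolve_inside(ROOT, PAGE_DIR, "../images/image1.png"))
print(resolve_inside(ROOT, PAGE_DIR, "../../../secret.txt"))
```

The first call stays inside the unzipped folder; the second resolves to a sibling of that folder and is rejected. The check is lexical only, so a real reader would also have to think about symlinks and URL decoding.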
The WebView has very optimized low-level code. Usually, feeding the WebView directly with data performs worse, especially from high-level languages exchanging data with a separate module such as the WebView.
Indeed, when an explicit baseUrl is provided, you have to feed the WebView yourself because relative URLs are now "wrong".
The use of the BASE tag could also be considered, but the same applies. The entire publication would need paths relative to the base path, while usually relative paths are relative to the folder of the currently displayed page, so
../images/image1.png
means "go up one level, enter the images folder, load image1.png".
If a BASE element is defined, the meaning is different. And /file.jpg is relative to the base and not the root.
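A small sketch of how the same reference changes meaning with the base. It uses Python's `urllib.parse.urljoin` as a stand-in for the browser's resolver (an assumption: a WebView may differ in details), and the `reader.example` host and both URLs are made up for the illustration.

```python
from urllib.parse import urljoin

# Hypothetical URLs: the page being displayed, and an explicit base.
PAGE = "https://reader.example/book/OEBPS/text/ch1.xhtml"
BASE = "https://reader.example/book/"
REF = "../images/image1.png"

# Default: the reference is resolved against the page's own folder.
print(urljoin(PAGE, REF))          # https://reader.example/book/OEBPS/images/image1.png

# With an explicit base, the same reference lands somewhere else.
print(urljoin(BASE, REF))          # https://reader.example/images/image1.png

# Under RFC 3986, a reference starting with "/" keeps the base's scheme
# and host and replaces the whole path.
print(urljoin(BASE, "/file.jpg"))  # https://reader.example/file.jpg
```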
That alone should be enough to reconsider the new proposal.
Since many cases are possible, parsing paths is a really difficult and tricky task for a reader.
Take also into account that some systems, like emulators and simulators, might not even be able to enforce the rule at root, because the root of the simulator is not the real filesystem root of the computer running the simulator.
Since the above-mentioned new proposal arbitrarily makes the ePub3 publication like a "host" root rather than a simply mounted filesystem, this should also be specified, with all its implications.
An alternate proposal could be to just forbid exceeding relative URLs, which is good for both webservers and filesystems, two slightly different "worlds".
Enforcing a URL syntax clearly intended for a webserver's host-level website root at an intermediate point of a filesystem would require the program to intervene at the "connection point" to filter what is now explicitly allowed, such as exceeding relative URLs, which are allowed at the host-level website root. That could be difficult at best, or even impossible.
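For the alternate proposal, a checker would only need to detect an exceeding reference, not resolve it. The sketch below is one possible reading of "exceeding", assumed for illustration: a relative path whose ../ segments climb above the container root at any point. The function name `exceeds_root` and the container paths are hypothetical, and the input is assumed to be a plain relative path with no scheme, query, or fragment.

```python
def exceeds_root(page_path: str, ref: str) -> bool:
    """Return True if `ref` climbs above the container root.

    `page_path` is the container-relative path of the referencing document,
    e.g. "OEBPS/text/ch1.xhtml"; `ref` is a relative path found in it.
    """
    # Depth of the folder that holds the referencing document.
    depth = len([seg for seg in page_path.split("/")[:-1] if seg])
    for seg in ref.split("/"):
        if seg == "..":
            depth -= 1
            if depth < 0:
                return True  # went above the container root
        elif seg not in ("", "."):
            depth += 1
    return False


print(exceeds_root("OEBPS/text/ch1.xhtml", "../images/image1.png"))  # False
print(exceeds_root("OEBPS/text/ch1.xhtml", "../../../cover.png"))    # True
```

Because the check works on container-relative paths only, it gives the same answer whether the reader unzips to disk, streams from the archive, or serves the content from a webserver.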
I could be wrong on some of the points above, so at least they should be assessed and discussed.
Regards