In ServiceWorker mode, the external links are opened inside the iframe #404
If we think of the extraction engine as a server that can have a rewrite engine or can do server-side parsing, then maybe it's not so awful to filter the HTML between extraction and posting it to the Service Worker. It would certainly be efficient and easy. No need to wait until the DOM is loaded:
[Code not tested - probably need to add
As I wrote above, IMHO it is perfectly legitimate to think of the extraction engine as a traditional HTTP server, which has server-side filtering and transformation options. I see only two solutions to this issue: use a regex on the HTML between extraction and sending it to the Service Worker, or add an onclick handler to every hyperlink in the onload event we have defined for the iframe (in Service Worker mode).
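The first option (a regex over the HTML between extraction and the Service Worker) could look roughly like the sketch below. It is not the untested code referred to above: the function name `markExternalLinks` is hypothetical, and it assumes that external links are exactly the `<a>` tags whose `href` starts with `http://` or `https://`, while in-ZIM links are relative. Being a regex, it only sees links present in the static HTML, not ones created later by scripts.

```javascript
// Hypothetical helper: add target="_blank" to <a> tags whose href is an absolute
// http(s) URL, leaving relative (in-ZIM) links untouched.
// Assumption: external links are always absolute http:// or https:// URLs.
function markExternalLinks(html) {
  // Match an opening <a ...> tag containing a whitespace-preceded href="http(s)://..."
  const anchorRe = /<a\b([^>]*?\shref\s*=\s*["']https?:\/\/[^"']*["'][^>]*)>/gi;
  return html.replace(anchorRe, (tag, attrs) => {
    // Leave tags that already declare a target alone
    if (/\btarget\s*=/i.test(attrs)) return tag;
    return '<a' + attrs + ' target="_blank" rel="noopener">';
  });
}
```

Adding `rel="noopener"` alongside `target="_blank"` is an extra assumption (it prevents the new page from getting a handle on the app's window); it can be dropped if unwanted.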
@kelson42: how is this handled in other Kiwix implementations?
I've just seen https://developer.mozilla.org/en-US/docs/Web/API/Clients/openWindow, which might make it possible to handle this inside the Service Worker (since it's a SW issue)?
I'm not sure this API is available in a SW, as a SW cannot access the DOM.
It does say that this function is part of the Service Worker API, though it may not be available in all browsers (Edge just has a question mark, so I suppose it would need testing). I agree that testing the URL of the Fetch event to decide whether it's an external request would not be 100% safe, but it would probably be no different if we were to use one of the other methods. EDIT: Though I can see it would be very annoying if we got it wrong, as windows could start opening all over the place....
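Whichever mechanism ends up intercepting the request, it needs a test for "is this URL external?". A minimal sketch, assuming a hypothetical helper `isExternalUrl` that compares the URL's origin with the app's own origin (in a Service Worker the latter would presumably come from `self.location`). It deliberately leaves out the call to `clients.openWindow()`, which browsers may restrict to contexts with user interaction and which would need the testing mentioned above.

```javascript
// Hypothetical helper: true if `url` is an http(s) URL on a different origin from the app.
// `appBase` is the app's URL (e.g. its page or SW location); relative URLs resolve against it.
function isExternalUrl(url, appBase) {
  let parsed;
  let appOrigin;
  try {
    parsed = new URL(url, appBase);
    appOrigin = new URL(appBase).origin;
  } catch (e) {
    // Unparseable: treat as internal rather than risk spawning windows
    return false;
  }
  // Only web links count as external here; other schemes (mailto:, etc.) are out of scope
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return false;
  return parsed.origin !== appOrigin;
}
```

Comparing origins rather than string prefixes means `http://` vs `https://` or a different port on the same host also counts as external, which seems the safer default.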
Interesting!
Maybe this has to be fixed before any of the other development on #196? I'm not so convinced that we can use the … Another possibility: we could trap the … One further possibility comes to mind: trap any click on the iframe, and analyse what is being clicked. It avoids touching the DOM, but it does introduce a "layer" between the DOM and the user. To sum up, I can see three methods:
Any preference, @mossroy? (I am guessing 3!)
Sorry, this is all bad news, and I don't see a good way to handle this for now.
@kelson42: could you tell us how this is handled in other Kiwix clients? See #404 (comment)
OK, thanks for the info on 3. I think that effectively rules it out. On 2, I don't think there's any danger of contents in the ZIM handling the click (or mousedown/mouseup) events of the … I get that we would miss click events that have been added dynamically to contents that are not, say, … Obviously, this would only be helpful if we can do it in a way that does not interfere.
Let's give @kelson42 some time to answer: maybe there is a better approach that we have not thought of. We have to keep in mind that ZIM contents can be much more "dynamic" than the ones from Wikimedia (in the sense of a much heavier use of JavaScript, and much less static HTML). I wonder whether we already have other ZIM files with content based on a Single Page Application framework (Angular, Vue.js, React, etc.) that would do the same kind of thing.
AFAIK, the Kiwix ports for Linux, Windows, iOS, macOS and Android use a custom zim:// protocol scheme to access the ZIM content. Therefore, no HTML DOM transformation is needed. In any case, this is something we do only a little, and only in Kiwix Serve. I might be wrong, but that is how it works AFAIK.
It's still necessary to test the link in some way. We don't have a problem knowing whether a link is from the ZIM or from the Web. The problem is knowing how to trap the request when a user clicks on a Web link. We can do it by inspecting the click target (Option 2 above), but that potentially leaves dynamic links (created by JS in the ZIM) that don't have an href but might load external content when clicked, like the case @mossroy points out in PhET ZIMs. We need to trap anything that takes the user to another domain and either open it in a new window, or warn the user with a confirmation prompt and open a new window if the user wants to follow the link. Otherwise external URLs open inside the iframe, which is suboptimal. I did a quick prototype using Option 2 on the Check-for-external-links-and-warn branch. In the PhET example you gave, @mossroy, it at least doesn't interfere (as you say, it doesn't see the dynamically generated link).
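A rough sketch of Option 2, for illustration only (it is not the code on the Check-for-external-links-and-warn branch): a click listener on the iframe's document walks up from the clicked element to the nearest `<a>`, and if that link points to another origin, cancels the navigation and asks before opening a new window. `findEnclosingLink` and `handleIframeClick` are hypothetical names, and `confirm`/`window.open` are passed in as parameters so that the logic can run outside a browser. As noted above, script-generated links without an `href` are not caught.

```javascript
// Walk up from the clicked node (e.g. an <img> or <span> inside a link) to the nearest <a> with an href
function findEnclosingLink(node) {
  while (node) {
    if (typeof node.tagName === 'string' && node.tagName.toUpperCase() === 'A' && node.href) {
      return node;
    }
    node = node.parentNode;
  }
  return null;
}

// Returns true if the click was trapped as an external link, false if it was left alone.
// confirmFn/openFn stand in for window.confirm/window.open (hypothetical injection for testing).
function handleIframeClick(event, appBase, confirmFn, openFn) {
  const link = findEnclosingLink(event.target);
  if (!link) return false; // no <a href> found, e.g. links generated by scripts
  let target;
  let appOrigin;
  try {
    target = new URL(link.href, appBase);
    appOrigin = new URL(appBase).origin;
  } catch (e) {
    return false;
  }
  // Only trap web links that leave the app's origin
  if (target.protocol !== 'http:' && target.protocol !== 'https:') return false;
  if (target.origin === appOrigin) return false; // in-ZIM link: let it navigate normally
  event.preventDefault(); // stop the iframe from navigating to the external site
  if (confirmFn('Open ' + target.href + ' in a new window?')) {
    openFn(target.href, '_blank');
  }
  return true;
}
```

In the app it might be wired up in the iframe's `load` handler with something like `iframe.contentDocument.addEventListener('click', e => handleIframeClick(e, window.location.href, window.confirm.bind(window), window.open.bind(window)))`; that wiring is likewise an untested assumption.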
@Jaifroid I don't think anything is trapped in the Kiwix ports cited above. Kiwix is simply not able to handle http:// links, so they are forwarded to the OS, which has to handle them.
OK, unfortunately for us (!), the browser can handle external links and puts them in the iframe... which in some cases causes major CORS issues, and the whole application stops. It's a poor experience, hence this issue: to find a 'clean' way (or as clean as possible) to redirect external links to a separate browser window.
It might be a silly idea, but maybe we could add a Content Security Policy (https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP) that blocks opening other domains inside the iframe? It would avoid this poor/inconsistent experience, while still letting the user open the links in a new tab/window (even if users might not easily discover that).
Interesting idea, @mossroy. I did actually define a CSP for Kiwix JS Windows, and I can't remember why... It wasn't for this specific issue, but something to do with errors being thrown in the UWP app, which a CSP fixed. It looks like this (in
In the UWP app, probably for the same reason that @kelson42 outlines, external links "automatically" (without my adding code) open in an external window. (This CSP doesn't affect that; it's because UWP uses a protocol/domain whitelist, so I have whitelisted only
It might be similar (or simpler?) in our case to add the sandbox attribute (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe#attr-sandbox) to the iframe. Either option (an explicit CSP or the sandbox) would need to be tested. If the effect on the end user is that they get an access-denied message with no explanation of how to open the external site they want, then I don't think it would be good UX, and it would probably be worse than the current situation. If we can somehow detect this and tell the user to open a new tab with Ctrl-Click instead, it could be a good solution: more universal than Option 2, though slightly less user-friendly (not necessarily; it depends on the implementation). EDIT: This sandbox attribute looks promising:
This … But I'm not sure it's technically possible to override the browser behavior in case of a cross-origin violation (enforced either by a CSP or the sandbox attribute). I think the browser will silently ignore the request (and log it to the console), but this needs to be checked. If we don't find a better solution, a compromise might be to:
@mossroy That sounds like a plan! I may not have time over the next couple of weeks to advance/experiment, but will do after that. Of course, in the meantime feel free to investigate further or try out a few things if you like.
Just a bit of research on this. Having found a Google Analytics script running in PhET experiments (from a PhET ZIM; see openzim/phet#127), I tried out the sandbox attribute in KJSW to see if it can restrict access to this kind of thing. It's not fine-grained enough: you can either allow scripts or disallow them (from any source), and for something like PhET we of course need to allow scripts to run. Although it is possible to impose a CSP via an attribute on the iframe element, that option is experimental and only supported by Chromium browsers. Therefore we would have to fall back to a CSP in the document. For this particular purpose (blocking scripts from disallowed domains inside the iframe; I realize it's not the specific issue here), a CSP has to be added to the document being loaded in the iframe, e.g. with a regular expression like this (note that we can't add the CSP using DOM methods, because by the time the DOM is loaded it is too late: the foreign assets have also been loaded):
This works very well to block the external resources loaded by the PhET experiment. For the purposes of our specific issue here (preventing external links from opening in the iframe), we want to block the iframe from navigating to a disallowed external domain. For that, fortunately, it is enough to define a CSP in
After discussing this today with @kelson42 and @Jaifroid, here is what we decided:
This is not an ideal solution, but it's the least bad one we could find. Not trapping some external links is less bad than breaking navigation within a ZIM. To cover this last case, we should add a setting to let the user disable external link trapping. This should be considered a "power-user" setting, and external link trapping should be on by default.
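One possible shape for the "power-user" setting, as a sketch only: a hypothetical helper `isExternalLinkTrappingEnabled` that treats an unset value as enabled, so trapping is on by default. It assumes the setting is persisted as a string (as `localStorage` would store it); the key name and storage mechanism are not specified here.

```javascript
// Hypothetical settings helper: external link trapping is on unless the user has
// explicitly turned it off. `storedValue` stands in for whatever the settings store
// returns (e.g. a string from localStorage, or null/undefined when never set).
function isExternalLinkTrappingEnabled(storedValue) {
  // Unset (e.g. first run): default to enabled
  if (storedValue === null || storedValue === undefined) return true;
  // localStorage stores strings, so accept both 'false' and false as "disabled"
  return !(storedValue === false || storedValue === 'false');
}
```

Defaulting on an explicit "off" value, rather than on an explicit "on" value, keeps trapping active for users who never open the settings.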
Fixes #404. This is a refresh of Jaifroid's work in the branch https://github.com/kiwix/kiwix-js/tree/Check-for-external-links-and-warn
In relation to your comment above, last bullet point, it is possible to prevent all access to external content by the iframe, but ONLY by injecting a CSP into the incoming document, i.e. altering it. I do this in KJSW, and it works well. But more importantly, it is necessary for my current implementation of Zimit file reading. If we can be sure to convert ALL the external links, it might not be a prerequisite for Zimit file reading, but as I cannot be sure of this in my current implementation, blocking access to online font files, etc., is vital. This does not stop a link being opened in a new tab, by the way. |
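Injecting a CSP into the incoming document, as described in the comment above and in the earlier research on PhET, could be done with a small string transformation applied before the HTML reaches the iframe. This is a minimal sketch, not the KJSW code or its regular expression: `injectCsp` is a hypothetical name, and the example policy is an assumption to be adapted. The `<meta>` tag is inserted immediately after the opening `<head>` tag because a CSP delivered by `<meta>` only applies to content that comes after it in the document.

```javascript
// Hypothetical helper: insert a CSP <meta> tag straight after the opening <head> tag.
function injectCsp(html, policy) {
  // Escape characters that could break out of the content="..." attribute
  const safePolicy = policy.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
  const meta = '<meta http-equiv="Content-Security-Policy" content="' + safePolicy + '">';
  // <head\b does not match <header>, since "d" and "e" are both word characters
  const headRe = /<head\b[^>]*>/i;
  if (!headRe.test(html)) {
    return html; // no <head> tag: left unchanged in this sketch
  }
  // A replacer function avoids "$" sequences being treated as replacement patterns
  return html.replace(headRe, (tag) => tag + meta);
}

// Example policy (an assumption): allow same-origin, inline, data: and blob: sources only,
// which would block scripts, fonts, etc. requested from other domains.
const examplePolicy = "default-src 'self' data: blob: 'unsafe-inline' 'unsafe-eval'";
```

Usage would be something like `injectCsp(articleHtml, examplePolicy)` just before the HTML is handed to the Service Worker response; as noted above, this still lets a link be opened in a new tab.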
Some websites refuse to be opened inside an iframe, which leaves a blank page and raises an error in the console like:
Refused to display 'https://www.nytimes.com/1984/01/19/us/carter-leaves-hospital.html' in a frame because it set 'X-Frame-Options' to 'sameorigin'.
The external links should be opened in a new tab (like in jQuery mode).
Using a
<base target="_blank" />
would probably work, but I suppose it would also apply to internal links, which is not what we want. I'd like to avoid parsing the DOM or using a regexp on the HTML content, but I don't see a better way for now...