Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Void the iframe between page loads #341

Closed
Jaifroid opened this issue Feb 19, 2018 · 16 comments
Closed

Void the iframe between page loads #341

Jaifroid opened this issue Feb 19, 2018 · 16 comments
Assignees
Milestone

Comments

@Jaifroid
Copy link
Member

Jaifroid commented Feb 19, 2018

After much experimentation with MathJax script loading on kiwix-js-windows (kiwix/kiwix-js-pwa#27), I've come to the conclusion that it's good practice to void the iframe between page loads. This can be done with a single line of code added to function readArticle(dirEntry):

document.getElementById("articleContent").src = "dummyArticle.html";

The advantages of this are:

  • It clears out the iframe DOM, preventing any memory leaks
  • It clears any scripts loaded in the iframe (including MathJax) and any variables attached, clearing the memory they use
  • It doesn't appear to delay page load at all

I was getting a very strange bug with MathJax whereby all equations on some mathoverflow pages would randomly get shifted down by one line, making the page illegible. But I noticed this never happened if I exited the app and went straight back to that page. The problem turned out to be the persistence of the MathJax scripts and configuration between page loads and even between different ZIM loads. As MathJax loads very fast and asynchronously anyway (much faster than decompressing a single SVG), the cleanest solution was to void the iframe. This solved the issue.

The main reasons to consider this for kiwix-js are a) it's good practice not to have bits of the DOM of an unrelated document hanging around (even between ZIM loads); and b) it will be necessary if we ever reinstate JavaScript loading or decide to backport MathJax from kiwix-js-windows.

@Jaifroid Jaifroid self-assigned this Feb 19, 2018
@mossroy mossroy added this to the v2.3 milestone Feb 19, 2018
@mossroy
Copy link
Contributor

mossroy commented Feb 19, 2018

It's probably a good idea.
I would advise to try to use "about:blank" for the iframe src attribute, instead of the dummyArticle (which I'd like to get rid of). And it's probably a bit faster.

@Jaifroid
Copy link
Member Author

Thanks @mossroy , I can confirm that about:blank works perfectly as an alternative. I've incorporated this now into kiwix-js-windows 0.9.9.1 beta. Voiding the iframe also helps with some CSS issues I was noticing when switching from Wikipedia ZIMs to Stackexchange-family ZIMs (very small fonts -- with this voiding code, the fonts show at the correct size).

@mossroy
Copy link
Contributor

mossroy commented Feb 20, 2018

Cool : could you create a PR for that?

@Jaifroid
Copy link
Member Author

Was about to post the note below when I've discovered links aren't working in Firefox! So I'll try with dummyArticle again instead....

I've made a PR and have tested quickly in Edge, but not yet in FFOS (works fine in Edge).
One observation: I tried to place this code just before the html is injected into the iframe, in displayArticleInForm() (here), but for some reason this results in the page not showing at all (it appears to load, but just a blank page is displayed). I wonder if this is because about:blank (and dummyArticle.html -- I tried with both) is loaded asynchronously, so it is not ready by the time we inject on the next line. Placing it in the readArticle function probably gives it time to load, since there is the read operation going on before the callback to displayArticleInForm. This is just a guess as to why it works well in one place and not in the other.

@Jaifroid
Copy link
Member Author

OK, I've had to change it back to dummyArticle.html. Unfortunately, while about:blank works in Edge (and in UWP app context), it doesn't work in Firefox. It works in the sense that the page loads and looks perfect (styles and images load, etc.) but for some reason links don't work.

dummyArticle.html seems to work fine in Firefox.

This may be due to the structure of about:blank? Maybe we could create some structured html programmatically if the goal is to get rid of dummyArticle. I haven't got time to investigate further just now, but I think it's worth trying as part of testing for this PR.

@Jaifroid
Copy link
Member Author

I've had five minutes to look at this, and in Firefox with src set to about:blank, links have "target=_blank" added to them, whereas in Edge they do not. This implies that the problem is with our link parsing in jQuery mode. The about:blank document must be "confusing" that routine, so the JavaScript never gets attached to internal links.

Not sure what is preferable: debug that, or simply leave dummyArticle in place. In Firefox I'm also getting (with about:blank) a warning in console "Ignoring get or set of property that has [LenientThis] because the "this" object is incorrect." on line 838 of app.js. No idea what this is!

@mossroy
Copy link
Contributor

mossroy commented Feb 20, 2018

According to https://stackoverflow.com/questions/27159967/javascript-dom-this-object-is-incorrect and https://bugzilla.mozilla.org/show_bug.cgi?id=1298830 , the console warning might be a bug in the Firefox debugger itself

Regarding the non-working links, I ran the debugger on it. It seems to come from the piece of code below :

else if (this.protocol !== currentProtocol
                    || this.host !== currentHost) {
                    // It's an external URL : we should open it in a new tab
                    $(this).attr("target", "_blank");
                }

In Firefox, the "protocol" of about:blank is "http", which is different from "file" (when testing on the filesystem). So our code wrongly considers that it's an external link.

@mossroy
Copy link
Contributor

mossroy commented Feb 20, 2018

Can you try with something like :

document.getElementById("articleContent").contentWindow.document.open();
document.getElementById("articleContent").contentWindow.document.close();

Source : https://stackoverflow.com/questions/1785040/how-to-clear-the-content-of-an-iframe

@Jaifroid
Copy link
Member Author

Thanks, some useful test results, and helpful methods to try there. I'll tinker a bit later today.

@Jaifroid
Copy link
Member Author

Some findings:

It's possible to void the iframe and load the new article without setting the iframe.src to about:blank or dummyArticle, using this code in place of the code that we use to inject the html into the iframe (in displayArticleInForm function):

var iframe = document.getElementById("articleContent");
var articleContent = iframe.contentDocument || iframe.contentWindow.document;
articleContent.open();
articleContent.write(htmlArticle);
articleContent.close();

This method has the same advantage of clearing the DOM as far as I can tell. I know this is working because when I put this into the code for Kiwix JS Windows, and load a page requiring MathJax, I can see the scripts being voided when I load a subsequent page. CSS rules seem also to be voided.

Another advantage is that this method loads the document <head> section into the iframe document's <head> section, rather than dumping it all into the iframe's <body> which our current code does (I'm sure you've seen that we've got meta statements, CSS and <title> all appearing in the iframe's <body>).

There is one disadvantage, which I haven't debugged: while links in Wikipedia work with this method, they do not work in Stackechange-family ZIMs (I've tested on mathoverflow). It seem that the code we have for computing the base URL is not working with this method, or maybe the code for injecting the computed base URL.

Going back to the dummyArticle method: While using about:blank instead of dummyArticle doesn't cause problems in the UWP app, the problem in Firefox with the links not clicking is a symptom of the fact that about:blank is loaded with "no domain". I'm a bit worried about the inconsistency this is causing in Firefox, and wonder if it is asking for trouble, even if we can work around it.

For now, I think we might be better off sticking with voiding the iframe by loading the dummyArticle in readArticle(dirEntry), and then either use our existing code for injecting the html into the body of the iframe, or use this alternative, which loads the document <head> and <body> correctly (so has most of the advantages listed above):

var iframe = document.getElementById("articleContent");
var articleContent = iframe.contentDocument || iframe.contentWindow.document;
articleContent.documentElement.innerHTML = htmlArticle;

NB, this does not on its own void the DOM (I checked), and it is necessary also to set the iframe.src to dummyArticle in readArticle(dirEntry). This code works in Edge and Firefox in both Wikipedia and Stackexchange, but I haven't checked in FFOS or Chrome.

@Jaifroid
Copy link
Member Author

Additional info: Using articleContent.write() causes the CSS not to be able to detect the page width. All Wikivoyage articles with mobile style have their headings collapsed due to a CSS rule for screens with width < 280px! Because Kiwix-JS has no way to uncollapse the headings, the page becomes unreadable.

Bizarre. But confirms my suspicion that we should use the .innerHTML method (screen width correctly detected with this method).

@Jaifroid
Copy link
Member Author

Jaifroid commented Feb 24, 2018

I think PR #342 is ready now.

I experimented with removing dummyArticle, and it is possible to delete it, remove it from the iframe src in index.html, and programmatically set iframe.src = ""; in order to void the frame between page loads. This works fine in Edge. However, we run into the same issue with Firefox's links not working due to the page not having been loaded with the same protocol as index.html. As per discussion above, I don't feel confident about the issues this causes, so I am leaving dummyArticle for now.

Note that the new code in the PR loads the article correctly into the iframe, with the <head> of htmlArticle going into the <head> of the iframe, and the <body> going into the <body>, unlike the master branch of Kiwix JS, which incorrectly places all the htmlArticle into the <body> of the iframe.

This code incorporates #343 , and writes the baseUrl directly into the <head> section of htmlArticle, before injecting htmlArticle into the iframe.

@Jaifroid
Copy link
Member Author

Jaifroid commented Apr 5, 2018

With commit kiwix/kiwix-js-pwa@232c73c in the master-dev branch of Kiwix JS Windows, I've been able to get rid of dummyArticle.html in jQuery mode (not tested in service worker) while still voiding the iframe using articleContent.open(), articleContent.write() and articleContent.close().

It works in IE11, Edge, Firefox ESR and the latest Firefox Quantum with the file:// protocol (not tested in Chrome), as well as in UWP app context. It is necessary to neutralize <script>...</script> blocks on the page at least for the newer mobile-style ZIMs, since they contain on every page a script block that uses jQuery (obviously not currently loaded, since we aren't extracting any JavaScript from the ZIM). I think this is where I was going wrong before, i.e. not realizing that the script block attempts to run when loaded via the open() write() close() method. The script is not parsed with the current innerHtml method of injecting the html into the iframe, so I did not realize the problem.

I am fairly confident this will cure #347 . Of course it needs to work also with service worker. I'll try to work on implementing this in Kiwix JS at the weekend. In the meantime feel free to try out what I've done with Kiwix JS Windows, as it runs from index.html in the three browsers I've tested it against, but only in jQuery mode (use only the master-dev branch, as master has the last stable release without these dev changes).

@Jaifroid
Copy link
Member Author

Jaifroid commented Apr 8, 2018

This summarizes the outcome of experiments in #353.

The first step is to change the use of articleContent.innerHTML which may cause the app to be rejected by Mozilla, as a browser extension, according to a warning posted here: https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML

I believe the solution is to use articleContent.write(). A consequence of using articleContent.write() is that it executes JavaScript on the page. But I see this as essential to the development of Kiwix JS for these reasons:

  • We clearly must be able to execute JavaScript if we are ever going to support, e.g., PhET ZIMs or the future ZIM types that have dynamic content that @kelson42 has mentioned in Remove dummy article #353;

  • This method gets rid of our dependency on dummyArticle, and in doing so it appears to squash completely (according to my tests) Sometimes the article is briefly displayed then replaced by a blank page #347; it also implements iframe voiding between page loads, which is required if we wish to support any kind of active content, whether loaded from scripts in the ZIM or from elsewhere.

There is one disadvantage:

  • Pages from mobile-style MediaWiki ZIMs have active JavaScript that requires jQuery, and that uses an inline onclick event on every H2 tag that is banned by the Firefox OS Content Security Policy. Moreover, on any OS, we get uncaught exceptions if attempting to toggle open or closed the headings, and an uncaught exception on page load, caused by a window.onload event that contains jQuery.

The above disadvantage is not too serious when the app is running in browser context, as for the end user it simply does nothing. However, running in app context it can cause the app to crash and exit (e.g. it causes the UWP app to crash and exit). I feel it is a bad idea to have this script failing on every page load, as we can't easily predict the effects on all systems. It should be solved by #152, but we may need to neutralize JavaScript on the page in the interim, or explore other solutions.

This was referenced Apr 8, 2018
@mossroy
Copy link
Contributor

mossroy commented Apr 8, 2018

I fully agree that executing the Javascript on the page is something we need in the mid-term. So the behavior of .write() is the one we need.
Firefox OS is not a priority : it's a convenient way to test on older browser engine (and on real devices for me). We'll see afterwards if we can find a workaround, but it's not a blocker.
The UWP crash IS a blocker. If we can't find a generic workaround in kiwix-js, it might be a workaround in kiwix-js-windows.
In principle, an uncaught javascript exception should not crash an app or a browser. It looks like a weakness in UWP apps. Does it crash on page load (because of the onload event) or on a click on a H2 tag?

@mossroy
Copy link
Contributor

mossroy commented Apr 23, 2018

Fixed by #354

@mossroy mossroy closed this as completed Apr 23, 2018
@mossroy mossroy modified the milestones: v2.4, v2.3 May 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants