Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"TypeError: can't access dead object" exceptions within Firefox extension #361

Closed
mossroy opened this issue Apr 30, 2018 · 39 comments
Closed
Assignees
Labels
Milestone

Comments

@mossroy
Copy link
Contributor

mossroy commented Apr 30, 2018

It looks like a regression since 2.2
When running kiwix-js as a Firefox extension, I randomly get this exception when clicking on a link, leading to a "Error reading article with title xxx" instead of displaying the article.

@mossroy mossroy added the bug label Apr 30, 2018
@mossroy mossroy added this to the v2.3 milestone Apr 30, 2018
@mossroy mossroy self-assigned this Apr 30, 2018
@mossroy
Copy link
Contributor Author

mossroy commented Apr 30, 2018

See https://developer.mozilla.org/docs/Web/JavaScript/Reference/Erreurs/Dead_object

After trying various versions of nightly builds, I see that the issue appeared after merging #354.
I suppose we're somewhere referencing a DOM object from the iframe, that has been destroyed by our articleContent.write().
I need to find which one and fix this because it looks like a blocker for a release.

@Jaifroid
Copy link
Member

When you say "randomly" is it not every link? Can you isolate which links are producing the error? It may have something to do with the event listeners? Might we have to revoke the listener in the click function?

@Jaifroid
Copy link
Member

I mean, put removeEventListener (or jQuery off) as part of the redirection here:
https://github.com/kiwix/kiwix-js/blob/master/www/js/app.js#L905
(or anywhere that a new document is loaded)? Just a guess.

@mossroy
Copy link
Contributor Author

mossroy commented Apr 30, 2018

The bug happens on the Ray Charles archive for example.
Steps to reproduce :

  • on Firefox, install the latest nightly extension by clicking on the .xpi file from https://download.kiwix.org/nightly/2018-04-29/
  • open the Ray Charles archive
  • it displays the main article : click on the first link "A fool for you" (it usually works)
  • then click on the first link "Ray Charles"

@Jaifroid
Copy link
Member

Hmm, I've not been able to reproduce (yet) in Firefox Quantum on Ray Charles, or on the latest WikiMed. But I am getting a lot of CSP errors in console:

image

I wonder if it's a race condition with the old DOM elements not being fully garbage collected before starting to access the new DOM. Strange that it's random.

@mossroy
Copy link
Contributor Author

mossroy commented Apr 30, 2018

The stacktrace of the exception is :
Sizzle@moz-extension://e3199cf6-77e7-4308-8133-66400fd6f608/www/js/lib/jquery-3.2.1.slim.js:819:6 find@moz-extension://e3199cf6-77e7-4308-8133-66400fd6f608/www/js/lib/jquery-3.2.1.slim.js:2922:4 goToArticle/<@moz-extension://e3199cf6-77e7-4308-8133-66400fd6f608/www/js/app.js:1042:17 _fulfilled@moz-extension://e3199cf6-77e7-4308-8133-66400fd6f608/www/js/lib/q.js:794:54 Promise.prototype.then/</<@moz-extension://e3199cf6-77e7-4308-8133-66400fd6f608/www/js/lib/q.js:823:30 Promise/promise.promiseDispatch@moz-extension://e3199cf6-77e7-4308-8133-66400fd6f608/www/js/lib/q.js:756:13 become/</<@moz-extension://e3199cf6-77e7-4308-8133-66400fd6f608/www/js/lib/q.js:564:17 flush@moz-extension://e3199cf6-77e7-4308-8133-66400fd6f608/www/js/lib/q.js:110:17"

It seems to indicate it happens in a find() call of jQuery, line 1042 of app.js:
$('#articleContent').contents().find('body').html("");

Upgrading jQuery to its latest version (3.3.1) does not solve the issue.
I tried to add $(this).off("click"); where you suggested (before the goToArticle call), but it did not seem to solve it either.

@Jaifroid
Copy link
Member

Ah, that looks like the old way of clearing the html... I'll check out that line.

@Jaifroid
Copy link
Member

I think that line is a rogue accidentally left in the code. It shouldn't be needed to clear the body of the document as we're going to rewrite the whole thing. I'm trying commenting it out.

@mossroy
Copy link
Contributor Author

mossroy commented Apr 30, 2018

If I comment this line, the same error happens a bit later in the code, line 873 :
$('#articleContent').contents().find('body').find('a').each(function() {

@Jaifroid
Copy link
Member

Well it's easy to change that line to a standard DOM method.
Off top of head:
document.getElementByID('articleContent').contentDocument.body.getElementsByTagName('a').forEach(function...{
In practice I wouldn't use "forEach" because it is slower than a loop, though it won't be as slow as jQuery's each. Also, I'd probably reference the iframe first, then do iframe.body.... Also, not sure if "body" is actually needed there, would need to experiment.

@Jaifroid
Copy link
Member

Can't use forEach there. Trying loop.

@Jaifroid
Copy link
Member

Supposedly can use forEach if we use querySelectorAll instead. However, loop will be faster, but will need to change ($this) references. Would you rather keep the jQuery here and see if the problem can be debugged another way?

@mossroy
Copy link
Contributor Author

mossroy commented Apr 30, 2018

We're using a slightly more complex selector for CSS line 946, for which it might be more convenient to keep jQuery.
Maybe I can try to find a solution while keeping jQuery, and you try without it?
NB : if you don't have time, I'll carry on the investigation by myself, no problem.

@Jaifroid
Copy link
Member

I've got a candidate with minimal changes to that function. Will post to a new branch.

Jaifroid added a commit that referenced this issue Apr 30, 2018
@Jaifroid
Copy link
Member

Can you check if that still has the issue in that routine? a25645d

@mossroy
Copy link
Contributor Author

mossroy commented Apr 30, 2018

It seems to work, but now fails with the same error in the same jQuery call of loadImagesJQuery (and probably in loadCSSJQuery).

@Jaifroid
Copy link
Member

OK. Do you want to follow similar pattern? Also, we need to clear the body to blank out the screen while loading a new document. I'll change that to DOM method. Then I'll stop and let you take over and will check back later today.

@mossroy
Copy link
Contributor Author

mossroy commented Apr 30, 2018

Sure.
It might be worth testing this pattern to see if it really can solve the issue or not.
If you have to stop, I'll carry on the test on your branch if you're ok with that

@Jaifroid
Copy link
Member

Yes, please go ahead. I just pushed a commit clearing the doc body with DOM method instead of jQuery. Not fully tested...

@Jaifroid
Copy link
Member

Just FYI, Internet Explorer 11 doesn't support forEach. But go ahead and test with forEach, and we can change to standard loop later, and maybe use getElementsByTagName because it is said to be faster than querySelectorAll. forEach is a convenient way of testing whether the fix works in principle.

@Jaifroid
Copy link
Member

Alternatively, the issue may be just with the jQuery "find" routine, searching up through dead elements... Might be possible just to access the iframe in a different way, but still use jQuery "each", which handles the browser differences (albeit more slowly than standard loop).

@mossroy
Copy link
Contributor Author

mossroy commented Apr 30, 2018

Applying the same pattern on images and CSS selection seems to workaround the issue. That's good news.

jQuery behaves as if it kept some references on elements of the iframe.
In https://forum.imacros.net/viewtopic.php?t=25996, they seem to suggest that reloading jQuery would be a workaround. But that would probably be not optimal...
I need to investigate more to see if we have a reasonable way to keep jQuery here

@Jaifroid
Copy link
Member

Jaifroid commented Apr 30, 2018

I just pushed a commit da2ae9e that supports IE when using forEach. It's necessary to use Array.prototype.slice.call on the node list to fool IE into thinking it's a standard array. Doesn't seem to affect the other browsers at all.

EDIT: Reference is https://developer.mozilla.org/en-US/docs/Web/API/NodeList

@mossroy
Copy link
Contributor Author

mossroy commented Apr 30, 2018

I'm creating a testcase to reproduce the issue with jQuery and a minimal setup

@mossroy
Copy link
Contributor Author

mossroy commented Apr 30, 2018

I managed to reproduce in a simple testcase, and posted that as a question on stackoverflow : https://stackoverflow.com/questions/50101246/typeerror-cant-access-dead-object-with-jquery-and-an-iframe-populated-with-d
(I might need your help to explain why we ended up using the document.write() )

@Jaifroid
Copy link
Member

Jaifroid commented May 5, 2018

With the proposed fix for #366 , I think I've now isolated this problem to a version of NS_ERROR_UNEXPECTED which, according to various articles produced by a search for this error, is to do with elements being referenced in the DOM that no longer exist.

A candidate solution, apart from reducing the setting up of jQuery event listeners on the iframe, is for us to tell jQuery explicitly that we're unloading the iframe, so that it clears its cache. Adding this, just before we overwrite the iframe, seems to help on my initial tests:

// Tell jQuery we're removing the iframe document so that it clears its cache [kiwix-js #361]
$('#articleContent').empty();

Once we've pushed the changes for #366 to master, I'll rebase this branch so that we can test with the above. It's looking hopeful in testing on my JavaScript branch which was running into an NS_ERROR_UNEXPECTED exception in jQuery-3.2.1.slim when run in Firefox. I'm 90% sure this is the same issue.

EDIT: Further info on the need to remove jQuery elements with jQuery methods here: https://stackoverflow.com/questions/9304768/what-is-the-purpose-of-cache-in-jquery . Although this is a bit old now, and $.cache doesn't seem to exist in our current jQuery, some of the comments from 2016 seem pertinent, and suggest that using a jQuery method to remove the node, such as .empty(), will do some unlinking.

@mossroy
Copy link
Contributor Author

mossroy commented May 6, 2018

Interesting.
What is frustrating is that the .empty() call is probably mostly doing the same thing as we already do with the document.write() method : removing the content of the iframe.
And it looks like mixing jQuery and non-jQuery calls is what generates our references to dead objects.

Putting the iframe inside a div, and emptying the div (with jQuery), as you suggested above, might have a good chance to work : maybe it would be worth testing it.
I'm sorry to not have the time to do this test by myself for now (I'm on vacation).

@Jaifroid
Copy link
Member

Jaifroid commented May 6, 2018

Don't worry, I'll have a go with the div method of removing with jQuery. It would be good not to have to depend on document.write(). Enjoy your vacation!

@mossroy
Copy link
Contributor Author

mossroy commented May 6, 2018

You'll probably have to set the class articleIFrame on the iframe each time you re-create it, and also to manually call resizeIFrame()

@Jaifroid
Copy link
Member

Jaifroid commented May 7, 2018

OK, having done a lot of tests on different code, we have these issues with trying to inject the iframe either with .innerHTML or jQeury .html (which are basically the same thing):

  • For a dynamically created iframe or for an iframe with src="about:blank" (these are actually the same thing, i.e. if you dynamically create an iframe with no src, then it loads about:blank as its initial document anyway), Firefox will ignore any subsequent base href injected into the iframe via innerHTML. As a result, none of the links work. Firefox also sets the href.host of local anchors (i.e. to articles in ZIM file) as "html:". We can work around the latter (I wrote code to do it), but working around the fact that anchor hrefs don't include the base href would break too many things.

  • In Edge, the base tag of the iframe is used correctly and link hrefs have the full path appended, so everything works fine.

  • I cannot find any way to set the base href of a dynamically created iframe before it is loaded, and in any case, we are overwriting the full document once the dynamic frame has loaded with .innerHTML which has a new base href in it (ignored by Firefox), so I'm not sure it would help even if it were possible. EDIT: Remember we need the base href to be in the document as soon as the browser engine starts to render it, becasue it affects css and anything with a src. See Add the <base> tag *before* the document html is injected into the DOM #343 for full explanation.

Our choices are:

  • Revert to using dummyArticle as the src. This allows Firefox to correctly understand the iframe as being loaded with the file:// protocol (if that's the case) for example, and respect the base href when reporting links. EDIT: Although this solves the protocol issue, as we know (because it's how we used to do it), it may not solve the problem with the base href being read after css files and scripts are already on their way to being loaded (needs more testing).

  • Use document.write(), in which case my simple fix of emptying the iframe with jQuery before using document.write() on it is easier than wrapping the iframe in a div, and emptying the div and then using document.write() to recreate the iframe (since we can't create it with document.createElement due to the Firefox issue);

  • Stop setting baseHref, since Firefox ignores it, in which case we'd have to implement our own logic for determining the path using regex matching. I did implement such a system in Kiwix JS Windows before you wrote the baseHref code. It worked in jQuery mode, but I changed it to your method which seems more generic and fits better with Service Worker. EDIT: sample code of how to do it with regex matching is from this commit kiwix/kiwix-js-pwa@0ccd932 though I subsequently improved the code after, but this is the idea.

By the way, everywhere I looked I kept running into the assertion that the only cross-browser way to change the base href of an iframe document is to use document.write(). There are two other ways: 1) use an iframe src=data: URI or iframe src=javascirpt:return htmlArticle; -- but I believe both would be banned by the CSPs we need to respect; or 2) load the iframe document using the srcdoc attribute. Srcdoc is an HTML5 spec provided precisely to solve the problem of loading an iframe with dynamic code. I don't know if it's supported in FFOS, but unfortunately it is not yet supported in Edge. Support has been provided in dev versions of Edge, but not flighted yet. This also means it is not supported in UWP. I therefore haven't tested if it would work for us: maybe we can use it in the future.

Jaifroid added a commit that referenced this issue May 7, 2018
@Jaifroid
Copy link
Member

Jaifroid commented May 7, 2018

Commit e57af21 is provided to show how far I got with this experiment. As mentioned, it works in Edge, but despite my workarounds added to the link parsing, it fails in Firefox due to the fact that FF doesn't use the injected base href with this way of creating the iframe.

@mossroy
Copy link
Contributor Author

mossroy commented May 8, 2018

Thanks a lot @Jaifroid for all these tests and explanations.

Regarding the possibilities you listed :

  • Revert to using dummyArticle as the src. If the base tag is correctly handled, I vote for this solution. Because it remains simple, works the same on any browser, and lets us make a consistent use of jQuery, with no risk of memory leak or references to dead objects. In this case, it should be better to rename the file to something like blank.html. It might even be really blank (a file with a length of zero), to avoid any parsing from the browser (needs testing)
  • Use the fix for document.write. It's a probably-working solution, but not my favorite. What I dislike is the need to mix jQuery and non-jQuery ways to code : it's more complicated to understand/maintain, and it can create technical issues difficult to diagnose (like the references to dead objects we had)
  • Don't use the base tag. I doubt it is possible while keeping compatibility with both jQuery and ServiceWorker modes, on ZIM files that need it (like the ones from stackoverflow). I don't think it's worth investigating that
  • The srcdoc attribute. I did not know this one : thanks for pointing it. https://caniuse.com/#feat=iframe-srcdoc confirms the support issues you mention, on Microsoft browsers. There is a polyfill, though, for browsers that don't support this attribute : https://github.com/jugglinmike/srcdoc-polyfill. It seems to replace the srcdoc by the src=javascript: workaround you mentioned. It might work in UWP. As you said, it might not work on CSP-constrained contexts. But Firefox and Chrome already support this attribute for a long time, so it should not be an issue in these browser extensions. It might be a problem on FFOS... I'd say that, if the first 2 solutions fail, it might be tested a bit more. But it's the solution I like the least (excluding removing the base tag, that I consider as probably not-working).

@Jaifroid
Copy link
Member

Jaifroid commented May 9, 2018

Thanks for the observations, which I agree with. I'll check out the use of a blank document (equivalent to dummyArticle) and how FFOS behaves with regard to the base tag when injecting the iframe in code. The issue will be to be sure that all browsers respect the new injected base tag (as they do with document.write()) and don't just use the base tag of the dummy article. Will report back.

Jaifroid added a commit that referenced this issue May 9, 2018
Fixes #366 and possibly fixes #361 (subject to testing)
Jaifroid added a commit that referenced this issue May 9, 2018
@Jaifroid
Copy link
Member

Jaifroid commented May 9, 2018

I've moved discussion to #366 . However, I've rebased the fix-TypeError-tests branch on top of the Fix-history-back-regression branch so that we can test the difference made by doing several of the iterations in pure JavaScript as opposed to jQuery.

I've done a few timing tests with loading the Paris article in English Wikivoyage. Timings done using a console timer from the moment the article title is selected to the moment all images are sent to the decompressor (but not yet decompressed).

On FFOS simulator, the speed improvement of using pure JavaScript is 21%, from an average of 987ms (JQuery) to an average of 777ms (JavaScript). In Edge, the improvement is 40% from an average of 2339ms (jQuery) to an average of 1474ms (JavaScript). Note that Edge runs slowly with the console open, these values are not typical of performance with the console closed (but with it closed, the timings are not reported).

So there is an improvement, but it is an improvement of 200ms in FFOS simulator, which most users won't notice. Nevertheless, we could eke out some performance gains this way. I'm not strongly advocating for doing so, but in case you're interested, here is a discussion on why jQuery may no longer necessary with HTML5: https://blog.garstasio.com/you-dont-need-jquery/why-not/ .

@mossroy
Copy link
Contributor Author

mossroy commented May 9, 2018

Very interesting article.
What I feel important is the consistency of our code : we should almost always use the same method to do similar things. For the readability/maintenance of this code, and to avoid bugs like this one. Except (if really necessary) a few exceptions that are well documented.
I recognize myself in your article on the reasons I chose to use jQuery at the beginning of this project. And it might be true that these reasons are not relevant any more.
I suggest that we create a separate issue to try to remove jQuery. If it works well on our target browser engines, does not make the code much more complicated, and gives some performance improvements, I'd be happy to remove it in a future version.
Until that, I'd prefer to keep code consistency.

Jaifroid added a commit that referenced this issue May 9, 2018
Jaifroid added a commit that referenced this issue May 10, 2018
Jaifroid added a commit that referenced this issue May 10, 2018
Jaifroid added a commit that referenced this issue May 21, 2018
Also reverts to innerHTML method of injecting the iframe due to incompatibilities of document.write() with Service Worker mode in some browsers. Fixes #361 , #366 , possibly #365.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants