-
-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a queue for extraction of assets #412
Comments
In ServiceWorker mode, the browser seems to prioritize the requests as we would expect : first the HTML, then all the CSS, then the images that are immediately visible, then the other images. I tested on both Firefox and Chromium. Anyway, for some reason, in ServiceWorker mode, the page does not render before all the resources are fetched. I was hoping it would render earlier, as it does when browsing online. But that's a separate issue I will create, that will certainly be handled differently in ServiceWorker mode. To sum up, I think you should focus on jQuery mode in this issue, because I don't think it's possible to handle both modes with the same code. |
I have been testing the extraction order in Service Worker (SW), using Firefox Quantum (as part of my attempts to solve the issues experienced with #417 ). I have uploaded a branch that counts the CSS in SW mode (and the console logging currently going on in SW). The branch is not changing the order in which assets are sent to SW. Below is the result of loading an average article ("Helston" in English Wikipedia) -- screenshot of the FF console. As you can see, the last CSS is the penultimate thing to be returned, after nearly all the images. As you rightly say above, Firefox and Chrome will not render the page until this last CSS is received, but that is too late for us. On a small page like this, it doesn't matter, but we could greatly improve page display speed on files with hundreds of images (and especially SVGs) if we prioritize extraction of CSS. NB The reason CSS is left to last is that extraction of images does not engage xzdec. They are stored as-is, with no further compression (as they are already compressed). CSS requires costly CPU cycles to decompress, so it gets put on the back burner by the async engine. This is the opposite of what we want! EDIT: You can observe quite a dramatic version of the problem if you run my branch, with console open, on "Trigonometric Functions" from the latest Wikipedia Mathematics ZIM. Most of the CSS is extracted after most of the SVGs. |
Based on the above experiments, I've now had a go at producing a single, universal queue for jQuery and Service Worker modes: https://github.com/kiwix/kiwix-js/tree/Universal-queue Currently the queue only prioritizes extraction of CSS, and then it lets everything else get extracted in whatever order it returns from the decompressor. But it is easily extensible to extract, say, binary images before JavaScript, and JavaScript before SVGs. This can be a TODO -- I wanted a proof of principle. I'm still doing a bit of tweaking and performance testing. |
Interesting.
|
I'm not sure I follow. Surely a new html request would have to be on a separate browser thread. Otherwise there would be terrible confusion all round if we have, e.g. CSS files with the same name being requested and dumped into the wrong DOM. I don't know how prefetching works low-down, but it would most likely be at the level of populating the browser cache.
Just to be clear, there is no modification of the DOM whatsoever. All I am doing here is delaying the reading of non-CSS or HTML files. We have to extract the files from the ZIM, so intervening in the extraction process is a viable "universal" solution that does not require DOM manipulation. I understand the objection to adding code in the message channel. Would it be better to do the queuing at a lower level, e.g. at readBinaryFile and readUTFFile level? |
Maybe it would be more clear if you think about the ServiceWorker as if it was an HTTP proxy server (except it's dedicated to one window/tab of the browser). It's a usual comparison.
I was referring to the following lines of code inside handleMessageChannelMessage function :
I understand the purpose, but we should not do that here. We should keep an architecture like :
No, it's the same issue : the ZIM backend should be completely independent from the browser, at least because we'll try to replace it by a wasm version of libzim. |
Thank you for the detailed explanation! Very helpful. I was thinking of my code as something along the lines of server load balancing or QoS, freeing up CPU time to deal with the time-sensitive requests. But if it's more like a proxy, then I see the potential for race conditions. |
I think this is no longer relevant. |
As explained in #219 (here and here), current master delays rendering the page until all the CSS is extracted from the ZIM, but CSS is heavily compressed and is often therefore extracted after all other assets on the page have been extracted. This can lead to long waits for the page to render if a page has many images on it. This is especially the case when there are lots of SVGs (almost any page on a mathematics ZIM). This problem affects jQuery and Service Worker modes. It is ameliorated by caching, but not eliminated. We therefore need a queueing mechanism to control the order of extraction of various assets.
Another argument in favour of queuing would be to allow us to prioritize subcategories of assets, for a better user experience. For example, we could ensure the following extraction order simply by queuing functions to do the following things:
I propose one possibility for an extensible queuing framework in https://github.com/kiwix/kiwix-js/tree/Add-simple-jQuery-queue . Currently it only applies to jQuery mode. We could in principle hook into Service Worker requests as they come into app.js and queue responses to create a similar effect in Service Worker mode. It may well be possible to implement a "universal" queuing mechanism in a lower-level process such as zimDirEntry.js.
Before finalizing any framework, and before creating a PR, we need to decide whether the framework should include ServiceWorker mode or only jQuery mode. Your thoughts, @mossroy ?
The text was updated successfully, but these errors were encountered: