ZIM backend: Cache the directory entry list #130

To improve performance
If the cache is maintained only during asset loading and then thrown away, there is a nice article-loading speed-up to be gained. I noticed this while playing around with replacing util.readFileSlice with XHR range requests.

For example, when I load the Wikipedia article for Paris, I was seeing ~14000 requests in the network tab of devtools, with a total read time of 30-35s. This didn't make any sense, as there were only 281 images to be loaded. So on average there were ~50 read requests per image, and most of the images are really small (flags, icons etc.). So where were all the reads coming from? Binary search. Since the ZIM article count is 1.7 million, binary search requires on average log2(1.7 million) steps, or around 20 calls to getDirEntfromtitle (which is itself a 2-step process: read the index, then read the dirent).

So what happens if a temporary cache (a Map of url/index -> dirent) is maintained just for the image-loading process? The 14000 requests fell to 4000, and the time fell from 30-35s to 9-11s. I haven't confirmed this, but it is probably because most of the assets of a given page are stored close together in the ZIM, so a whole lot of binary-search steps (if cached) don't get repeated. The cache size for a large page like Paris was about 3 MB.

I made a note of this here instead of opening another issue, so that any dirent-caching-related comments are in one place. There are further ways to optimize this; I will leave notes in #240, as these two issues are interconnected.
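A minimal sketch of the idea, assuming dirEntryByUrlIndex returns a Promise; the wrapper and cache names below are illustrative, not the actual branch code:

```js
// Temporary cache, created at the start of a page load and discarded at the end.
var dirEntCache = new Map(); // url index -> dirent

function cachedDirEntryByUrlIndex(zimFile, index) {
    if (dirEntCache.has(index)) {
        // Assets of a page tend to sit close together in the ZIM, so binary
        // search revisits the same indices and hits the cache often.
        return Promise.resolve(dirEntCache.get(index));
    }
    return zimFile.dirEntryByUrlIndex(index).then(function (dirEntry) {
        dirEntCache.set(index, dirEntry);
        return dirEntry;
    });
}
```

With this in place, each repeated binary-search step costs a Map lookup instead of two file reads.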
Very interesting investigation. @peter-x had started working on a cache implementation and submitted PR #183. Also keep in mind that if #116 finally works, it would replace this kind of optimization (at least for wasm-compatible browsers).
The code here is pretty simple because it's just being used during one page load (further restricted to image loads, which cause the biggest number of requests). If the cache had to be maintained across page loads, dirent hit rates would drop drastically and the code would be more complex (to keep the cache size in check with the right cache-replacement strategy). Basically, for now it's just retrieve-from-cache and add-to-cache in zimArchive.getDirEntryByTitle and zimFile.dirEntryByUrlIndex. Because the cache is quite small even for large pages, it can be maintained just for the load duration and then discarded.

Currently the branch is a mess because of other unrelated experiments, but I will clean it up and send a PR when I get some time, or if the other stuff adds to the gains. I have a feeling the current 9 seconds (with cache) spent in lookup can drop further, to around 1-2s. A lot of the code is async, but not concurrent. This is the crux of the performance issue: all the tasks being produced are queued up (randomly) one behind the other on the main thread, and then there is a long wait for everything to return. Much of this can be parallelized with web workers, XHR requests etc., so that some of those tasks are done on separate threads. Whether #116 works out or not, I find this a nice little problem that is helping me learn some JavaScript :)
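A sketch of that per-load cache lifecycle; the helper names (lookupDirEntryCached, injectImageBlob) are hypothetical, since the branch isn't merged:

```js
function loadArticleImages(images) {
    var cache = new Map(); // url/index -> dirent, alive only for this page load
    return Promise.all(images.map(function (img) {
        return lookupDirEntryCached(img, cache) // hypothetical: binary search with cache
            .then(injectImageBlob);             // hypothetical: read blob, set img.src
    })).then(function () {
        cache.clear(); // ~3 MB even for a page like Paris, so discarding is cheap
    });
}
```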
Cool: that looks very promising!
@mossroy I believe I have some good news for you :-) This branch builds on the binary-search cache I posted about earlier. The basic idea is that all the url-index/dirent lookups generated by binary search (during the image-loading process of a page) are sent off to a web worker. This is easy to do, as we have the list of images whose dirents we want to find. So the dirent lookups (which are in the thousands, thanks to binary search and the large ZIM size) happen in the worker, taking time X, while the blob reads (which are in the hundreds) happen on the main thread, taking time Y. Total image-load time used to be X+Y; now, with two threads, the total is whichever of the two is greater. Add the binary-search cache for image loads and the value of X shrinks quite a bit.

The Paris image-load time (lookups + blob reads) has dropped from ~45s to 10s, and I have tested other pages too; it's looking good :) Firefox, to my surprise, is showing even better performance than Chrome. It also works on Edge! There is lots more to do and try, including a whole lot of cleanup, so this is a good point to get feedback before adding all kinds of other things.
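An illustrative version of the split; the worker script name and message shape are assumptions, not the branch's actual code:

```js
// Main thread: hand the image URL list to a worker so the thousands of
// binary-search lookups (time X) overlap with the blob reads (time Y).
var lookupWorker = new Worker('js/direntLookupWorker.js'); // hypothetical script

lookupWorker.onmessage = function (e) {
    // The worker returns resolved dirents; blob reads stay on the main thread.
    e.data.dirEntries.forEach(readAndInjectBlob); // hypothetical reader/injector
};

lookupWorker.postMessage({ urls: imageUrls }); // imageUrls gathered from the article
```

Total image-load time then approaches max(X, Y) rather than X + Y.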
Thanks a lot for that! I'll try to have a look over the weekend.
That's really cool! Amazing work on performance optimization. As you said, the code needs a lot of cleanup (for example: remove the readXHRSlice stuff and the commented lines, avoid code duplication inside the web worker, use more meaningful variable names etc.). As it is, I find it very complicated to get into, which is normal because, as you said, there are also unrelated experiments in there. There is also some other refactoring you made on the code (in app.js) that should be proposed (in the end) in separate PRs, as it's not related to optimization.

The main concern is to find a balance between speed gain and code complexity. I'm wondering if the web worker(s) could do all the work on ZIM files: as soon as we need to read something from the ZIM file, it would be done in a separate thread (through the web worker). That would leave the main thread for the UI only, and avoid code duplication. It's just an idea; I did not look into it.

This optimization also broke a few things for now, which is not surprising, and they can certainly be fixed (but it's not the priority):
Globally, I think your research gives very good hints on what can be optimized. We need to find the right way to do it efficiently while keeping the code simple.
:) Yup, there are a lot of issues and a lot of possibilities, which is why I posted this and didn't submit a PR. There is lots to discuss before that happens. I think the right way to do it efficiently involves figuring out issue #240. All this is possible, and possible to talk about, only when the code-flow timing clearly reveals itself for any proposed change; otherwise it's non-obvious which changes take us 5 steps forward and which take us 10 steps back. Once some insight is gained, it is easy to try all kinds of things. And as changes pile up on top of each other, things get more complicated very fast. I have a long list of things worth trying next, but there is no use trying them until we have a common, established way to talk about perf.

Time is not straightforward to measure because of the number of async tasks (Promises) spawning async tasks, spawning async tasks, ad infinitum. While it is possible to measure when a task started and ended, many of the spawned children, grandchildren and so on could still be being processed or sitting on the queue, so endtime - starttime hardly ever captures what is going on. This is why you will see random returns, Promise.resolves and Promise.alls, the id parameter etc. in my code: it was my goal to capture when each promise is resolved and how long they are taking in total. But not being a JS dev, I don't know what the best practices are. This timing info gave me the first clues that binary search was the culprit. I need help figuring this measurement stuff out. Whether it's in code or a documented process, it will make sure anyone who touches the code in future is measuring things the same way. This is why #240 is important: the sooner we lock down how to do that, the sooner the foundations are set for all future changes to be performance-sensitive.

Now to address your other comments: the timings also helped me understand how exactly to time the $(articleContent img).each(...) main loop, which produced the clues on what needed to be done. I know it's breaking a lot of things, and I am keeping track of them, but I am treating this as an experiment rather than code ready to be checked in. As you say, this has to be planned out, as many different things need to be worked out. Let me know when you are ready for a chat (you can just mail me). It's hard to type out everything here, as there are too many things to talk about.
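One possible measurement convention (an assumption on my part, not the branch's code): give each async task an id, record wall-clock start/end, and aggregate per task family afterwards:

```js
var timings = [];

function timed(id, promiseFactory) {
    var start = performance.now();
    return promiseFactory().then(function (result) {
        timings.push({ id: id, ms: performance.now() - start });
        return result;
    });
}

// Usage: timed('dirent:' + url, function () { return lookupDirEntry(url); });
// Summing timings per id prefix afterwards shows where the time really goes
// (e.g. binary-search lookups vs. blob reads), even with deeply nested Promises.
```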
I agree with your approach, and the way to do it.
Excellent work, @sharun-s! I've tried it out on Windows Mobile, and the code works very well: the image load time is greatly reduced, though CSS is no longer loading (as noted by @mossroy above). My Bolivia "test" page (a very heavy page inside a 15 GB ZIM file) displays images almost instantaneously on the phone!

Note, however, that the buffer-overrun bug is still present in this branch, and it blocks loading of almost any page in a large file on WM10. Could you possibly fix it in your repo to facilitate my testing of further changes on WM10? The easiest fix is simply to edit xzdec.js: search for "8e7" and change it to "0x8e7" in the minified file.

Thanks for your great work on this. Even in this preliminary form, it would be a great improvement for a beta release of the UWP app (obviously the CSS needs fixing).
:) Thanks @Jaifroid, but I wouldn't use this code right now; it will in all likelihood lead to newer, more exotic issues. Worker lifecycle/termination is not being handled at all, so I have no idea right now what issues that might produce. I will get some time this weekend to work on it, and will give you an update on when a more tested, cleaned-up, stabler version will be available. This is going to be a positive change, so I'd rather get it right than rush through it. That said, please report any other issues you see if you are playing with that branch. I have made a note to update the branch with your xzdec fix and other newer changes.
Thanks, @sharun-s. Don't worry, we're not ready to release a beta yet; there are a few things that need cleaning up and debugging elsewhere first, including a problem with the app force-closing during file picking on random occasions, which may be to do with the require.js method of chaining libraries, but I have to debug it. I won't use your code yet, but I will test it in-app, as opposed to in-browser (it works in-browser on Windows 10 Mobile; I haven't yet tested it in the app environment, though it's the same rendering engine).
Just an FYI: I have updated my dev branch. Fingers crossed, with this change you should see some nice load times.

Before, if there were N images to load, I was just splitting the lookups between w workers, so an N/w array of tasks would be sent to each worker. This change adds another worker that just looks up images "above the fold", the theory being that the user is only going to see 2-3 images within the initially visible portion of the article. The goal is to take every other piece of work off the browser's plate until those images plus the first CSS file get injected, so I WAIT for that to complete before initiating any further image reading/injecting. I have currently set the "above the fold" number to 1 to minimize time spent as much as possible, but it can be higher for similar results, and we can decide what it should be after more testing (see the sketch below).

I have also reverted to using the iframe instead of the div, so CSS should load and scroll-to-top should work. The code has to be cleaned up, but unless you guys spot some issues I think it's ready to be merged.
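A sketch of that prioritization; the helper loadImagesViaWorker is hypothetical, and the fold size is shown as a constant:

```js
var ABOVE_THE_FOLD = 1; // currently 1, to get the first paint out as fast as possible

var firstImages = allImages.slice(0, ABOVE_THE_FOLD);
var remainingImages = allImages.slice(ABOVE_THE_FOLD);

// Dedicated worker for the visible image(s); everything else waits until
// that image and the first CSS file have been injected.
Promise.all([
    loadImagesViaWorker(firstImages), // hypothetical helper
    cssLoaded                         // promise that resolves when the CSS is in
]).then(function () {
    return loadImagesViaWorker(remainingImages);
});
```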
OK, that's great, thanks @sharun-s. I'll test ASAP on my platforms. Can I just check: are you still using the existing jQuery code for injecting the CSS?
@Jaifroid I haven't changed much in the CSS portion, other than adding a promise (cssLoaded) to track when the CSS file has completed its loading. You can look up where cssLoaded is used to see how it controls when the workers get created; a sketch follows.
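A rough sketch of how such a gate might look; only the cssLoaded name comes from the branch, the construction below is assumed:

```js
var cssLoaded = new Promise(function (resolve) {
    fetchAndInjectCss(cssTitle) // hypothetical: dirent lookup + blob read + injection
        .then(resolve);
});

// Workers for the remaining images are only created once the CSS is in place:
cssLoaded.then(createImageWorkers); // hypothetical worker factory
```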
@sharun-s, I've tested on Edge, on Edge Mobile, and as a UWP app for PC and for mobile, and it's functioning very well. Congratulations! The images load extremely fast, even on mobile with a very heavy page from a 19 GB file. The CSS is still very slow to load on mobile, but a fair bit faster on PC.

FYI, the blob approach works with CSS, but it is difficult to ensure the code is inserted into the HTML before the HTML is injected into the iframe, due to the (necessarily) asynchronous design. Your promise approach with the CSS might help, but there'd necessarily be a bottleneck if working on the HTML string BEFORE it is injected. This will only be useful if it dramatically speeds up extraction of the CSS.

I really don't understand why extracting a little bit of ASCII from a large multi-gigabyte ZIM file takes so much longer than extracting a dozen binary images from the same file, even before your speed-ups with the images. I can only think this is because the CSS is stored at the end of the ZIM file. If we know roughly where it is stored, maybe we can target the search. I don't understand the search algos well enough, but it seems you do.
I have mostly been testing on Chrome, but I did a brief Firefox and Edge (desktop) run and found things mysteriously much faster on Firefox and much slower on Edge. I have to look into what they are doing differently; it might throw up things that can be applied across the board.

My feeling is that the CSS loading issues might not be a retrieval (i.e. read/lookup) bottleneck. Timing-wise, I have seen CSS take about the same time as image lookups + reads, so it is more likely an injection/render issue. I need to test more, though, to verify this; my focus has been more on the image side of things so far. And I think the CSS delay is more pronounced due to the double draw/reflow issue we discussed: if the page suddenly changes style after a second or more, the delay is very noticeable to the user.

Good to hear the blob approach works; I will try it out over the weekend. When you next get a chance, please save and send me the timing info printed out on the console. "TimeToFirstPaint" holds how much time it's taking for that first image and CSS load; it's just to get an idea of the performance differences from platform to platform. Anyway, no big rush. Enjoy your trip!
@mossroy Having ported @sharun-s's DirEntry cache code to Kiwix JS Windows some time ago (it's been in since the first release of the app in the MS Store), I could very easily port it into Kiwix JS. From memory, it's virtually a couple of lines, and it dramatically increases the speed of subsequent loads of any page (it doesn't affect the speed of the first page loaded), due to caching of the entries for stylesheets and caching of the blobs (there are two possible cache optimizations, one of which I found more useful than the other). I ported this to Kiwix JS Windows in order to increase performance when a user disables the loading of cached assets. No rush of course, as this would be for v2.3, but I thought I'd offer to do this, having had the experience of integrating it already.
Thanks a lot for the offer, @Jaifroid. It would indeed be worth porting this to kiwix-js, as the code is not very complicated.
I should also add that there had been an attempt in #183 to do similar things (but the speed improvement was not worth it).
Commit 871836c implements the cssBlobCache. Of the two, dirEntryCache and cssBlobCache, it is the latter that makes the most dramatic difference, because the CSS is very slow to load, and once cached (after the first page load, sometimes after the second or third for less-used CSS) it then loads instantly from the blob cache. Thanks to @sharun-s for the underlying code. Tested and working in the FFOS simulator. Note that I've left console log entries in so that you can see what it's doing (when the cache is hit).

VERY IMPORTANT: in this prototype code (for testing) I'm only clearing the cache on loading a ZIM with the File Picker. However, it will be very important to identify where Kiwix JS running on FFOS / Ubuntu Phone / other packaged apps loads a new ZIM (without using the File Picker) and add a copy of the cache-clearing code at that point too (see the sketch below). As you know the ins and outs of the FFOS app, can you take a look at where else this code should be added, @mossroy? The cache doesn't persist between sessions AFAIK (well, that might need testing on FFOS; it doesn't persist between browser sessions), but it must be cleared on loading a new ZIM.
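A hedged reconstruction of the clearing logic; the cache names come from this thread, but the hook function is hypothetical, since the real code paths differ per platform:

```js
var dirEntryCache = new Map(); // title -> dirent
var cssBlobCache = new Map();  // title -> cached stylesheet

// Call this from EVERY code path that selects a new ZIM, not just the
// File Picker, otherwise entries from the previous archive leak through.
function onNewArchiveSelected(archive) {
    dirEntryCache.clear();
    cssBlobCache.clear();
    selectedArchive = archive;
}
```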
PS the branch for testing is: https://github.com/kiwix/kiwix-js/tree/Backport-CSSCache
PPS As the comment in the code suggests, it may be sufficient to insert the cache-clearing code in setLocalArchiveFromArchiveList, but I've fiddled too much with that part of the code in Kiwix JS Windows to be confident of putting it in the right place for FFOS.
Hmm, I spoke too soon about it working in FFOS. It appears to work, but it's having trouble loading CSS refs that look like this:
I'm getting this error in the console:
No such error running this code in Firefox. Any thoughts on what the problem is in FFOS, @mossroy?
OK, I think FFOS is seeing blob: URLs as remote URLs. According to https://developer.mozilla.org/en-US/docs/Archive/B2G_OS/Firefox_OS_apps/Building_apps_for_Firefox_OS/CSP, on FFOS:
So it seems we'd have to cache the raw data rather than a URL to support FFOS. A pity, because it complicates the code a lot... Does this seem like a reasonable guess as to what is going on, @mossroy? Because the blob URLs are exactly the same as for images, and those definitely work on FFOS. Basically, this works:
and this doesn't (only on FFOS):
Grrr.
It turned out to be a lot easier to cache the raw CSS than I thought it would be. See ca88b37. This now works in the FFOS simulator, and the speed-up is quite remarkable for subsequent pages (after the first). Remember that this code is still missing the lines needed to clear the cache on load of a new ZIM in FFOS (it clears the cache on load of a new ZIM with the file picker). Maybe you could insert that at the appropriate place as part of your testing, @mossroy. I'd rather test against the branch and be sure everything is OK technically before doing a PR.
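A sketch of the raw-CSS variant (an assumed shape, not the exact commit): store the stylesheet text itself rather than a blob: URL, and inject it inline so FFOS's CSP never sees a URL it treats as remote:

```js
function injectCss(title, cssText, doc) {
    cssBlobCache.set(title, cssText); // raw text now, not a blob: URL
    var style = doc.createElement('style');
    style.textContent = cssText;
    doc.head.appendChild(style); // doc: the iframe's document
}

// On a cache hit, the stored text is re-injected directly, skipping the
// ZIM lookup and decompression entirely.
```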
Thanks a lot for that branch, @Jaifroid.
This issue was addressed a long time ago. Closing.