ZIM backend: Cache the directory entry list #130

To improve performance
If the cache is maintained only during asset loading and then thrown away, there is a nice article-loading speed-up to be gained. I noticed this while playing around with replacing util.readFileSlice with XHR range requests.

For example, when I load the Wikipedia article for Paris, I was seeing ~14000 requests in the network tab of devtools, with a total read time of 30-35s. This didn't make any sense, as there were only 281 images to be loaded. So on average there were ~50 read requests per image, and most of the images are really small (flags, icons etc.). So where were all the reads coming from? Binary search. Since the ZIM article count is 1.7 million, binary search requires on average log2(1.7 million) steps, or around 20 calls to getDirEntfromtitle (which is itself a 2-step process: read the index, then read the dirent).

So what happens if a temporary cache (a Map of url/index -> dirent) is maintained just for the image-loading process? The 14000 requests fell to 4000, and the time fell from 30-35s to 9-11s. I haven't confirmed this, but it is probably because most of the assets of a given page are stored close together in the ZIM, so a whole lot of binary-search steps (if cached) don't get repeated. The cache size for a large page like Paris was about 3 MB.

I made a note of this here instead of opening another issue, so that any dirent-caching-related comments are in one place. There are further ways to optimize this; I will leave notes in #240, as these two issues are interconnected.
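A minimal sketch of the idea, assuming dirEntryByUrlIndex returns a Promise; the wrapper and cache names below are illustrative, not the actual branch code:

```js
// Temporary cache, created at the start of a page load and discarded at the end.
var dirEntCache = new Map(); // url index -> dirent

function cachedDirEntryByUrlIndex(zimFile, index) {
    if (dirEntCache.has(index)) {
        // Assets of a page tend to sit close together in the ZIM, so binary
        // search revisits the same indices and hits the cache often.
        return Promise.resolve(dirEntCache.get(index));
    }
    return zimFile.dirEntryByUrlIndex(index).then(function (dirEntry) {
        dirEntCache.set(index, dirEntry);
        return dirEntry;
    });
}
```

With this in place, each repeated binary-search step costs a Map lookup instead of two file reads.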
Very interesting investigation. @peter-x had started working on a cache implementation and submitted PR #183. Also keep in mind that if #116 finally works, it would replace this kind of optimization (at least for wasm-compatible browsers).
The code here is pretty simple because it's just being used during one page load (further restricted to image loads, which cause the biggest number of requests). If the cache had to be maintained across page loads, dirent hit rates would drop drastically and the code would be more complex (to keep the cache size in check with the right cache-replacement strategy). Basically, for now it's just retrieve-from-cache and add-to-cache in zimArchive.getDirEntryByTitle and zimFile.dirEntryByUrlIndex. Because the cache is quite small even for large pages, it can be maintained just for the load duration and then discarded.

Currently the branch is a mess because of other unrelated experiments, but I will clean it up and send a PR when I get some time, or if the other stuff adds to the gains. I have a feeling the current 9 seconds (with cache) spent in lookup can drop further, to around 1-2s. A lot of the code is async, but not concurrent. This is the crux of the performance issue: all the tasks being produced are queued up (randomly) one behind the other on the main thread, and then there is a long wait for everything to return. Much of this can be parallelized with web workers, XHR requests etc., so that some of those tasks are done on separate threads. Whether #116 works out or not, I find this a nice little problem that is helping me learn some JavaScript :)
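A sketch of that per-load cache lifecycle; the helper names (lookupDirEntryCached, injectImageBlob) are hypothetical, since the branch isn't merged:

```js
function loadArticleImages(images) {
    var cache = new Map(); // url/index -> dirent, alive only for this page load
    return Promise.all(images.map(function (img) {
        return lookupDirEntryCached(img, cache) // hypothetical: binary search with cache
            .then(injectImageBlob);             // hypothetical: read blob, set img.src
    })).then(function () {
        cache.clear(); // ~3 MB even for a page like Paris, so discarding is cheap
    });
}
```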
Cool: that looks very promising!
@mossroy I believe I have some good news for you :-) This branch builds on the binary-search cache I posted about earlier. The basic idea is that all the url-index/dirent lookups generated by binary search (during the image-loading process of a page) are sent off to a web worker. This is easy to do, as we have the list of images whose dirents we want to find. So the dirent lookups (which are in the thousands, thanks to binary search and the large ZIM size) happen in the worker, taking time X, while the blob reads (which are in the hundreds) happen on the main thread, taking time Y. Total image-load time used to be X+Y; now, with two threads, the total is whichever of the two is greater. Add the binary-search cache for image loads and the value of X shrinks quite a bit.

The Paris image-load time (lookups + blob reads) has dropped from ~45s to 10s, and I have tested other pages too; it's looking good :) Firefox, to my surprise, is showing even better performance than Chrome. It also works on Edge! There is lots more to do and try, including a whole lot of cleanup, so this is a good point to get feedback before adding all kinds of other things.
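An illustrative version of the split; the worker script name and message shape are assumptions, not the branch's actual code:

```js
// Main thread: hand the image URL list to a worker so the thousands of
// binary-search lookups (time X) overlap with the blob reads (time Y).
var lookupWorker = new Worker('js/direntLookupWorker.js'); // hypothetical script

lookupWorker.onmessage = function (e) {
    // The worker returns resolved dirents; blob reads stay on the main thread.
    e.data.dirEntries.forEach(readAndInjectBlob); // hypothetical reader/injector
};

lookupWorker.postMessage({ urls: imageUrls }); // imageUrls gathered from the article
```

Total image-load time then approaches max(X, Y) rather than X + Y.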
Thanks a lot for that! I'll try to have a look over the weekend.
That's really cool! Amazing work on performance optimization. As you said, the code needs a lot of cleanup (for example: remove the readXHRSlice stuff and the commented lines, avoid code duplication inside the web worker, use more meaningful variable names etc.). As it is, I find it very complicated to get into, which is normal because, as you said, there are also unrelated experiments in there. There is also some other refactoring you made on the code (in app.js) that should be proposed (in the end) in separate PRs, as it's not related to optimization.

The main concern is to find a balance between speed gain and code complexity. I'm wondering if the web worker(s) could do all the work on ZIM files: as soon as we need to read something from the ZIM file, it would be done in a separate thread (through the web worker). That would leave the main thread for the UI only, and avoid code duplication. It's just an idea; I did not look into it.

This optimization also broke a few things for now, which is not surprising, and they can certainly be fixed (but it's not the priority):
Globally, I think your research gives very good hints on what can be optimized. We need to find the right way to do it efficiently while keeping the code simple.
:) Yup, there are a lot of issues and a lot of possibilities, which is why I posted this and didn't submit a PR. There is lots to discuss before that happens. I think the right way to do it efficiently involves figuring out issue #240. All this is possible, and possible to talk about, only when the code-flow timing clearly reveals itself for any proposed change; otherwise it's non-obvious which changes take us 5 steps forward and which take us 10 steps back. Once some insight is gained, it is easy to try all kinds of things. And as changes pile up on top of each other, things get more complicated very fast. I have a long list of things worth trying next, but there is no use trying them until we have a common, established way to talk about perf.

Time is not straightforward to measure because of the number of async tasks (Promises) spawning async tasks, spawning async tasks, ad infinitum. While it is possible to measure when a task started and ended, many of the spawned children, grandchildren and so on could still be being processed or sitting on the queue, so endtime - starttime hardly ever captures what is going on. This is why you will see random returns, Promise.resolves and Promise.alls, the id parameter etc. in my code: it was my goal to capture when each promise is resolved and how long they are taking in total. But not being a JS dev, I don't know what the best practices are. This timing info gave me the first clues that binary search was the culprit. I need help figuring this measurement stuff out. Whether it's in code or a documented process, it will make sure anyone who touches the code in future is measuring things the same way. This is why #240 is important: the sooner we lock down how to do that, the sooner the foundations are set for all future changes to be performance-sensitive.

Now to address your other comments: the timings also helped me understand how exactly to time the $(articleContent img).each(...) main loop, which produced the clues on what needed to be done. I know it's breaking a lot of things, and I am keeping track of them, but I am treating this as an experiment rather than code ready to be checked in. As you say, this has to be planned out, as many different things need to be worked out. Let me know when you are ready for a chat (you can just mail me). It's hard to type out everything here, as there are too many things to talk about.
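One possible measurement convention (an assumption on my part, not the branch's code): give each async task an id, record wall-clock start/end, and aggregate per task family afterwards:

```js
var timings = [];

function timed(id, promiseFactory) {
    var start = performance.now();
    return promiseFactory().then(function (result) {
        timings.push({ id: id, ms: performance.now() - start });
        return result;
    });
}

// Usage: timed('dirent:' + url, function () { return lookupDirEntry(url); });
// Summing timings per id prefix afterwards shows where the time really goes
// (e.g. binary-search lookups vs. blob reads), even with deeply nested Promises.
```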
I agree with your approach, and the way to do it.
Excellent work, @sharun-s! I've tried it out on Windows Mobile, and the code works very well: the image load time is greatly reduced, though CSS is no longer loading (as noted by @mossroy above). My Bolivia "test" page (a very heavy page inside a 15 GB ZIM file) displays images almost instantaneously on the phone!

Note, however, that the buffer-overrun bug is still present in this branch, and it blocks loading of almost any page in a large file on WM10. Could you possibly fix it in your repo to facilitate my testing of further changes on WM10? The easiest fix is simply to edit xzdec.js: search for "8e7" and change it to "0x8e7" in the minified file.

Thanks for your great work on this. Even in this preliminary form, it would be a great improvement for a beta release of the UWP app (obviously the CSS needs fixing).
:) Thanks @Jaifroid, but I wouldn't use this code right now; it will in all likelihood lead to newer, more exotic issues. Worker lifecycle/termination is not being handled at all, so I have no idea right now what issues that might produce. I will get some time this weekend to work on it, and will give you an update on when a more tested, cleaned-up, stabler version will be available. This is going to be a positive change, so I'd rather get it right than rush through it. That said, please report any other issues you see if you are playing with that branch. I have made a note to update the branch with your xzdec fix and other newer changes.
Thanks, @sharun-s. Don't worry, we're not ready to release a beta yet; there are a few things that need cleaning up and debugging elsewhere first, including a problem with the app force-closing during file picking on random occasions, which may be to do with the require.js method of chaining libraries, but I have to debug it. I won't use your code yet, but I will test it in-app, as opposed to in-browser (it works in-browser on Windows 10 Mobile; I haven't yet tested it in the app environment, though it's the same rendering engine).
Just an FYI: I have updated my dev branch. Fingers crossed, with this change you should see some nice load times.

Before, if there were N images to load, I was just splitting the lookups between w workers, so an N/w array of tasks would be sent to each worker. This change adds another worker that just looks up images "above the fold", the theory being that the user is only going to see 2-3 images within the initially visible portion of the article. The goal is to take every other piece of work off the browser's plate until those images plus the first CSS file get injected, so I WAIT for that to complete before initiating any further image reading/injecting. I have currently set the "above the fold" number to 1 to minimize time spent as much as possible, but it can be higher for similar results, and we can decide what it should be after more testing (see the sketch below).

I have also reverted to using the iframe instead of the div, so CSS should load and scroll-to-top should work. The code has to be cleaned up, but unless you guys spot some issues I think it's ready to be merged.
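A sketch of that prioritization; the helper loadImagesViaWorker is hypothetical, and the fold size is shown as a constant:

```js
var ABOVE_THE_FOLD = 1; // currently 1, to get the first paint out as fast as possible

var firstImages = allImages.slice(0, ABOVE_THE_FOLD);
var remainingImages = allImages.slice(ABOVE_THE_FOLD);

// Dedicated worker for the visible image(s); everything else waits until
// that image and the first CSS file have been injected.
Promise.all([
    loadImagesViaWorker(firstImages), // hypothetical helper
    cssLoaded                         // promise that resolves when the CSS is in
]).then(function () {
    return loadImagesViaWorker(remainingImages);
});
```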
OK, that's great, thanks @sharun-s. I'll test ASAP on my platforms. Can I just check: are you still using the existing jQuery code for injecting the CSS?
@Jaifroid I haven't changed much in the CSS portion, other than adding a promise (cssLoaded) to track when the CSS file has completed its loading. You can look up where cssLoaded is used to see how it controls when the workers get created; a sketch follows.
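A rough sketch of how such a gate might look; only the cssLoaded name comes from the branch, the construction below is assumed:

```js
var cssLoaded = new Promise(function (resolve) {
    fetchAndInjectCss(cssTitle) // hypothetical: dirent lookup + blob read + injection
        .then(resolve);
});

// Workers for the remaining images are only created once the CSS is in place:
cssLoaded.then(createImageWorkers); // hypothetical worker factory
```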
@sharun-s, I've tested on Edge, on Edge Mobile, and as a UWP app for PC and for mobile, and it's functioning very well. Congratulations! The images load extremely fast, even on mobile with a very heavy page from a 19 GB file. The CSS is still very slow to load on mobile, but a fair bit faster on PC.

FYI, the blob approach works with CSS, but it is difficult to ensure the code is inserted into the HTML before the HTML is injected into the iframe, due to the (necessarily) asynchronous design. Your promise approach with the CSS might help, but there'd necessarily be a bottleneck if working on the HTML string BEFORE it is injected. This will only be useful if it dramatically speeds up extraction of the CSS.

I really don't understand why extracting a little bit of ASCII from a large multi-gigabyte ZIM file takes so much longer than extracting a dozen binary images from the same file, even before your speed-ups with the images. I can only think this is because the CSS is stored at the end of the ZIM file. If we know roughly where it is stored, maybe we can target the search. I don't understand the search algos well enough, but it seems you do.
I have mostly been testing on Chrome, but I did a brief Firefox and Edge (desktop) run and found things mysteriously much faster on Firefox and much slower on Edge. I have to look into what they are doing differently; it might throw up things that can be applied across the board.

My feeling is that the CSS loading issues might not be a retrieval (i.e. read/lookup) bottleneck. Timing-wise, I have seen CSS take about the same time as image lookups + reads, so it is more likely an injection/render issue. I need to test more, though, to verify this; my focus has been more on the image side of things so far. And I think the CSS delay is more pronounced due to the double draw/reflow issue we discussed: if the page suddenly changes style after a second or more, the delay is very noticeable to the user.

Good to hear the blob approach works; I will try it out over the weekend. When you next get a chance, please save and send me the timing info printed out on the console. "TimeToFirstPaint" holds how much time it's taking for that first image and CSS load; it's just to get an idea of the performance differences from platform to platform. Anyway, no big rush. Enjoy your trip!
@mossroy Having ported @sharun-s's DirEntry cache code to Kiwix JS Windows some time ago (it's been in since the first release of the app in the MS Store), I could very easily port it into Kiwix JS. From memory, it's virtually a couple of lines, and it dramatically increases the speed of subsequent loads of any page (it doesn't affect the speed of the first page loaded), due to caching of the entries for stylesheets and caching of the blobs (there are two possible cache optimizations, one of which I found more useful than the other). I ported this to Kiwix JS Windows in order to increase performance when a user disables the loading of cached assets. No rush of course, as this would be for v2.3, but I thought I'd offer to do this, having had the experience of integrating it already.
Thanks a lot for the offer, @Jaifroid. It would indeed be worth porting this to kiwix-js, as the code is not very complicated.
I should also add that there had been an attempt in #183 to do similar things (but the speed improvement was not worth it).
Commit 871836c implements the cssBlobCache. Of the two, dirEntryCache and cssBlobCache, it is the latter that makes the most dramatic difference, because the CSS is very slow to load, and once cached (after the first page load, sometimes after the second or third for less-used CSS) it then loads instantly from the blob cache. Thanks to @sharun-s for the underlying code. Tested and working in the FFOS simulator. Note that I've left console log entries in so that you can see what it's doing (when the cache is hit).

VERY IMPORTANT: in this prototype code (for testing) I'm only clearing the cache on loading a ZIM with the File Picker. However, it will be very important to identify where Kiwix JS running on FFOS / Ubuntu Phone / other packaged apps loads a new ZIM (without using the File Picker) and add a copy of the cache-clearing code at that point too (see the sketch below). As you know the ins and outs of the FFOS app, can you take a look at where else this code should be added, @mossroy? The cache doesn't persist between sessions AFAIK (well, that might need testing on FFOS; it doesn't persist between browser sessions), but it must be cleared on loading a new ZIM.
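A hedged reconstruction of the clearing logic; the cache names come from this thread, but the hook function is hypothetical, since the real code paths differ per platform:

```js
var dirEntryCache = new Map(); // title -> dirent
var cssBlobCache = new Map();  // title -> cached stylesheet

// Call this from EVERY code path that selects a new ZIM, not just the
// File Picker, otherwise entries from the previous archive leak through.
function onNewArchiveSelected(archive) {
    dirEntryCache.clear();
    cssBlobCache.clear();
    selectedArchive = archive;
}
```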
PS the branch for testing is: https://github.com/kiwix/kiwix-js/tree/Backport-CSSCache
PPS As the comment in the code suggests, it may be sufficient to insert the cache-clearing code in setLocalArchiveFromArchiveList, but I've fiddled too much with that part of the code in Kiwix JS Windows to be confident of putting it in the right place for FFOS.
Hmm, I spoke too soon about it working in FFOS. It appears to work, but it's having trouble loading CSS refs that look like this:
I'm getting this error in the console:
No such error running this code in Firefox. Any thoughts on what the problem is in FFOS, @mossroy?
OK, I think FFOS is seeing blob: URLs as remote URLs. According to https://developer.mozilla.org/en-US/docs/Archive/B2G_OS/Firefox_OS_apps/Building_apps_for_Firefox_OS/CSP, on FFOS:
So it seems we'd have to cache the raw data rather than a URL to support FFOS. A pity, because it complicates the code a lot... Does this seem like a reasonable guess as to what is going on, @mossroy? Because the blob URLs are exactly the same as for images, and those definitely work on FFOS. Basically, this works:
and this doesn't (only on FFOS):
Grrr.
It turned out to be a lot easier to cache the raw CSS than I thought it would be. See ca88b37. This now works in the FFOS simulator, and the speed-up is quite remarkable for subsequent pages (after the first). Remember that this code is still missing the lines needed to clear the cache on load of a new ZIM in FFOS (it clears the cache on load of a new ZIM with the file picker). Maybe you could insert that at the appropriate place as part of your testing, @mossroy. I'd rather test against the branch and be sure everything is OK technically before doing a PR.
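A sketch of the raw-CSS variant (an assumed shape, not the exact commit): store the stylesheet text itself rather than a blob: URL, and inject it inline so FFOS's CSP never sees a URL it treats as remote:

```js
function injectCss(title, cssText, doc) {
    cssBlobCache.set(title, cssText); // raw text now, not a blob: URL
    var style = doc.createElement('style');
    style.textContent = cssText;
    doc.head.appendChild(style); // doc: the iframe's document
}

// On a cache hit, the stored text is re-injected directly, skipping the
// ZIM lookup and decompression entirely.
```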
Thanks a lot for that branch, @Jaifroid.
This issue was addressed a long time ago. Closing.