Add a low level block cache #658
Conversation
Hmm. QUnit tests are failing for this version on several browsers. I've reverted to draft while I investigate. It works in browsers outside the test environment.
@javgh Feel free to comment on this PR, since you have also integrated peter-x's block cache into your version. If there are any optimizations to the cache you have implemented that could be useful here, please discuss.
It's failing on reading an article that is split across two different parts of a split ZIM file. Interesting, because I use split ZIMs with this code quite extensively and have not noticed that failure before, possibly because the split-part fragments of the Ray Charles test archive are so small that the boundary case is exercised far more often. But it's good that the tests found that error.
OK, I'm pretty sure I've discovered the issue with split files. The block cache code doesn't have logic to detect when it is reading beyond the end of a split ZIM chunk, which is probably a vanishingly rare event with large split ZIM archives, but because our Ray Charles test archive is split into tiny 96K parts, it happens with some regularity. Hence I had never detected this issue before. IMO, the cache code is patching file slice reads at the wrong level. Instead of patching the reads of individual split parts (which forces the cache to know about each part's boundaries), it should intercept reads addressed to the ZIM file set as a whole, and let the lower-level code handle split boundaries. I think this is relatively easy to do. It will also simplify the cache, as it will store offsets relative to the start of the ZIM file set instead of absolute offsets from the start of each split ZIM part. It would then be indifferent to how small the split ZIM chunks are.
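A minimal sketch of that approach, with reads addressed relative to the whole ZIM file set and split-part boundaries resolved underneath. The helper `readSliceFromPart` and the shape of `files` are assumptions for illustration, not the actual kiwix-js internals:

```javascript
// Read a slice addressed relative to the whole ZIM set (not to any one part).
// readSliceFromPart is a hypothetical helper returning a Promise<Uint8Array>.
function readSlice(files, offset, size) {
    var end = offset + size;
    var promises = [];
    var partStart = 0;
    files.forEach(function (file) {
        var partEnd = partStart + file.size;
        // Queue a read from every split part that the requested range overlaps
        if (offset < partEnd && end > partStart) {
            var from = Math.max(offset, partStart) - partStart;
            var to = Math.min(end, partEnd) - partStart;
            promises.push(readSliceFromPart(file, from, to - from));
        }
        partStart = partEnd;
    });
    // Stitch the per-part reads back into one contiguous buffer
    return Promise.all(promises).then(function (buffers) {
        var total = buffers.reduce(function (n, b) { return n + b.length; }, 0);
        var merged = new Uint8Array(total);
        var pos = 0;
        buffers.forEach(function (b) { merged.set(b, pos); pos += b.length; });
        return merged;
    });
}
```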
Although it's now passing tests, I want to do some cleanup of the location of some of the code.
I have implemented the suggestions in my previous post, except that I left the logic for dealing with split ZIM files as a dedicated function in zimfile that the cache can use, rather than putting it wholly in the cache code. I have cleaned up and documented the code (I documented the cache code, but also added missing documentation to zimfile), and have tested the last commit on Edge Chromium, Firefox OS Simulator, IE11 and Edge Legacy.

The console log entry that measures the block cache hit rate displays only once every 2,000 cache requests. For searches the hit rate is very high, often near 100%. For randomly cycling through a ZIM, the hit rate is 60-75%, higher for smaller ZIM archives. I would suggest leaving that console log entry in the code, as it is a useful metric that could help future devs tune the block cache to modern devices. However, I can remove it or comment it out if that is preferred.

The code is ready for review.
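For reference, the hit-rate metric described above amounts to something like this (a sketch; the counter names are assumptions, not the PR's exact code):

```javascript
// Track block cache hits and misses, reporting only every 2,000 requests
var cacheHits = 0;
var cacheMisses = 0;
function recordCacheAccess(hit) {
    if (hit) cacheHits++; else cacheMisses++;
    var total = cacheHits + cacheMisses;
    // Throttled report so the console isn't flooded on busy searches
    if (total % 2000 === 0) {
        console.log('Block cache: ' + cacheHits + ' hits / ' + cacheMisses +
            ' misses (' + Math.round(100 * cacheHits / total) + '% hit rate)');
    }
}
```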
@kelson42 It's the same underlying cache system, originally developed by peter-x but never merged. I have corrected the logic for dealing with split archives, documented the code fully, and tuned the size of the cache based on my experience with Kiwix JS Windows. I believe @javgh has tuned their code to deal with network requests, so their cache is probably much larger. However, there are constants at the head of the cache file that make it easy to change the size of the cache and of the blocks read, and these could easily be turned into parameters that a dev (or a user) could set based on usage requirements. As configured in this PR, the maximum cache size is 4000 * 4096 bytes = 16,384,000 bytes. I use these values in KJSW, and the code can run on a low-end VM of Windows Mobile with 512MB RAM. If the cache is too big for @mossroy's physical FFOS device (though I doubt it will be an issue, based on the FFOS simulator), then we can try 2000 * 4096 or 2000 * 2048.
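As a sketch, the tuning described above boils down to two constants (the names here are illustrative; the real ones sit at the head of the cache file):

```javascript
// Tuning constants as configured in this PR (names are illustrative)
var MAX_CACHE_BLOCKS = 4000; // maximum number of blocks held in the cache
var BLOCK_SIZE = 4096;       // bytes read per cached block

// Maximum memory footprint: 4000 * 4096 = 16,384,000 bytes (about 15.6 MiB)
var maxCacheBytes = MAX_CACHE_BLOCKS * BLOCK_SIZE;
```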
I tested on my Firefox OS device: it works with no problem, so the cache size looks compatible with such old hardware. But, surprisingly, I don't measure any performance difference on this device, at least in the time to first appearance of a Wikipedia article. I tested on big articles like "Paris" on Wikipedia, and on the home page: it takes the same time on this branch as on master (although I do see the good hit/miss ratio in the browser log when using the branch). Anyway, if you measure significant improvements on other devices, we should not worry about performance on Firefox OS devices any more.
@mossroy This cache does not make much difference to single article load time. The cache is designed to speed up binary search for dirEntries, where we can easily get around 10,000 file reads for a single search (see openzim/libzim#430 (comment)). Without this PR, each of those is a file read; with this PR, about 80% are taken from the cache, and 100% for repeated searching. So where you will see a really big difference is if you do a search on a device with slow disk access, or, as in #595 and #659, when accessing file slices over a network. Favourite searches I use to test this in full English Wikipedia are "the second world war" and "unesco world heritage site" (the latter requires a lot of searching till it discovers the all-caps "UNESCO" among other caps combinations). Binary search of course also affects loading an article with a lot of assets, because a search has to be undertaken for each asset (obviously more in SW mode than in jQuery mode).

It should be noted that the solution here is a version of what is being worked on in openzim/libzim#430 and also openzim/libzim#424. As explained there, it will become even more important once ZIMs without namespaces are generally available, because then binary search will not be easy to limit by the namespace.

I see a really big increase in speed on any device with slow file access. IE11 search is massively sped up. Running the app on Android makes it usable in Chrome/Edge even with full English Wikipedia (it is completely unusable without this cache). Even Chrome on a PC has relatively slow file access. However, the ONE browser that has very fast file access is Firefox (we already noted that on Android in #581). As a result, you won't notice much speedup in Firefox, since file access is not a bottleneck in that browser (and maybe it does its own caching to achieve that: it certainly attempts to do that on Android, which then fails with ZIMs larger than 2GB, at least on my phone).

There is a trade-off between disk-access speed and processor speed. This PR speeds up those browsers with relatively slow file access compared to their processor speed. It is unsurprising that on an old Firefox OS device, where I guess file access is fast and processor speed is slow, this code does not provide an improvement: it is not addressing the bottleneck in those devices (processing speed).
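To illustrate why search generates so many reads, here is a hedged sketch of a binary search over the dirEntry pointer list; `readDirEntryAt` is a hypothetical helper, not the actual kiwix-js function:

```javascript
// Binary search for a dirEntry by URL. Every probe is a small file read, and
// the early probes hit (roughly) the same offsets on every search, which is
// exactly what the block cache serves from memory.
function binarySearchDirEntry(query, entryCount) {
    var lo = 0, hi = entryCount - 1;
    function step() {
        if (lo > hi) return Promise.resolve(null); // not found
        var mid = Math.floor((lo + hi) / 2);
        return readDirEntryAt(mid).then(function (dirEntry) {
            if (dirEntry.url === query) return dirEntry;
            if (dirEntry.url < query) lo = mid + 1; else hi = mid - 1;
            return step(); // next probe: another read, often from the cache
        });
    }
    return step();
}
```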
Just a further point: you will see in the code that the cache is tuned not to cache file reads that are larger than twice the block size, to avoid clogging the cache. So it will not cache the contents of an article, or even of assets for the most part (and we have a specialized assets cache for that purpose anyway). What this cache specializes in is dirEntries that are repeatedly and uselessly read from file in the binary search process (thousands and thousands of them, and for the initial hops always the same ones).
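For illustration, the rule described above amounts to something like the following (the function names are assumptions, not the PR's actual identifiers):

```javascript
// Large reads bypass the block cache so that article/asset content
// doesn't evict the small, frequently repeated dirEntry reads
function read(offset, size) {
    if (size > 2 * BLOCK_SIZE) {
        return readSliceFromFile(offset, size); // straight to disk
    }
    return readThroughCache(offset, size); // small reads go via the cache
}
```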
You're right: the search speed is significantly improved on my Firefox OS device with this branch.
Don't bother any more about the "not re-initializing cache" topic. Regarding the cache implementation, you also could have a look at existing ones (and even use them instead of our current one): https://gist.github.com/dashsaurabh/bf7a02e85d325743076ba9943a286e9f , https://blog.lmerza.com/2019/01/24/lfu-and-lru-caching-in-javascript/ , https://github.com/monsur/jscache . They're not using a Map.
Thanks for those examples and the theory of LRU and LFU caches. I followed the example from https://gist.github.com/dashsaurabh/bf7a02e85d325743076ba9943a286e9f and the Cache is now deleting the least frequently used entry correctly. It seems peter-x's implementation hadn't quite completed the logic of unlinking and deleting the LFU entry. Still in draft, because I need to trace each step through carefully to be sure it's right. The linked lists are a bit mind-bending, and I can't help thinking that an easier implementation should be possible with a Map, because a Map preserves the order of insertion of entries, meaning we wouldn't have to maintain the linked lists and reset the links. However, I don't feel I understand the theory thoroughly enough to re-do it myself in Map form.
Having said that, I find someone has explained how to do it with a JS Map, and it's along the lines I was thinking. Apparently it's a common interview question... I'll review this later this week, because it has the potential to simplify the code quite radically.
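For anyone following along, the Map-based pattern referred to looks roughly like this (a sketch of the common interview answer, not this PR's exact code):

```javascript
// Map preserves insertion order, so the first key is always the least
// recently used entry: no linked lists needed.
function LRUCache(capacity) {
    this.capacity = capacity;
    this.map = new Map();
}
LRUCache.prototype.get = function (key) {
    if (!this.map.has(key)) return undefined;
    var value = this.map.get(key);
    // Delete and re-insert to move the key to the "most recent" end
    this.map.delete(key);
    this.map.set(key, value);
    return value;
};
LRUCache.prototype.set = function (key, value) {
    if (this.map.has(key)) this.map.delete(key);
    else if (this.map.size >= this.capacity) {
        // Evict the oldest (first-inserted) key; note this keys() call
        // is exactly what later needs a workaround for IE11
        this.map.delete(this.map.keys().next().value);
    }
    this.map.set(key, value);
};
```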
Quick update: the latest commit refactors to use a simple Map-based implementation. It works very well (and fast) except in IE11, where the Map implementation does not support the keys() iterator.
OK, I've added IE11 support. The only way of getting the first key from the Map in IE11 is to do a forEach loop over the whole Map. There is an alternative: once the cache is full, we could simply empty it instantly in IE11 only. It takes a lot of full Wikipedia queries and page loads to fill the 4,000 cache entries, so the cache would be emptied only rarely. See what you think. The current loop can be seen in commit f22aaa6.
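To make the IE11 constraint concrete, here is a sketch of the only available way to reach the first key there (illustrative, not the exact commit):

```javascript
// IE11's Map lacks keys()/entries(), so the first (oldest) key can only be
// reached via forEach, which cannot be broken out of early: returning from
// the callback merely skips to the next entry.
function getFirstKey(map) {
    var firstKey;
    var found = false;
    map.forEach(function (value, key) {
        if (!found) {
            firstKey = key;
            found = true;
        }
        // No way to stop here: forEach still visits every remaining entry
    });
    return firstKey;
}
```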
EDIT: I had accidentally hard-coded 1,000 deletes (25%) into the code, so despite the mention of "400" in the highlighted console.logs below, this algorithm was actually deleting 25% at a time. I have tested with 10%, and it runs quite frequently and seems to depress the hit rate more generally throughout the cycle, so I suggest leaving it at 25%.

ORIGINAL POST: I've thought of a solution for IE11: instead of iterating the Map 4,000 times for each Cache.store operation in order to delete one entry, and instead of clearing the entire Map, we can simply clear the oldest 10% of the entries. This means that we don't hit the Cache too hard, and we don't use excessive CPU once the Cache is full. I have run some tests in IE11, and below is the output showing how many entries were cleared, and when, in the read cycle. As can be seen, there's a slight loss of hit/miss ratio immediately after a delete operation, but it doesn't then run again for several article loads (unless they are huge articles), and the loss seems pretty acceptable. (Tests done with full English Wikipedia.)
Regarding the IE workaround, I don't see why f22aaa6 is so bad: it will exit after the first loop, so it should not iterate over all the cache entries? Is it because calling return inside the forEach callback doesn't actually stop the iteration?
Unfortunately, it will still run through all 4,000 entries, even if it only executes the delete on the first one: there is no way to break out of a forEach loop. I think deleting 25% of the Cache in one go gets round this churn without impacting IE11 performance. The loop will only ever run once, and then wait till the Cache is filled with another 1,000 entries before running again. This is implemented in the latest commits, and this is ready for final testing/review. I never thought it would be a complex PR, but JS is never easy!
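As a sketch of that batch eviction (assuming a Map-based cache as in this PR; the names are illustrative):

```javascript
// One forEach pass collects the oldest 25% of keys (Map preserves insertion
// order), then deletes them, so the expensive full iteration runs only once
// per 1,000 insertions instead of once per insertion.
function evictOldest(map, fraction) {
    var toDelete = Math.floor(map.size * fraction);
    var keys = [];
    map.forEach(function (value, key) {
        if (keys.length < toDelete) keys.push(key);
    });
    keys.forEach(function (key) { map.delete(key); });
}

// e.g. evictOldest(cache, 0.25) once the cache reaches 4,000 entries
```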
OK I understand.
I think there's a good argument for using core-js polyfills for this. But I think this needs to be done as part of #554 (which presupposes #367, and also removing Q, because core-js handles Promise support as well, including more modern Promise features). As I understand it (but I don't have any direct experience), automatic code analysis is done by Babel, which then attaches the core-js polyfills required by the code, based on the language features used.
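For what it's worth, a minimal sketch of how that could be wired up, using standard @babel/preset-env options (this is an assumption about a possible setup, not anything currently in this repo):

```javascript
// babel.config.js: preset-env analyses the code and injects only the
// core-js polyfills (Map, Promise, etc.) that the targeted browsers lack
module.exports = {
    presets: [
        ['@babel/preset-env', {
            useBuiltIns: 'usage', // add polyfills per feature actually used
            corejs: 3,
            targets: { ie: '11' }
        }]
    ]
};
```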
It's working fine on my Firefox OS device, and also on Firefox 82 and Chromium 86. I think it was a really good choice to use a Map and its ability to keep the order of keys: it makes the code simpler (and probably faster). It might have been a bit faster on IE11 to use an Object (instead of the Map, like it was done in earlier versions of this PR), because of the Map's lack of .keys() there. But it would be too much code to maintain for a very deprecated browser: I prefer your workaround.
The issue is that, because Objects do not guarantee the order of entries (at least in older browsers), it is necessary to maintain complex lists linking the entries in order of insertion (which is what the versions of the LRU Cache that use Objects implement). And as you say that would have been a lot of extra code to maintain for one browser.
Tested on IE11 and FFOS Simulator in jQuery mode, and on Firefox 81.0.2, Edge Legacy and Edge Chromium 86.0 in jQuery and SW modes. I still need to test on Windows Mobile. EDIT: Tested on Windows 10 UWP and the Windows 10 Mobile simulator in jQuery mode. Also tested on the experimental Windows 10 UWP PWA version (runs in an Edge Legacy web view, and has access to the UWP API) in SW mode. FURTHER EDIT: Tested also on the Android PWA (via Kiwix JS Windows, using the same code as this PR). All working fine.
@mossroy I have completed my testing. I'll squash and merge tomorrow (Thursday) afternoon unless I hear otherwise.
As discussed in #595, this PR implements #619 and revives #183. It is almost entirely @peter-x's code. I have simply tuned the cache size based on my experience running this in KJSW, and have added a faster method of reading an arrayBuffer from the updated Blob API that eliminates the need to use FileReader (for those browsers that support the new method).
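For reference, the kind of feature detection described here looks roughly like this (a sketch, not the PR's exact code):

```javascript
// Use the modern Blob.prototype.arrayBuffer() where available, falling back
// to FileReader for older browsers that don't support it
function readBlobAsArrayBuffer(blob) {
    if (typeof blob.arrayBuffer === 'function') {
        return blob.arrayBuffer(); // returns a Promise<ArrayBuffer> directly
    }
    return new Promise(function (resolve, reject) {
        var reader = new FileReader();
        reader.onload = function (e) { resolve(e.target.result); };
        reader.onerror = reject;
        reader.readAsArrayBuffer(blob);
    });
}
```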
Apart from Edge (Chromium), I have tested very quickly on Firefox OS simulator (working) and on IE11 (working).
I have left in a console log output that shows the block cache hits and misses, as well as the percentage hit rate. Below is output from the IE11 console log when running the full 96GB English Wikipedia and searching for "the second world war". You can clearly see the number of file reads that are avoided: an impressive 100% hit rate for some 6,000 dirEntry reads, and above 74% for the initial 3,400 reads.