
Add a low level block cache #658

Merged: 40 commits merged into master on Nov 5, 2020

Conversation

@Jaifroid (Member)

As discussed in #595, this PR implements #619 and revives #183. It is almost entirely @peter-x's code. I have simply tuned the cache size based on my experience running this in KJSW, and have added a faster way of reading an ArrayBuffer via the updated Blob API, which eliminates the need to use FileReader (in browsers that support the new method).
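For the curious, the new read method is along these lines (a sketch only; the function name and fallback wiring are illustrative, not the exact code in this PR):

```js
// Read a file slice as an ArrayBuffer, preferring the modern
// Blob.arrayBuffer() method and falling back to FileReader where
// it is unavailable (e.g. IE11).
function readSliceAsArrayBuffer(file, begin, end) {
    var slice = file.slice(begin, end);
    // Blob.prototype.arrayBuffer() returns a Promise<ArrayBuffer>
    if (typeof slice.arrayBuffer === 'function') {
        return slice.arrayBuffer();
    }
    // Older browsers: wrap FileReader in a Promise
    return new Promise(function (resolve, reject) {
        var reader = new FileReader();
        reader.onload = function (e) { resolve(e.target.result); };
        reader.onerror = reject;
        reader.readAsArrayBuffer(slice);
    });
}
```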

Apart from Edge (Chromium), I have also run quick tests on the Firefox OS simulator (working) and on IE11 (working).

I have left in a console log that shows the block cache hits and misses, as well as the percentage hit rate. Below is output from the IE11 console when running the full 96GB English Wikipedia and searching for "the second world war". You can clearly see the number of file reads that are avoided: an impressive 100% hit rate for some 6,000 dirEntry reads, and above 74% for the initial 3,400 reads.

[Screenshot: IE11 console log showing block cache hits, misses and percentage hit rate]

@Jaifroid Jaifroid added this to the v3.1 milestone Oct 17, 2020
@Jaifroid Jaifroid requested a review from mossroy October 17, 2020 11:29
@Jaifroid Jaifroid self-assigned this Oct 17, 2020
@Jaifroid (Member Author)

Hmm. QUnit tests are failing for this version on several browsers. I'll revert to draft while I investigate. It works in browsers outside the test environment.

@Jaifroid Jaifroid marked this pull request as draft October 17, 2020 11:58
@Jaifroid (Member Author)

@javgh Feel free to comment on this PR since you have also integrated peter-x's block cache into your version. If there are any optimizations to the cache you have implemented that could be useful here, please discuss.

@Jaifroid (Member Author)

Jaifroid commented Oct 17, 2020

It's failing on reading an article that spans two different split ZIM file parts. Interesting, because I use split ZIMs with this code quite extensively and have not noticed that failure before; possibly it's because the split-part fragments of the Ray Charles test ZIM are so small. But it's good that the tests found that error.

@Jaifroid (Member Author)

Jaifroid commented Oct 18, 2020

OK, I'm pretty sure I've discovered the issue with split files. The block cache code doesn't have logic to detect when it is reading beyond the end of a split ZIM chunk, which is probably a vanishingly rare event with large split ZIM archives, but because our Ray Charles test ZIM is split into tiny 96K parts, it happens with some regularity. Hence, I've never detected this issue before.

IMO, the cache code is patching file slice reads at the wrong level. Instead of patching util.readFileSlice (which is the level at which individual reads of split parts of a ZIM happen, to be concatenated later), it should patch zimFile.readSlice, and the logic to split reads across different ZIM parts should be included in the cache code.

I think this is relatively easy to do. It will also simplify the cache, as it will store offsets relative to the start of the whole ZIM file set instead of offsets within each split ZIM part. It would then be indifferent to how small the split ZIM chunks are.
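Something like the following is the shape I have in mind: the cache sees a single logical offset into the whole ZIM, and the split logic maps it onto the individual parts (a sketch with illustrative names, not the final code):

```js
// Read `size` bytes starting at absolute offset `begin` within the whole
// (possibly split) ZIM, concatenating partial reads across part boundaries.
// `files` is the ordered array of split parts (each with a .size);
// `readPartSlice(file, offset, length)` reads one slice from one part.
function readSliceAcrossParts(files, begin, size, readPartSlice) {
    var readRequests = [];
    var currentOffset = 0;
    for (var i = 0; i < files.length && size > 0; i++) {
        var fileSize = files[i].size;
        if (begin < currentOffset + fileSize) {
            // Portion of the requested slice that falls inside this part
            var offsetInFile = begin - currentOffset;
            var bytesFromFile = Math.min(size, fileSize - offsetInFile);
            readRequests.push(readPartSlice(files[i], offsetInFile, bytesFromFile));
            begin += bytesFromFile;
            size -= bytesFromFile;
        }
        currentOffset += fileSize;
    }
    // Stitch the partial reads together into one buffer
    return Promise.all(readRequests).then(function (buffers) {
        var total = buffers.reduce(function (n, b) { return n + b.byteLength; }, 0);
        var result = new Uint8Array(total);
        var pos = 0;
        buffers.forEach(function (b) {
            result.set(new Uint8Array(b), pos);
            pos += b.byteLength;
        });
        return result;
    });
}
```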

@Jaifroid Jaifroid marked this pull request as ready for review October 18, 2020 17:30
@Jaifroid Jaifroid marked this pull request as draft October 18, 2020 17:37
@Jaifroid (Member Author)

Although it's now passing tests, I want to do some cleanup of the location of some of the code.

@kelson42 kelson42 linked an issue Oct 18, 2020 that may be closed by this pull request
@Jaifroid Jaifroid marked this pull request as ready for review October 18, 2020 20:00
@Jaifroid (Member Author)

I have implemented the suggestions in my previous post, except that I left the logic for dealing with split ZIM files as a dedicated function in zimfile that the cache can use, rather than putting it wholly in the cache code.

I have cleaned up and documented the code (I documented the cache code, but also added missing documentation to zimfile), and have tested the last commit on Edge Chromium, Firefox OS Simulator, IE11 and Edge Legacy.

The console log entry that measures the block cache hit rate displays only once every 2,000 cache requests. For searches the hit rate is very high, often near 100%. For randomly cycling through a ZIM, the hit rate is 60–75%, higher for smaller ZIM archives.

I would suggest leaving that console log entry in the code as it is a useful metric that could help future devs to tune the block cache to modern devices. However, I can remove it or comment it out if that is preferred.

The code is ready for review.

@kelson42 (Collaborator)

@Jaifroid Congratulations on the super-fast step forward. What is the memory consumption of this cache? How is it different from @javgh's cache system?

@Jaifroid (Member Author)

@kelson42 It's the same underlying cache system, originally developed by peter-x but never merged. I have corrected the logic for dealing with split archives, documented the code fully, and tuned the size of the cache based on my experience with Kiwix JS Windows. I believe @javgh has tuned their version to deal with network requests, so their cache is probably much larger. However, there are constants at the head of the cache file that control the size of the cache and of the blocks read, and these could easily be turned into parameters that a dev (or a user) could set based on usage requirements.

As configured in this PR, the maximum cache size would be 4000 * 4096 bytes = 16,384,000 bytes (about 16 MB). I use these values in KJSW, and the code can run on a low-end Windows Mobile VM with 512MB of RAM. If the cache is too big for @mossroy's physical FFOS device (though I doubt it will be an issue, based on the FFOS simulator), then we can try 2000 * 4096 or 2000 * 2048.
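For reference, the tunable values amount to something like this (identifier names are my shorthand here, not necessarily the constants' actual names in the cache file):

```js
var BLOCK_SIZE = 4096; // bytes per cached block
var MAX_BLOCKS = 4000; // maximum number of cached blocks
// Maximum memory footprint: 4000 * 4096 = 16,384,000 bytes (~16 MB)
```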

(Two review comment threads on www/js/lib/zimfile.js: outdated, resolved)
@mossroy (Contributor)

mossroy commented Oct 21, 2020

I tested on my Firefox OS device: it works with no problem. So the cache size looks compatible with such old hardware.
The debug logs usually show very good hit/miss ratios.

But, surprisingly, I don't measure any performance difference on this device, at least in the time to first appearance of a Wikipedia article. I tested on big articles like "Paris" on Wikipedia, and on the home page: it takes the same time on this branch as on master (although I do see the good hit/miss ratio in the browser log when using the branch).

Anyway, if you measure significant improvements on other devices, we should not worry about Firefox OS device performance any more.
I'll have a look at the code, which looks good at first sight.

@Jaifroid (Member Author)

Jaifroid commented Oct 21, 2020

@mossroy This cache does not make much difference to a single article's load time. The cache is designed to speed up binary search for dirEntries, where we can easily get around 10,000 file reads for a single search (see openzim/libzim#430 (comment)). Without this PR, each of those is a file read; with this PR, about 80% are served from the cache, and 100% for repeated searches.
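To illustrate where those reads come from: the dirEntry lookup is essentially a binary search over the ZIM's URL pointer list, something like this (illustrative names, not kiwix-js's exact code):

```js
// Each probe reads a small slice of the file (a URL pointer, then a
// directory entry). The early probes -- the middle of the list, then
// the quarter points, and so on -- repeat for every query, so a block
// cache can serve them without touching the disk at all.
function binarySearchDirEntry(zimFile, query, compare) {
    var lo = 0;
    var hi = zimFile.articleCount - 1;
    function step() {
        if (lo > hi) return Promise.resolve(null); // not found
        var mid = Math.floor((lo + hi) / 2);
        return zimFile.dirEntryByUrlIndex(mid).then(function (dirEntry) {
            var cmp = compare(query, dirEntry.url);
            if (cmp === 0) return dirEntry;
            if (cmp < 0) hi = mid - 1;
            else lo = mid + 1;
            return step();
        });
    }
    return step();
}
```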

So, where you will see a really big difference is if you do a search on a device with slow disk access, or, as in #595 and #659, when accessing file slices over a network. Favourite searches I use to test this in the full English Wikipedia are "the second world war" and "unesco world heritage site" (the latter requires a lot of searching until it discovers the all-caps "UNESCO" among other capitalization combinations).

Binary search of course also affects loading an article with a lot of assets, because a search has to be undertaken for each asset (obviously more in SW mode than in jQuery mode).

It should be noted that the solution here is a version of what is being worked on in openzim/libzim#430 and also openzim/libzim#424. As explained there, it will become even more important once ZIMs without namespaces are generally available, because then binary search will not be easy to limit by namespace.

I see a really big increase in speed on any device with slow file access. IE11 search is massively sped up. Running the app on Android makes it usable in Chrome/Edge even with full English Wikipedia (it is completely unusable without this cache). Even Chrome on a PC has relatively slow file access.

However, the ONE browser that has very fast file access is.... Firefox (we already noted that on Android in #581). As a result, you won't notice much speedup in Firefox, since file access is not a bottleneck in that browser (and maybe it does its own caching to achieve that: it certainly attempts to do that on Android, which then fails with ZIMs larger than 2GB, at least on my phone).

There is a trade-off between disk-access speed and processor speed. This PR speeds up browsers with relatively slow file access compared to their processor speed. It is unsurprising that on an old Firefox OS device, where I guess file access is fast and processor speed is slow, this code does not provide an improvement: it is not addressing the bottleneck on those devices (processing speed).

@Jaifroid (Member Author)

Jaifroid commented Oct 21, 2020

Just a further point: you will see in the code that the cache is tuned not to cache file reads that are larger than twice the block size, to avoid clogging the cache. So it will not cache the contents of an article, or even of most assets (and we have a specialized assets cache for that purpose anyway). What this cache specializes in is dirEntries, which are repeatedly and uselessly read from file during binary search (thousands and thousands of them, and for the initial hops always the same ones).
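In outline, the guard is as simple as this (a sketch; readSliceDirectly and blockCache are illustrative names, not the PR's actual identifiers):

```js
// Reads larger than twice the block size bypass the cache, so large
// article/asset payloads never evict the small, hot dirEntry blocks.
function cachedRead(file, offset, size) {
    if (size > 2 * BLOCK_SIZE) {
        return readSliceDirectly(file, offset, size); // uncached read
    }
    return blockCache.read(file, offset, size);
}
```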

@mossroy (Contributor)

mossroy commented Oct 23, 2020

You're right: the search speed is significantly improved on my Firefox OS device with this branch.
I can now barely see the results progressively displaying in the list: they almost always appear all at the same time, and much faster.

(Two review comment threads on www/js/lib/filecache.js: outdated, resolved)
@mossroy (Contributor)

mossroy commented Nov 1, 2020

Don't spend any more time on the "not re-initializing cache" topic.

Regarding the cache implementation, you could also have a look at existing ones (and even use one of them instead of ours): https://gist.github.com/dashsaurabh/bf7a02e85d325743076ba9943a286e9f , https://blog.lmerza.com/2019/01/24/lfu-and-lru-caching-in-javascript/ , https://github.com/monsur/jscache . They don't use a Map, though.

@Jaifroid (Member Author)

Jaifroid commented Nov 2, 2020

Thanks for those examples and the theory of LRU and LFU caches. I followed the example from https://gist.github.com/dashsaurabh/bf7a02e85d325743076ba9943a286e9f and the cache now deletes the least frequently used entry correctly. It seems peter-x's implementation hadn't quite completed the logic for unlinking and deleting the LFU entry.

Still in draft, because I need to trace each step through carefully to be sure it's right. The linked lists are a bit mind-bending, and I can't help thinking that an easier implementation should be possible with a Map, because a Map preserves the insertion order of its entries, meaning we wouldn't have to maintain the linked lists and reset the links. However, I don't feel I understand the theory thoroughly enough to redo it myself in Map form.

@Jaifroid (Member Author)

Jaifroid commented Nov 2, 2020

Having said that, I find someone has explained how to do it with a JS Map, and it's along the lines I was thinking. Apparently it's a common interview question... I'll review this later this week, because it has the potential to simplify things quite radically:

https://markmurray.co/blog/lru-cache/
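The core of the Map-based approach from that article looks roughly like this (a sketch, not the final PR code): a Map iterates its keys in insertion order, so the first key is always the least recently used.

```js
function LRUCache(capacity) {
    this._capacity = capacity;
    this._map = new Map();
}
LRUCache.prototype.get = function (key) {
    if (!this._map.has(key)) return undefined;
    // Refresh recency: delete and re-insert so the key moves to the end
    var value = this._map.get(key);
    this._map.delete(key);
    this._map.set(key, value);
    return value;
};
LRUCache.prototype.set = function (key, value) {
    if (this._map.has(key)) {
        this._map.delete(key);
    } else if (this._map.size >= this._capacity) {
        // Evict the oldest entry: the first key in iteration order
        this._map.delete(this._map.keys().next().value);
    }
    this._map.set(key, value);
};
```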

@Jaifroid (Member Author)

Jaifroid commented Nov 3, 2020

Quick update: the latest commit refactors to a simple Map-based implementation. It works very well (and fast) except in IE11, where the Map.keys() iterator is not supported (it is used to find the first = oldest key quickly, in order to delete it once the cache is full). Therefore we will either have to find another way to iterate just part of the Map to get the first (oldest) key -- forEach is supported, but we wouldn't want to loop over the full 4,000 Map entries, and there is no way of breaking out of a forEach loop -- or else we'll have to keep track of the first entry ourselves, which means keeping the linking code. I tried for...of, but IE doesn't support it either, and for...in doesn't work with Maps. There must be a low-level way (in IE11) to get entries out of a Map when we don't know the key.

@Jaifroid (Member Author)

Jaifroid commented Nov 4, 2020

OK, I've added IE11 support. The only way of getting the first key from the Map in IE11 is to do a forEach loop over all 4,000 Map entries. However, I've pared the processing down to a minimum, so that each loop iteration returns instantly once the first (= oldest) key has been found, and I have tested with performance.now(). The full iteration of 4,000 entries takes an average of 1 millisecond (sometimes 0.6 milliseconds) in IE11 on my PC, so to a human this will not be noticeable. Once the cache is full, though, the loop runs for every read that is a cache miss, say an average of 40% of file reads. I don't notice a performance drop.

There is an alternative: once the cache is full, we could simply empty it completely, in IE11 only. It takes a lot of full Wikipedia queries and page loads to fill the 4,000 cache entries, so the cache would be emptied only rarely.

See what you think. The current loop can be seen in commit f22aaa6.
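In outline, that loop does something like this (a sketch, not the literal code in the commit):

```js
// Find the first (= oldest) key of a Map in IE11, which supports
// Map.prototype.forEach but not the keys() iterator. There is no way
// to break out of forEach, so remaining iterations simply return early.
function firstKeyOf(map) {
    var firstKey;
    var found = false;
    map.forEach(function (value, key) {
        if (!found) {
            firstKey = key;
            found = true;
        }
        // All subsequent iterations fall through and do no work
    });
    return firstKey;
}
```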

@Jaifroid Jaifroid marked this pull request as ready for review November 4, 2020 01:08
@Jaifroid (Member Author)

Jaifroid commented Nov 4, 2020

EDIT: I had accidentally hard-coded 1,000 deletes (25%) into the code, so despite the mention of "400" in the highlighted console.logs below, this algorithm was actually deleting 25% at a time. I have since tested with 10%: it runs quite frequently and seems to depress the hit rate more generally throughout the cycle, so I suggest leaving it at 25%.

ORIGINAL POST: I've thought of a solution for IE11: instead of iterating the Map 4,000 times for each Cache.store operation in order to delete one entry, and instead of clearing the entire Map, we can simply clear the oldest 10% of the entries. This means that we don't hit the cache too hard, and we don't use excessive CPU once the cache is full. I have run some tests in IE11, and below is the output showing how many entries were cleared and when in the read cycle. As can be seen, there's a slight loss of hit/miss ratio immediately after a delete operation, but the delete doesn't run again for several article loads (unless they are huge articles), and the loss seems pretty acceptable. (A sketch of the bulk-eviction idea follows the screenshot below.)

[Screenshot: IE11 console log showing how many cache entries were cleared, and hit/miss ratios through the read cycle]

(Tests done with full English Wikipedia)
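In outline, the bulk eviction is something like this (a sketch with assumed names; the actual implementation is in the latest commits):

```js
// When the cache is full, delete the oldest fraction of entries in a
// single forEach pass instead of evicting one entry per cache miss.
function evictOldest(map, fraction) {
    var toDelete = Math.floor(map.size * fraction);
    var keys = [];
    map.forEach(function (value, key) {
        // Map iterates in insertion order, so the first keys are oldest
        if (keys.length < toDelete) keys.push(key);
    });
    keys.forEach(function (key) { map.delete(key); });
}
// e.g. evictOldest(cache, 0.25) once cache.size reaches MAX_BLOCKS
```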

@mossroy (Contributor)

mossroy commented Nov 4, 2020

This Map version works fine on my Firefox OS device.

Regarding the IE workaround, I don't see why f22aaa6 is so bad: it will exit after the first loop iteration, so shouldn't that avoid iterating over all the cache entries? Is it because calling forEach itself is slow?
In any case, it's not a big issue if IE11 is slower. What is important is that this PR does not reduce its performance. If it does not enhance IE11's performance as much as other browsers', don't bother too much about optimizing it.

@Jaifroid (Member Author)

Jaifroid commented Nov 4, 2020

I don't see why f22aaa6 is so bad: it will exit after the first loop iteration, so shouldn't that avoid iterating over all the cache entries? Is it because calling forEach itself is slow?

Unfortunately, it will still run through all 4,000 entries, even though it executes if (c > q) return; 3,999 times and does useful work only on the very first entry. There is no break for forEach loops (there is a hacky way to break out by throwing an exception inside a try-catch block, but it's "very bad" to use exceptions for program flow, so I read!). It kind of offends my sensibilities to write something so inefficient, even if the slowdown is trivial on a modern CPU. Remember that this 4,000-entry churn will execute for every read of a small amount of data not already in the cache, and that can be thousands of reads.

I think deleting 25% of the cache in one go gets round this churn without impacting IE11 performance. The loop will only ever run once, and then wait until the cache has filled with another 1,000 entries before running again. This is implemented in the latest commits.

This is ready for final testing / review. I never thought it would be a complex PR, but JS is never easy!

@mossroy (Contributor)

mossroy commented Nov 4, 2020

OK, I understand.
I found a polyfill for IE and the Map feature: https://github.com/zloirock/core-js#map. But it might be overkill for this need.

@Jaifroid (Member Author)

Jaifroid commented Nov 4, 2020

I found a polyfill for IE and the Map feature: https://github.com/zloirock/core-js#map. But it might be overkill for this need.

I think there's a good argument for using babel/core-js, which would allow us to write ES6 (or its superset, TypeScript) and compile to compatible code without worrying so much about workarounds for specific browsers (mostly IE, but not only).

But I think this needs to be done as part of #554 (which presupposes #367, and also removing Q, because core-js handles Promise support as well, including modern features such as async).

As I understand it (though I don't have any direct experience), Babel analyses the code automatically and attaches the core-js polyfills it requires, based on the language features used.
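If we went that way, the setup might look something like this (a hypothetical babel.config.js; the targets are just an example, not a tested configuration for this repo):

```js
// @babel/preset-env with useBuiltIns: 'usage' injects only the core-js
// polyfills that the compiled code actually needs for the stated targets.
module.exports = {
    presets: [
        ['@babel/preset-env', {
            targets: { ie: '11' },   // example target; adjust as needed
            useBuiltIns: 'usage',
            corejs: 3
        }]
    ]
};
```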

@mossroy (Contributor) left a comment


It's working fine on my Firefox OS device, and also on Firefox 82 and Chromium 86.

I think it was a really good choice to use a Map and its ability to keep the order of keys: it makes the code simpler (and probably faster).
It might have been a bit faster on IE11 to use an Object (instead of the Map, as in earlier versions of this PR), because of IE11's lack of .keys(). But it would be too much code to maintain for a thoroughly deprecated browser: I prefer your workaround.

(Review comment thread on www/js/lib/filecache.js: resolved)
@Jaifroid (Member Author)

Jaifroid commented Nov 4, 2020

It might have been a bit faster on IE11 to use an Object (instead of the Map

The issue is that, because Objects do not guarantee the order of their entries (at least in older browsers), it is necessary to maintain linked lists tracking the order of insertion (which is what the Object-based versions of the LRU cache implement). And, as you say, that would have been a lot of extra code to maintain for one browser.

@Jaifroid (Member Author)

Jaifroid commented Nov 4, 2020

Tested on IE11 and the FFOS Simulator in jQuery mode, and on Firefox 81.0.2, Edge Legacy and Edge Chromium 86.0 in jQuery and SW modes. I still need to test on Windows Mobile.

EDIT: Tested on Windows 10 UWP and Windows 10 Mobile simulator in jQuery mode. Also tested on the experimental Windows 10 UWP PWA version (runs in an Edge Legacy web view, and has access to UWP API) in SW mode.

FURTHER EDIT: Tested also on Android PWA (via Kiwix JS Windows, using same code as this PR).

All working fine.

@Jaifroid (Member Author)

Jaifroid commented Nov 4, 2020

@mossroy I have completed my testing. I'll squash and merge tomorrow (Thursday) afternoon unless I hear otherwise.

@Jaifroid Jaifroid merged commit 158024d into master Nov 5, 2020
@Jaifroid Jaifroid deleted the Add-a-low-level-block-cache branch November 5, 2020 18:07
Jaifroid added a commit that referenced this pull request Nov 7, 2020
Linked issue that may be closed by this pull request: Consider adding dirEntry and block cache