Add native asset caching to Service Worker #556
I suppose this addresses #414, not #441 (wrong link in the description). It's hard to see the changes on GitHub, which tends to report that everything has been modified. I quickly tested on Firefox, and it seems to work well there too (I can see the cache contents in devtools, and the log correctly says it uses them). This is promising.
It fails because we cannot … The Cache API is not available on all platforms: see https://developer.mozilla.org/en-US/docs/Web/API/Cache#Browser_compatibility. I don't think we have any supported platform that supports SW but not the Cache API. But, to be cleaner, it would be better to test whether this API is available before using it. If it's not available, we can simply fall back to the standard behavior (without cache) and log an informational message to the console. Regarding the cache eviction strategy, things are quite open. We might also activate the cache for some small images (which could be useful for some icons), for which a specific eviction strategy could be used (with a smaller space allowance). But that could come later, and may be too complicated for the need.
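A minimal sketch of the availability check suggested above, assuming it receives the Service Worker's global scope (`self`) or `window` as a parameter. `isCacheApiAvailable` and `openAssetCache` are hypothetical names, not code from this PR. When the API is missing or unusable, it logs an informational message and resolves to `null`, so the caller can fall back to the uncached path.

```javascript
/**
 * Returns true if the given global scope exposes a usable Cache API.
 * In a Service Worker pass `self`; in the main thread pass `window`.
 * (Hypothetical helper, for illustration only.)
 */
function isCacheApiAvailable(scope) {
  return Boolean(scope) &&
    typeof scope.caches === 'object' && scope.caches !== null &&
    typeof scope.caches.open === 'function';
}

/**
 * Opens the named cache if possible; otherwise logs an informational
 * message and resolves to null so the caller can serve assets uncached.
 */
function openAssetCache(scope, cacheName) {
  if (!isCacheApiAvailable(scope)) {
    console.info('Cache API unavailable: serving assets without caching');
    return Promise.resolve(null);
  }
  // caches.open() can still reject in restricted contexts, so fall back there too
  return scope.caches.open(cacheName).catch(function (err) {
    console.info('Cache API unusable (' + err + '): serving assets without caching');
    return null;
  });
}
```

In the Service Worker this would be called as `openAssetCache(self, 'kiwixjs-cache')`, with a `null` result meaning "use the standard behavior".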
I'm pleased you're interested in this, @mossroy! It really does give SW mode a significant performance boost for relatively little effort, while we wait for #116. Until now, jQuery mode has been much faster than SW mode, because we have memory caching for jQuery mode but none for SW mode (and also because SW mode decompresses scripts). This evens things out, and makes SW mode much more feasible for daily use. It's a real shame that the chrome-extension scheme doesn't work with the Cache API, as the Chrome extension is a major use case for this app. I wonder whether there's a workaround, or whether it's simply not supported; it's worth investigating. Regarding an interface to control the cache, I did develop one when I worked on a comprehensive cache based on IndexedDB, and it looks like the image below. This could be adapted to our needs. I didn't know libzim comes with a cache, but it will take some time for that code to be developed, so I think this optimization is worth using in the meantime. I agree PhET is a problem: its JS is unique to each experiment and shouldn't be cached. I'm sure we can find a way to distinguish it. Thanks for the detailed and thoughtful comments, which all seem very sensible.
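For comparison, a jQuery-mode memory cache like the one mentioned above can be as simple as a `Map` keyed by the asset's URL within the archive. This is a sketch under that assumption: `memoryCache`, `getAsset` and the `loadFromZim` loader (assumed to return a Promise of the asset content) are hypothetical. Being volatile, the cache is emptied on every page reload.

```javascript
// Volatile memory cache: emptied on every page reload (hypothetical sketch)
const memoryCache = new Map();

/**
 * Returns the asset for `url`, reading it from the ZIM via `loadFromZim`
 * only on a cache miss. `loadFromZim` is a hypothetical Promise-returning loader.
 */
function getAsset(url, loadFromZim) {
  if (memoryCache.has(url)) {
    return Promise.resolve(memoryCache.get(url));
  }
  return loadFromZim(url).then(function (content) {
    memoryCache.set(url, content);
    return content;
  });
}
```

Because nothing persists between sessions, this kind of cache needs no eviction UI, which is the main practical difference from the Cache API.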
I've checked support in chrome-extension, and unfortunately it's not supported, full stop. Chrome extensions have their own caching strategy that devs can use via …. Because localStorage is synchronous, it cannot be accessed from a Service Worker. IndexedDB, however, is asynchronous and apparently can be used inside a Service Worker. So the question is: is it worth adding an IndexedDB caching fallback in the SW specifically for Chrome extension usage? I have a ready-made IndexedDB function. Note that the Cache API does not have to be accessed from the SW: it is also available on the window object, and can be accessed from the main thread. However, it seemed neater to me to keep it all in the SW (which also runs on a separate thread, so may be faster).
For now, I've added a (volatile) memory cache (…). To test it, see the comment in the new code (commit 5f37306) about changing …
Ah, the dreaded Firefox 45 Travis error...
No, there's no need to add anything for the chrome-extension scheme, because Chromium already has a built-in in-memory cache inside extensions (see #411 (comment)). I think we simply need to avoid the console error messages properly in this case.
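One way to avoid those console errors is to check the URL scheme before touching the Cache API. `Cache.put()` only accepts `http:` and `https:` requests, so a `chrome-extension:` URL can be skipped quietly, relying on Chromium's own in-memory cache there. A sketch under that assumption; `shouldUseCacheApi` is a hypothetical name.

```javascript
/**
 * Decides whether the Cache API should be used for a given request URL
 * (hypothetical helper). The Cache API only stores http(s) requests, so
 * chrome-extension: (and other schemes) are skipped without logging errors;
 * Chromium caches extension resources in memory anyway.
 */
function shouldUseCacheApi(requestUrl) {
  let protocol;
  try {
    protocol = new URL(requestUrl).protocol;
  } catch (e) {
    return false; // not an absolute URL: don't try to cache it
  }
  return protocol === 'http:' || protocol === 'https:';
}
```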
Ah OK, I'd forgotten about that comment (I haven't used the Chrome extension much). So should I remove the memory cache I added in the last commit? (It only took me 30 minutes or so to add, so no great loss if you don't see a use for it.) The only use I can think of would be to speed up cache access generally.
What I mean is that, when I was experimenting with IndexedDB last year, I found it noticeably slower to access assets stored there than in cssCache. So I opted to keep cssCache for fast retrieval, and to use IndexedDB to store assets between sessions. But I'm not sure that's the case with the Cache API: it seems "fast enough".
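The arrangement described above, a fast volatile memory cache in front of a slower persistent store, can be sketched as follows. `persistentStore` stands in for IndexedDB or the Cache API and is assumed to expose Promise-returning `get(key)` and `put(key, value)` methods; these, `createTwoTierCache` and `loadFromZim` are all hypothetical.

```javascript
/**
 * Two-tier lookup (hypothetical sketch): memory first, then the persistent
 * store, then the ZIM itself. Hits in the persistent store are promoted to
 * memory so that later reads in the same session are fast.
 */
function createTwoTierCache(persistentStore, loadFromZim) {
  const memory = new Map();
  return function getAsset(key) {
    if (memory.has(key)) return Promise.resolve(memory.get(key));
    return persistentStore.get(key).then(function (stored) {
      if (stored !== undefined) {
        memory.set(key, stored); // promote to the fast tier
        return stored;
      }
      return loadFromZim(key).then(function (content) {
        memory.set(key, content);
        // Persist for future sessions, then return the content
        return persistentStore.put(key, content).then(function () {
          return content;
        });
      });
    });
  };
}
```

The trade-off is extra code and a second copy of each asset in memory, which is only worthwhile if the persistent store is measurably slow.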
IMHO you should remove this last commit and keep only the Cache API (without introducing IndexedDB either).
Done. I've kept the structural improvements to the code: the relevant regexes are now defined at the top of the code, with a comment telling developers what they do. That's better than having regexes buried further down, where they would be difficult to maintain. NB this code excludes …
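As an illustration of that structure, the cacheability rule can live in one commented constant near the top of the file. The pattern and the helper `isCacheableAsset` below are assumptions for the sake of the example (caching CSS and JavaScript, as this PR does), not the PR's actual regexes.

```javascript
/**
 * Content types whose assets are stored in the Cache API.
 * Developers: edit this pattern to change what gets cached.
 * (Hypothetical pattern: CSS and JavaScript only.)
 * @type {RegExp}
 */
const regexpCacheableContentType = /^(?:text\/css|(?:text|application)\/javascript)\b/i;

/**
 * Returns true if an asset with this MIME type should be cached.
 * @param {String} mimeType e.g. "text/css" or "text/css; charset=utf-8"
 */
function isCacheableAsset(mimeType) {
  return regexpCacheableContentType.test(mimeType || '');
}
```

The regex deliberately has no `g` flag, so `test()` keeps no state between calls.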
Just to say that I'd prefer to delay adding a cache control panel (like the one above) until we've merged #527, because otherwise I'd have to do the work twice (Bootstrap 3 panels are rather different from Bootstrap 4 cards). But in due course it would be good to know whether that design seems right, and whether you'd be interested in the "Return to last visited page when you open an archive" functionality. That functionality is useful, for example, in an NWJS app, but less so when the user still has to pick the archive, which is the case in most contexts. The "Cache used" field would show "Memory cache" for jQuery mode and "Cache API" for SW mode.
I had some time yesterday evening, so I added the cache evacuation infrastructure and UI anyway. The checkbox for remembering the last page between sessions is currently non-functional, and can be removed if that functionality is not required, in which case the panel title would change from "Performance and privacy settings" to "Performance settings". Asset counting works for both the memory cache and the Cache API, but only the assets in the cache currently in use are shown. However, when deleting assets, the total number deleted from both caches is displayed. Users might end up with both Cache API assets and memory cache assets if they switch from SW to jQuery mode mid-session, but clearing the caches clears them both.
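The counting and clearing logic described above could look roughly like this. `cacheStorage` is the standard `CacheStorage` object (`self.caches` or `window.caches`), whose `has()`, `open()` and `delete()` methods, and the `Cache.keys()` method, are real Cache API calls. `memoryCache` is assumed to be a `Map`, and both helper names are hypothetical.

```javascript
/** Counts entries in the named Cache API cache (0 if it doesn't exist yet). */
function countCacheApiAssets(cacheStorage, cacheName) {
  // has() avoids creating an empty cache just to count it
  return cacheStorage.has(cacheName).then(function (exists) {
    if (!exists) return 0;
    return cacheStorage.open(cacheName)
      .then(function (cache) { return cache.keys(); })
      .then(function (requests) { return requests.length; });
  });
}

/**
 * Clears both caches, since users who switch between SW and jQuery mode
 * mid-session may have assets in each. Resolves to the total number deleted.
 */
function clearAllAssetCaches(memoryCache, cacheStorage, cacheName) {
  const memoryCount = memoryCache.size;
  memoryCache.clear();
  if (!cacheStorage) return Promise.resolve(memoryCount);
  return countCacheApiAssets(cacheStorage, cacheName).then(function (apiCount) {
    return cacheStorage.delete(cacheName).then(function () {
      return memoryCount + apiCount;
    });
  });
}
```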
As a side issue, I guess the "random" Firefox 45 Travis failures must be caused by a race condition somewhere in our code. Is there a way to debug Travis builds? I suppose the Chrome failure for #527 is also a race condition (but a different one).
It would be better to discuss remembering the last page in a different PR.
OK, I've added issue #561 for this.
Choose whichever wording you prefer, no problem. About the cache status, my main goal is to avoid making the code more complicated or duplicated, and to avoid the risk of giving wrong information to the user. Having an option to disable the cache can still be a good idea for debugging purposes (or in some edge cases), but it's an "expert setting", and could be moved to the "expert settings" zone. I understand your concern about code diverging between kiwix-js and kiwix-js-windows.
When an API is unavailable or unusable, you could simply display "N/A" instead of the asset count. Regarding PhET, it's a good use case to test what happens when many assets are put in the cache (at some point, the browser might ask whether the user wants to grow the cache). But we must not do anything PhET-specific. According to https://developers.google.com/web/fundamentals/instant-and-offline/web-storage/offline-for-pwa, browsers have their own safeguards when too much cache is used: it might be enough to rely on them.
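A sketch of the "N/A" suggestion, plus an optional overall usage figure from `navigator.storage.estimate()`, which is part of the Storage API and resolves to an object with `usage` and `quota` in bytes. The function names are hypothetical, and note that the estimate covers all storage for the origin, not only the asset cache.

```javascript
/** Formats an asset count for the settings panel, or "N/A" if unavailable. */
function formatAssetCount(count) {
  return Number.isInteger(count) && count >= 0 ? String(count) : 'N/A';
}

/**
 * Resolves to a human-readable usage summary, or "N/A" where the Storage
 * API isn't supported. `nav` is normally the global `navigator`.
 */
function describeStorageUsage(nav) {
  if (!nav || !nav.storage || typeof nav.storage.estimate !== 'function') {
    return Promise.resolve('N/A');
  }
  const toMB = function (bytes) {
    return (bytes / (1024 * 1024)).toFixed(1) + ' MB';
  };
  return nav.storage.estimate().then(function (estimate) {
    return toMB(estimate.usage) + ' used of ' + toMB(estimate.quota);
  }, function () {
    return 'N/A'; // estimate() rejected: treat as unavailable
  });
}
```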
OK, I'll think about what can be done to simplify. As it stands, we have three different caching strategies: "Memory", "Cache API", and "Custom" (Chrome extension, which is a black box). Unless we go for an automatic cache evacuation strategy, I'd like users to be aware that, with the Cache API, items may stay cached for a long time and can end up taking significant space.
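The choice between those three strategies could be made in a single place. The sketch below assumes the app knows its content-injection mode (`'jquery'` or `'serviceworker'`), whether it is running under the `chrome-extension:` scheme, and whether the Cache API is usable; the function name and the extra `"None"` label for the uncached fallback are hypothetical.

```javascript
/**
 * Returns which caching strategy applies in the current context (hypothetical):
 *  - "Memory": jQuery mode keeps assets in a volatile in-memory cache
 *  - "Custom": Chrome extension; Chromium caches resources itself (a black box)
 *  - "Cache API": Service Worker mode with a usable Cache API
 *  - "None": Service Worker mode without a usable Cache API
 */
function selectCacheStrategy(contentInjectionMode, isChromeExtension, cacheApiAvailable) {
  if (contentInjectionMode === 'jquery') return 'Memory';
  if (isChromeExtension) return 'Custom';
  return cacheApiAvailable ? 'Cache API' : 'None';
}
```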
PS This system is not implemented in Kiwix JS Windows yet. My plan is to replace the file-based caching currently used there (hard to maintain) with this automatic one (much easier to maintain) when it's ready. But I also have to provide IndexedDB for jQuery mode, because of the lack of general Service Worker support in UWP JS apps (SW support only exists for content served from https:, i.e. if I implement it as a PWA inside UWP).
So this is my best attempt to prevent duplication of variables, regexes, and functions across app.js and SW, and to simplify the logic. The latest code works like this:
I think this simplifies the decision logic significantly. However, the amount of code in app.js has not decreased by much, because messaging the Service Worker and receiving a response is not completely trivial. The big benefit is that we no longer have to tell devs to keep duplicated CACHE definitions and regexes in sync, because there is no duplication. So the main benefit is maintainability. Please let me know what you think, @mossroy. PS If testing, force-refresh and then simple-refresh your browser to make sure you've loaded the new Service Worker code, or else unregister the old SW in devtools, exit the browser and restart.
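Messaging the Service Worker and waiting for its reply is commonly done with a `MessageChannel`: app.js transfers one port along with the message, and the Service Worker answers on it with `event.ports[0].postMessage(...)`. A minimal sketch; `messageServiceWorker` and the message fields are hypothetical, and in the browser `target` would be `navigator.serviceWorker.controller`.

```javascript
/**
 * Posts `message` to `target` and resolves with the first reply received
 * on a dedicated MessageChannel (hypothetical helper).
 */
function messageServiceWorker(target, message) {
  return new Promise(function (resolve, reject) {
    if (!target) {
      reject(new Error('No active Service Worker to message'));
      return;
    }
    const channel = new MessageChannel();
    channel.port1.onmessage = function (event) {
      channel.port1.close(); // one reply per request
      resolve(event.data);
    };
    // Transfer port2 so the Service Worker can answer on it
    target.postMessage(message, [channel.port2]);
  });
}
```

A fresh channel per request means replies cannot be confused with each other, at the cost of a little setup code on each call.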
Thanks for this refactoring. This removes all the duplicated code, which is great.
I've made a few comments but it looks good.
As we have an "API status" zone, maybe the availability of the Cache API could be displayed there? (This is minor too.)
service-worker.js
```javascript
/**
 * The value is defined in app.js and will be passed to Service Worker on initialization (to avoid duplication)
 * @type {String}
 */
var CACHE;
```
The variable might be renamed to CACHE_NAME or something similar, to avoid implying that it's where the cache is stored.
OK, will do.
service-worker.js
```javascript
        fetchCaptureEnabled = false;
      }
    }
    if (event.data.useCache) {
```
It might be good to use the same way of passing information between app.js and the SW.
I initially used the `event.data.action` variable to distinguish each kind of message that app.js passes to the SW, whereas here you test other variables.
Both ways are perfectly valid, but it would be better to use just one.
This is minor, and can be dealt with in another issue/PR.
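Routing every message through `event.data.action`, as suggested here, might look like the following sketch of the Service Worker side. Only the `event.data.action` convention comes from the discussion; the action names (`'init'`, `'disable'`, `'useCache'`), the `state` object and the `cacheName`/`value` fields are hypothetical. Passing the on/off flag in a separate field keeps `action` a plain string.

```javascript
/**
 * Handles one message from app.js, dispatching on event.data.action
 * (hypothetical action names). Replies on the first transferred port, if any.
 * In the SW: self.addEventListener('message', e => handleMessage(e, swState));
 */
function handleMessage(event, state) {
  const data = event.data || {};
  let reply;
  switch (data.action) {
    case 'init':
      state.fetchCaptureEnabled = true;
      state.cacheName = data.cacheName; // passed from app.js to avoid duplication
      reply = { action: 'init', ok: true };
      break;
    case 'disable':
      state.fetchCaptureEnabled = false;
      reply = { action: 'disable', ok: true };
      break;
    case 'useCache':
      // The flag travels in a separate field, so `action` stays a string
      state.useCache = data.value === 'on';
      reply = { action: 'useCache', useCache: state.useCache };
      break;
    default:
      reply = { action: data.action, error: 'Unknown action' };
  }
  if (event.ports && event.ports[0]) event.ports[0].postMessage(reply);
  return reply;
}
```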
I did consider that, but the problem was (if I remember correctly) that event.data.action is a string (or at least it is tested as a string) and I wanted a pseudo-Boolean. However, I can think of a solution that uses event.data.action without needing two code blocks, so I'll try that and, if it's easy, do it in this PR.
It's really not crucial, and my initial implementation might not be the best
It turns out to be very easy to do (I'm about to push a commit with it) and it does make sense to have all "actions" under this key.
Commit dd18a67 implements what was discussed in the code review. Given that some changes arose from a small misunderstanding over return types, what's your view on whether a Promise should always be described as a …
Last question: should I remove the console.log lines (not console.error) from the code before merging, or is it useful to keep them for a while and remove them later?
It seems the syntax …
Maybe we could keep the …
After a bit more testing (including in the Chrome extension, but not the Firefox extension because I don't know how), can I squash and merge?
Yes, you can.
This addresses ~~#441~~ #414. It adds the native Cache API to `service-worker.js` for caching CSS and JS assets, which significantly speeds up SW mode. The cached entries can be inspected under the Cache entry in the devtools of Chromium or Edge (I didn't try Firefox). An example of running this on the most recent Ray Charles ZIM is shown in the image below (from Chromium / Edge 78). This is not for merging yet, but I would appreciate testing and any comments.

I think the main TODO is to provide a mechanism for deleting / pruning entries. We could just do it when the ZIM changes, but as the quota on most devices is quite generous, and as the entries are unique to each ZIM, a more subtle mechanism could be found. For example, we could show usage statistics on the Configuration page, with a button to clear the cache if it grows quite large. This still needs to be decided.
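The core idea of the PR, a cache-first lookup for CSS and JS assets inside the Service Worker, can be sketched as below. This is not the PR's code: `fetchAssetWithCache` and `readFromZim` (assumed to resolve to a `Response` built from the archive) are hypothetical, `cacheStorage` would be `self.caches`, and `kiwixjs-cache` is the cache name used in this PR's testing notes. `caches.open()`, `cache.match()`, `cache.put()` and `response.clone()` are standard APIs.

```javascript
/**
 * Cache-first retrieval of a ZIM asset (hypothetical sketch). Call it only
 * for assets that should be cached (CSS and JS here). On a hit the stored
 * Response is returned without touching the ZIM; on a miss the asset is
 * read from the ZIM and a copy is stored for next time.
 */
function fetchAssetWithCache(request, cacheStorage, cacheName, readFromZim) {
  return cacheStorage.open(cacheName).then(function (cache) {
    return cache.match(request).then(function (cached) {
      if (cached) {
        console.log('[SW] Using cache for ' + request.url);
        return cached;
      }
      return readFromZim(request).then(function (response) {
        // A Response body can only be read once, so store a clone
        // and hand the original back to the page
        return cache.put(request, response.clone()).then(function () {
          return response;
        });
      });
    });
  });
}
```

In a fetch handler this would be used as `event.respondWith(fetchAssetWithCache(event.request, caches, 'kiwixjs-cache', readFromZim))`.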
Info for testing: to delete the cache and Service Worker after testing (in Chromium), delete `kiwixjs-cache` (right-click and delete) and unregister the Service Worker (under Application -> Service Workers ...). If there are code updates, follow a similar procedure, then do a full page refresh (e.g. Ctrl-Shift-R) before testing again.
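The same clean-up can also be run from the page's devtools console. A sketch, assuming the cache name `kiwixjs-cache` given above: `caches.delete()`, `navigator.serviceWorker.getRegistrations()` and `registration.unregister()` are standard APIs, while `resetServiceWorkerAndCache` is a hypothetical wrapper. A full page refresh is still needed afterwards.

```javascript
/**
 * Deletes the asset cache and unregisters all Service Workers for this origin.
 * Browser usage: resetServiceWorkerAndCache(caches, navigator.serviceWorker, 'kiwixjs-cache')
 * Resolves to { cacheDeleted, unregistered } (hypothetical helper).
 */
function resetServiceWorkerAndCache(cacheStorage, serviceWorkerContainer, cacheName) {
  const deleteCache = cacheStorage
    ? cacheStorage.delete(cacheName) // resolves to true if the cache existed
    : Promise.resolve(false);
  const unregisterAll = serviceWorkerContainer
    ? serviceWorkerContainer.getRegistrations().then(function (registrations) {
        return Promise.all(registrations.map(function (reg) { return reg.unregister(); }));
      })
    : Promise.resolve([]);
  return Promise.all([deleteCache, unregisterAll]).then(function (results) {
    return {
      cacheDeleted: results[0],
      unregistered: results[1].filter(Boolean).length // unregister() resolves to a boolean
    };
  });
}
```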