Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add webp image polyfill #650

Closed
Jaifroid opened this issue Sep 8, 2020 · 48 comments · Fixed by #670
Closed

Add webp image polyfill #650

Jaifroid opened this issue Sep 8, 2020 · 48 comments · Fixed by #670
Assignees
Milestone

Comments

@Jaifroid
Copy link
Member

Jaifroid commented Sep 8, 2020

See openzim/mwoffliner#1254 . This is medium to longer term. Currently planned for Android custom apps to reduce distribution size, but it would be useful to be able to support the same ZIMs. It's effectively another compression format, and the polyfill provides an asm/wasm layer that converts webp to png if the browser does not support webp. The polyfill is claimed to work on IE11.

Polyfill is here: https://github.com/chase-moskal/webp-hero

@Jaifroid Jaifroid added this to the v2.11 milestone Sep 8, 2020
@Jaifroid Jaifroid self-assigned this Sep 8, 2020
@kelson42
Copy link
Collaborator

We need to be carefuly that this does not conflicts with the polyfill available in the ZIM file itself (which is the plan so far).

@Jaifroid
Copy link
Member Author

OK, noted. In our jQuery mode (which we should probably rename "DOM traversal mode"), we would need to attach the polyfill manually, but it can be extracted from the ZIM. In SW mode, it will attach automatically. So this PR then would only need to cover the former mode (and there would be no conflict).

@kelson42
Copy link
Collaborator

The PR to add webp compression to MWoffliner makes good progress openzim/mwoffliner#1265 and I expect sometime next week to be ready.

New version of Youtube based ZIM files have images using Webp.

Both (will) have a polyfill.

@Jaifroid
Copy link
Member Author

Jaifroid commented Oct 8, 2020

@mossroy See discussion in openzim/mwoffliner#1265 . As this is another potential moment of "deprecation" for old browsers running Kiwix JS, and as adding a test in jQuery mode for the existence of the webp polyfill in the ZIM, and attaching it if it is found, is not at all hard, I propose to add and test this here soon. The reason is that I don't believe we have any browsers capable of running SW mode that do not already support webp, so it is only needed for browsers that are stuck in jQuery mode.

@mossroy
Copy link
Contributor

mossroy commented Oct 8, 2020

Great, I agree

@kelson42
Copy link
Collaborator

Here is WikiMed ZIM file for custom apps, with webp: http://mirror.download.kiwix.org/zim/.hidden/custom_apps/wikipedia_en_medicine-app_maxi_2020-10.zim.

@Jaifroid
Copy link
Member Author

Wow, that file saves 800MB over the standard one, a substantial size reduction.

@kelson42
Copy link
Collaborator

@Jaifroid Indeed, but this file does not have the fulltext search index either.

@Jaifroid
Copy link
Member Author

@kelson42 Are you sure the polyfills are included in the ZIM? I can see they are referenced in each article's HTML, but our Service Worker cannot find them at the referenced ZIM URL:

image

@kelson42
Copy link
Collaborator

kelson42 commented Oct 10, 2020

@Jaifroid Yes, there is a bug which makes the ZIM created within Docker to have this problem. Fixed in openzim/mwoffliner#1274. I have also a super small PR to introduce lazy loading for images at openzim/mwoffliner#1275. As soon as merged, I will redo the WikiMed ZIM file and let you know.

@kelson42
Copy link
Collaborator

@Jaifroid I’m not at the office, but I believe this new version of the ZIM to fix the problem http://mirror.download.kiwix.org/zim/.hidden/custom_apps/wikipedia_en_medicine-app_maxi_2020-10.zim

@Jaifroid
Copy link
Member Author

Jaifroid commented Oct 11, 2020

@kelson42 That ZIM has the polyfills, thank you!

I have tested the generic solution -- JavaScript Support branch for jQuery mode -- and it has the maybe obvious issue that because images are loaded through a custom process in jQuery mode, the webp polyfill doesn't see them at load time, so does not attempt to convert them.

So, we would either need to add a custom onload event to each image to run webpMachine.polyfillImage(imageElement), or (and I think this would be more performant) we should directly test for and (if webp) convert the incoming Uint8Array using
webpMachine.decode(webpData). I'll test the latter approach unless anyone has an idea for a more generic approach.

@mossroy
Copy link
Contributor

mossroy commented Oct 11, 2020

The polyfill should decode images only when the browser does not support webp.

Maybe a simpler and more generic approach would be to run webpMachine.polyfillDocument() after all the images are loaded through javascript. I suppose it would detect if the browser supports webp, and convert all images (only if necessary).
I agree that it might be slower, but maybe it does not make a big difference

@Jaifroid
Copy link
Member Author

Indeed we should only convert images if the browser doesn't decode webp natively (the polyfill has a test). According to the documentation, polyfilling the document only works on elements that have the .webp extension, whereas our images would be blob URIs, so that approach wouldn't work "out of the box":

currently only detects webp images with filename ending in .webp extension

@mossroy
Copy link
Contributor

mossroy commented Oct 11, 2020

You're right.
And I unfortunately don't see a way to add ".webp" to a blob URI when we generate it...

If we use the polyfill library inside the ZIM file, there is a risk to make use of internal functions of the polyfill, that might change in future versions of the polyfill. We should make as few assumptions as we can on the content of ZIM files.

Maybe another approach would be to override defaultDetectWebpImage, so that it uses our data-kiwixurl attribute instead of src attribute to test the ".webp" suffix. Based on chase-moskal/webp-hero#25 (comment), it should be technically possible to do that

@Jaifroid
Copy link
Member Author

@kelson42 I've been looking at the issues for that polyfill, and the dev has posted this:

  • conditional loading of the whole library — the currently recommended strategy in the readme is alarmingly wasteful, because it will load the entire 1MB bundle, even on new browsers which already support webp. i think we should create some kind of wrapper which avoids loading the whole library when webp is supported

I know you only add the polyfills on articles which have images, but it looks like even that approach is quite wasteful on the majority of browsers. Effectively, these polyfills add a 1MB load just to support IE11 and Safari. Well, I realize iOS support is crucial, but as the Dev of the polyfill says, it is alarmingly wasteful on other browsers engines. I wonder if a better approach would be conditional loading of these polyfills? The detection part of the script is really very short in TypeScript (a bit longer when converted to universal JS). Maybe in fact, the best approach would be to make webp support part of the Kiwix API, so that readers load it when they know they need it (like ToC support)?

@Jaifroid
Copy link
Member Author

Maybe another approach would be to override defaultDetectWebpImage, so that it uses our data-kiwixurl attribute instead of src attribute to test the ".webp" suffix. Based on chase-moskal/webp-hero#25 (comment), it should be technically possible to do that

Interesting, yes. Do you mean we should write a custom function override for parts of the polyfill, sort of polyfill the polyfill?

@mossroy
Copy link
Contributor

mossroy commented Oct 11, 2020

No, there seems to be a detectWebpImage option that can be passed when creating the WebpMachine instance : https://github.com/chase-moskal/webp-hero/blob/master/source/webp-machine.ts
If we can do that (and provided that the version of this library is >=v0.0.0-dev.23), we could let the library do all the job (and not make more assumptions on how it works internally)

@Jaifroid
Copy link
Member Author

OK, that looks like a possibility. I guess we'll need to avoid killing the image blobs on first load as we currently do, and keep them residing in memory so that the polyfill can retrieve their data. Depending on how fast the polyfill works, the user is quite likely to see broken image icons till the data have been converted and the image reloaded.

We could compare both approaches (run polyfill after all images are loaded vs convert each webm Uint8Array as it is retrieved if the browser doesn't support webm natively).

@kelson42
Copy link
Collaborator

kelson42 commented Oct 11, 2020

The polyfill should decode images only when the browser does not support webp.

Sorry for my dummy question, but does this kind of code (like in the youtube2zim scraper) would be better? https://github.com/openzim/youtube/blob/master/youtube2zim/templates/assets/webp-trigger.js. Maybe there is even a simplier approach like new webpHero.WebpMachine({detectWebpImage: true}).polyfillDocument()

Maybe in fact, the best approach would be to make webp support part of the Kiwix API, so that readers load it when they know they need it (like ToC support)?

I'm not against considering the idea, but this is in any cased blocked by this pending PR openzim/mwoffliner#1237

@Jaifroid
Copy link
Member Author

@kelson42 It looks very much like that code still loads the bundle so that it can use its built-in test. If we want to avoid loading the bundle at all if the browser supports webp, then we need to have a simple test script that then loads the other two scripts only if needed.

A very short ES5 script to do this is suggested here, and copied below:

function testWebP(callback) {
    var webP = new Image();
    webP.onload = webP.onerror = function () {
        callback(webP.height == 2);
    };
    webP.src = '';
};

And this is how you'd use it. -- if it's of interest, I could complete this to attach the required scripts.

testWebP(function(support) {
    if (!support) {
         // Attach WEBPMACHINE scripts to DOM
    }
}); 

@kelson42
Copy link
Collaborator

kelson42 commented Oct 11, 2020

@Jaifroid ok, will try yo make a PR like this to mwoffliner. Or just make one, if you have one.

@Jaifroid
Copy link
Member Author

Experimental branch pushed (commit 8232168) which works in Internet Explorer. Strictly for testing, as this branch is built on my JavaScript support brach which generically loads all JS that it can. It's just a proof-of-concept. This version does not test if the browser supports webp, just converts all image Uint8Array data to png dataURIs (that's what the built-in conversion function does).

Prodcution code would run faster, because we wouldn't be attaching all JS in an article, only the webp JS. However, it's not too bad on IE11, surprisingly (especially after the memory cache is populated).

I'll also try the other approach, but probably not today, as running out of time.

@Jaifroid
Copy link
Member Author

@Jaifroid ok, will try yo make a PR like this to mwoffliner. Or just make one, if you have one.

@kelson42 I could have a go! Are you very pressured to get this out?

@Jaifroid
Copy link
Member Author

@kelson42 Sorry, were you saying you would do it? (I can't test mwoffliner on my machine).

@Jaifroid
Copy link
Member Author

Jaifroid commented Oct 11, 2020

OK @kelson42 an implementation all inside one external custom script could look something like this (caution: untested!!):

// Location of external scripts to load conditionally
var webHeroScripts = ['../-/j/js_modules/webpHeroPolyfill.js', '../-/j/js_modules/webpHeroBundle.js'];
        
function testWebP(callback) {
    var webP = new Image();
    webP.onload = webP.onerror = function () {
        callback(webP.height == 2);
    };
    webP.src = '';
};

testWebP(function(support) {
    if (!support) {
        webHeroScripts.forEach(function (scriptUrl) {
            var script = document.createElement('script');
            script.type = 'text/javascript';
            script.src = scriptUrl;
            document.getElementsByTagName('body')[0].appendChild(script);
        }); 
    }
}); 

@Jaifroid
Copy link
Member Author

PS That attaches the polyfills to the body of the doc, not the head -- just in case there is a document without a head (unlikely?).
Also, forgot to have a couple of lines to add dynamically this:

<script>
	var webpMachine = new webpHero.WebpMachine()
	webpMachine.polyfillDocument()
</script>

@Jaifroid
Copy link
Member Author

Jaifroid commented Oct 11, 2020

Maybe this complete version would work (needs careful testing!!!). It creates and attaches an inline script, which could be a problem for some strict frameworks. If so, the inline script could be replaced with a third external script, whcih should be the last one. That script should attach webpMachine to window , so that the machine can be used outside the script.

// Location of external scripts to load conditionally
var webHeroScripts = ['../-/j/js_modules/webpHeroPolyfill.js', '../-/j/js_modules/webpHeroBundle.js'];
        
var testWebP = function(callback) {
    var webP = new Image();
    webP.onload = webP.onerror = function () {
        callback(webP.height == 2);
    };
    webP.src = '';
};

var startWebpMachine = function() {
    var newScript = document.createElement('script');
    var inlineScript = document.createTextNode('var webpMachine = new webpHero.WebpMachine(); webpMachine.polyfillDocument()');
    newScript.appendChild(inlineScript); 
    document.getElementsByTagName('body')[0].appendChild(newScript);
};

testWebP(function(support) {
    if (!support) {
        webHeroScripts.forEach(function(scriptUrl) {
            var script = document.createElement('script');
            script.type = 'text/javascript';
            script.src = scriptUrl;
            if (webHeroScripts[webHeroScripts.length-1] === scriptUrl) {
                // Start webpMachine if we have loaded the last script
                script.onload = startWebpMachine;
            }
            document.getElementsByTagName('body')[0].appendChild(script);
        });
    }
});

@Jaifroid
Copy link
Member Author

EDIT: added missing end closures!

@kelson42
Copy link
Collaborator

kelson42 commented Nov 7, 2020

@Jaifroid Thank you, the work has been done on MWoffliner and is part od latest release 1.11. For the moment only a feww ZIMS, like wikimed for app, have the option, but I plan to move all of them in the next months.

@Jaifroid
Copy link
Member Author

Jaifroid commented Nov 7, 2020

OK, thanks @kelson42 , I see there is a test ZIM here: https://github.com/openzim/mwoffliner/files/5451044/zimfile_with_webp.zip which we can use to test/enable support in KJS and KJSW.

@kelson42
Copy link
Collaborator

I would like to introduce webp optimisation for most (if not all) of our MWoffliner generated ZIM files around end of the month (November). But I would like to secure the our Kiwix JS (based) reader support them properly. Does that sounds a reasonable timeline? If not, what would be a better one?

@Jaifroid
Copy link
Member Author

@kelson42 I'll tell you after I've tried to incorporate the polyfill into the WikiMed release I'm doing now. If that goes smoothly, then the timeline you're thinking of should be fine. More soon.

@Jaifroid
Copy link
Member Author

Hmm. I can get the polyfill to run in Internet Explorer, but so far not in Edge Legacy, surprisingly (as the latter is a more modern browser).

@Jaifroid
Copy link
Member Author

Now working in Edge Legacy / Windows Mobile. 😊

@Jaifroid
Copy link
Member Author

Jaifroid commented Nov 13, 2020

@kelson42 I've tested the polyfill on a Windows Mobile physical device and on various VMs. It works fine on any device with 1GB or greater RAM. For the lowest-powered device, with 500MB RAM, the app launches and then crashes as soon as it tries to load this polyfill. While I don't believe there are many/any people with such a low-powered device attempting to run Kiwix, it is indicative that this polyfill is pretty heavy and wasteful of memory. Nevertheless, on the lowest-spec 1GB device, it runs smoothly. So I don't think this should be a blocker. The 500MB device struggled with XZ-encoded ZIMs in any case, so this would not be a major deprecation, and I do recommend 1GB minimum to run KJSW.

@mossroy In jQuery mode, the most efficient approach is to decode the images inside the feedNodeWithBlob function in uiUtil.js. In fact what is emitted is not a BLOB but a dataURI. If we attempt to wait till all images are loaded and then run the polyfillDocument() method, it would just replicate what we already do: i.e. it runs a querySelectorAll on the document looking for image nodes, then runs a FileReader on each image to extract the data, then converts it and emits a dataURI. This would be very inefficient, doing the job twice that we already do. Moreover, it would mean that we wouldn't be able to run revokeObjectURL on BLOB URLs once loaded, as we would need the image to be readable by the polyfill. It would also show broken images until the images are converted.

For SW mode, the included polyfill should just run. However that is currently untested, as I don't have any browser that can run Service Worker but cannot decode WebP natively. If there are none, then maybe we don't need to worry about SW mode, other than to ensure that we don't load the polyfill into memory if the user launches the app in SW mode.

@kelson42
Copy link
Collaborator

@Jaifroid Good, so I need we would need new releases to get this ready for users?

@Jaifroid
Copy link
Member Author

@kelson42 Yes. I'll do a WikiMed release on KJSW as a pilot in the next few days. That can be followed with a PR here if my suspicions are correct that we do not need to do anything for the Service Worker mode of this app. If we do need to support SW mode explicitly, then the PR could be more complicated...

@mossroy
Copy link
Contributor

mossroy commented Nov 13, 2020

My concern is, as usually, to do as few assumptions as we can on the ZIM content. Else we take a risk that it will not work on future versions of this ZIM file, or on other unknown ZIM files. The more we use high-level APIs of the polyfill, the safer we are.
I agree with you that it would be slower to run polyfillDocument() , at least because it would parse the DOM twice. I don't know how much slower it would be.

BUT we could have another approach : in jQuery mode, instead of using the polyfill found inside the ZIM file, we could provide our own polyfill with kiwix-js. This way, we don't make any assumptions on the ZIM content (it could even not contain the polyfill). And we can optimize the algorithm as we wish (because we know for sure that the version of our polyfill will not change depending on the ZIM file), including by using lower-level APIs of the polyfill.

Regarding ServiceWorker mode, I agree that we certainly have nothing to do. The polyfill (included in the ZIM file) should work, as any other javascript.
All browser based on Chromium that support ServiceWorkers also support WebP. But some versions of other browsers support ServiceWorkers but not webp : Edge 17, Firefox 44-64, Opera 19-31, and some Safari versions (compare https://caniuse.com/serviceworkers and https://caniuse.com/webp)
So I think it's necessary to test on one of them (in Service Worker mode). You can for example use a portable version of an old Firefox : https://sourceforge.net/projects/portableapps/files/Mozilla%20Firefox%2C%20Portable%20Ed./ (don't use ESR versions : ServiceWorkers are disabled on old ESR versions)

@Jaifroid
Copy link
Member Author

@mossroy This is how I've added (so far partial) WebP support in KJSW, i.e. I've included my own copy of the polyfill rather than extracting it from the ZIM, for speed and compatibility, as you say. I say the support is partial because it is currently implemented for Windows Mobile, i.e. I did not include the polyfills needed by IE11 (thought I have tested them and they work).

I agree we'll need to test for those browsers with SW support and lacking WebP, or at least one of them. Thanks for researching that.

@mossroy
Copy link
Contributor

mossroy commented Nov 14, 2020

Cool!
Sorry, I had probably misunderstood you, and did not have a look at your code

@Jaifroid
Copy link
Member Author

I spent some time this morning trying to enable conditional loading of the WebP Polyfills. I just wanted to document my findings here:

  • There is a clash between RequireJS define and the use of define in the main Polyfill (possibly also in the core-js Promise and Array.from polyfills bundled with it for IE11).
  • The clash can be worked around by loading the polyfills before loading the main require.js script, but this means we must load the polyfills in index.html.
  • Loading in index.html means that there is a limited amount of testing that can be done without major cluttering, and for my use case in KJSW, I am simply loading the WebP polyfill unconditionally, and only adding basic conditional logic to load the Promise and Array.from polyfills (the latter would only get loaded if someone ran my code against IE11).
  • It may be that properly attaching the scripts via the require JS config may help, if the scripts contain logic to detect define, but I haven't had time to test that yet.
  • All of this underlines the importance of moving to a more modern dependency loading system like webpack (Introduce npm+webpack to build the app #554) or whatever is current today.

@Jaifroid Jaifroid linked a pull request Nov 17, 2020 that will close this issue
Jaifroid added a commit that referenced this issue Nov 22, 2020
@Jaifroid
Copy link
Member Author

@kelson42 Just to say that WebP is now enabled in both Kiwix JS and Kiwix JS Windows. In Kiwix JS it is in master, but not yet released, and there are small issues with the polyfill that need further investigation in #673, but I don't think the issue is a blocker for a release (it affects only a very small number of old browsers that mostly will have autoupdated to the latest, and even those browsers can fully access the Wikimedia ZIMs in jQuery mode, which is still the default).

Let us know when / if you start to make ZSTD+WebP formats for all Wikimedia ZIMs (other than in the .hidden directory), so that we can prepare a release here to coincide with the start of that.

@kelson42
Copy link
Collaborator

kelson42 commented Dec 1, 2020

So, I guess we can move on with ZIM generation with webp images? I will ask to move all our Mediawiki based recipes.

@Jaifroid
Copy link
Member Author

Jaifroid commented Dec 1, 2020

Yes, we're all ready. We can do a new release as soon as the new ZIMs start to appear.

@kelson42
Copy link
Collaborator

kelson42 commented Dec 1, 2020

@Jaifroid From now, all new ZIM files using mwoffliner will be done with webp pictures

@Jaifroid
Copy link
Member Author

Jaifroid commented Dec 1, 2020

OK, thanks, we'll schedule a new release in the next couple of weeks.

@Jaifroid Jaifroid modified the milestones: v3.4, v3.1 Dec 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants