ZIM backend : Try to cross-compile the libZIM library in webassembly with emscripten #116

Closed
mossroy opened this issue Jun 7, 2015 · 49 comments
@mossroy
Contributor

mossroy commented Jun 7, 2015

It would avoid re-coding it in javascript, and would also make it easier to support future evolutions of the file format.
But I'm not sure how Emscripten would handle file I/O

@mossroy mossroy added this to the v2.0 milestone Jun 7, 2015
@mossroy mossroy changed the title from "Try to cross-compile the libZIM library in javascript with emscripten" to "ZIM backend : Try to cross-compile the libZIM library in javascript with emscripten" Jun 7, 2015
@peter-x
Contributor

peter-x commented Jun 7, 2015

Being able to use libZIM as is depends on whether we can use synchronous I/O on the low-level javascript side.
This probably has to wait until we know how we can access the filesystem from ServiceWorkers.

@mossroy
Contributor Author

mossroy commented Jun 7, 2015

I don't think there is a way to do synchronous I/O on files in javascript.
Regarding ServiceWorkers, I suppose we have access to the same APIs as in any javascript.

@mossroy mossroy modified the milestones: v2.1, v2.0 Aug 29, 2015
@mossroy mossroy assigned dattaz and unassigned peter-x Apr 8, 2017
@mossroy
Contributor Author

mossroy commented Apr 8, 2017

Dattaz, I'm assigning this issue to you, as you expressed some interest in it.
No hurry and no obligation, of course

@mossroy
Contributor Author

mossroy commented Apr 8, 2017

@thiolliere if you want to have a look, too

@mossroy
Contributor Author

mossroy commented Apr 22, 2017

I met @bnjbvr yesterday (he was giving a talk about WebAssembly at @mixitconf), and talked to him about this idea. He said he might put us in touch with the maintainer of emscripten if necessary.
Obviously, it's too early for now.

I found this article that tackles our issue : https://hacks.mozilla.org/2015/02/synchronous-execution-and-filesystem-access-in-emscripten/
There seems to be a notion of "virtual filesystem" that allows synchronous access to files that are preloaded in memory. It's clearly not possible for us because of the size of our ZIM files.
Refactoring the zimlib code to use asynchronous file I/O is probably complicated. It's a change that would only be useful for us (not for the other applications using this library) and would probably make the code less readable.
Maybe the Emterpreter might be an option, but they say it would be slower. So maybe we would have to find a way to use the Emterpreter only on the code that reads the ZIM file, and compile everything else in WebAssembly, if it's technically possible.

@dattaz

dattaz commented May 10, 2017

I have compiled libzim to webassembly ; you can check the demo here : https://dattaz.github.io/libzim_wasm/ (but the ZIM file (meta.esperanto.stackexchange.com_eng_all_2017-05.zim, 1.9M) is embedded) ; now we have to deal with the filesystem :)

The source code to build it is here : https://github.com/dattaz/libzim_wasm

@mossroy
Contributor Author

mossroy commented May 10, 2017

Yeah! @dattaz, you rock!
It works pretty well, that's promising.
As you said, the next challenge is to deal with file I/O. It might be the most difficult part, and might not even be technically possible.

The virtual filesystem you used for the demo cannot work with bigger ZIM files that would not fit into memory.
We're left with the options from my previous comment :

  • refactor libzim to use asynchronous file I/O : it's the recommended way, and the best option for performance, but I don't know whether it would be too complicated to do
  • or use Emterpreter to keep synchronous file I/O : it would be much slower, according to Mozilla

In both cases, we have to make it use a javascript File object that we would pass to the wasm code : I hope it's possible

@dattaz

dattaz commented May 26, 2017

Emscripten has a WORKERFS file system, which lets us load a File object as a file into its FS. This is only allowed in a web worker. According to the doc (https://kripken.github.io/emscripten-site/docs/api_reference/Filesystem-API.html), it seems really close to what we want to do : "This file system provides read-only access to File and Blob objects inside a worker without copying the entire data into memory and can potentially be used for huge files."

Here is a (little) demo : https://dattaz.github.io/libzim_wasm/file_api/index.html

Note that for the moment there is an issue with files bigger than 2GB : emscripten-core/emscripten#5250
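
To make the 2GB figure concrete, here is a minimal sketch. It assumes the limit comes from file offsets being stored in signed 32-bit integers; the linked issue is the authority on the actual cause, so treat this only as an illustration. The helper name `fitsInInt32` and the constant `kGiB` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if a byte offset can be stored in a signed
// 32-bit integer. ASSUMPTION: the 2GB limit comes from offsets of this
// width; see the linked emscripten issue for the real cause.
constexpr bool fitsInInt32(std::int64_t offset) {
    return offset >= 0 && offset <= std::numeric_limits<std::int32_t>::max();
}

// One gibibyte, in bytes.
constexpr std::int64_t kGiB = 1024LL * 1024 * 1024;

// The last byte below 2 GiB is still addressable...
static_assert(fitsInInt32(2 * kGiB - 1), "offset just below 2 GiB fits");
// ...but an offset of exactly 2 GiB no longer fits.
static_assert(!fitsInInt32(2 * kGiB), "offset of 2 GiB does not fit");
```

Under that assumption, any read that starts at or beyond the 2 GiB mark of a ZIM file would need a wider offset type somewhere along the path.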

@mossroy
Contributor Author

mossroy commented May 27, 2017

I did not know about workerfs : it looks great for our needs!
This is very promising : if we manage to expose all the libzim functions through javascript, it might replace the low-level zim javascript code. It would be a huge improvement for kiwix-html5

This was referenced May 27, 2017
@mossroy mossroy modified the milestones: v2.2, v2.3 Jan 4, 2018
@kelson42 kelson42 changed the title from "ZIM backend : Try to cross-compile the libZIM library in javascript with emscripten" to "ZIM backend : Try to cross-compile the libZIM library in webassembly with emscripten" Jan 10, 2018
@kelson42
Collaborator

kelson42 commented Mar 1, 2019

@mossroy That sounds like an important step forward, but I'm not sure I understand concretely what it means. Does it mean the work to bind Kiwix JS with libzim-js could start?

BTW, I'm taking the opportunity to inform you formally that @ISNIT0 has updated node-libzim (https://github.com/openzim/node-libzim), the nodejs binding to libzim, and that mwoffliner 1.8, to be released soon, uses it.

@ISNIT0

ISNIT0 commented Mar 1, 2019

The bindings are written in nbind, which natively supports compiling to webassembly, so it should be possible to do :)

@mossroy
Contributor Author

mossroy commented Mar 1, 2019

No, what I did yesterday is not a significant step. It just simplifies reproducing what was achieved at the last hackathon on this topic (based on the work of dattaz).
We currently manage to compile libzim with emscripten, but using it in kiwix-js still needs a lot of work.
The current prototype is "only" a promising proof-of-concept.

There is at least one blocking issue : emscripten-core/emscripten#5250 , which prevents reading files bigger than 2GB. This has to be fixed on the emscripten side, and I'm not able to do that myself. This issue has been tagged by kripken as "help wanted" : if you know someone able to work on that, it would be cool.

And there are also many things that need to be done before really using it in kiwix-js (see #116 (comment) above) :

  • compile kiwix-lib with emscripten, and use it instead of libzim, in order to benefit from its higher-level APIs
  • replace the quick-and-dirty patches that I had to put on icu and libzim (see files patch_* in https://github.com/mossroy/libzim_wasm) by more sustainable fixes
  • test how emscripten lets us use C APIs that have more complicated signatures than strings : how can we pass and/or use return values that are arrays or C classes? I'm currently using embind for these bindings, which should let us do that : https://emscripten.org/docs/porting/connecting_cpp_and_javascript/embind.html
  • compare the APIs of our current backend with the ones of kiwix-lib, and see how we can use them. It will certainly not be a perfect match, and we probably need to add C and/or javascript glue
  • fix the fact that we currently can do only one call to the C API, and properly wait for the wasm binary to be initialized before using it
  • fix the fact that the C API sometimes returns a String that also contains the following articles of the ZIM files (instead of containing only the requested article)
  • check that it works with split ZIM files
  • check if asm.js can still be used as a fallback for platforms where webassembly is not available
  • and probably other issues that we will discover
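
Regarding the embind item in the list above (return values that are arrays or classes), here is a minimal sketch of what such a binding might look like. `DemoReader`, `listTitles`, `StringList` and `demo_reader_module` are hypothetical names for illustration, not the kiwix-lib or libzim API. The binding section is guarded by `__EMSCRIPTEN__` so that the class itself also compiles with a regular C++ compiler.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reader class; NOT the kiwix-lib API.
class DemoReader {
public:
    explicit DemoReader(std::string path) : path_(std::move(path)) {}

    // Returns a container rather than a plain string, to exercise
    // a non-trivial return type in the binding.
    std::vector<std::string> listTitles() const {
        return {"Main Page", "About " + path_};
    }

private:
    std::string path_;
};

#ifdef __EMSCRIPTEN__
#include <emscripten/bind.h>

EMSCRIPTEN_BINDINGS(demo_reader_module) {
    // Expose std::vector<std::string> to JavaScript under an arbitrary name.
    emscripten::register_vector<std::string>("StringList");

    // Expose the class, its one-argument constructor and one method.
    emscripten::class_<DemoReader>("DemoReader")
        .constructor<std::string>()
        .function("listTitles", &DemoReader::listTitles);
}
#endif
```

With a binding like this, the JavaScript side would create the object with `new Module.DemoReader(...)` and read the returned vector through its `size()` and `get(i)` methods. Objects created through embind have to be released explicitly with `.delete()`.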

All this is very exciting. I'm just missing some time to work on it.

@mossroy
Contributor Author

mossroy commented Mar 1, 2019

Regarding https://github.com/openzim/node-libzim, I currently don't see how it could be useful for kiwix-js.
Maybe I misunderstood but I don't think it can be used in a browser environment, as it downloads the libzim binary, and executes it through some javascript bindings. Executing a binary is not possible in a (standard) browser environment, for security reasons.
To my knowledge, the only cross-platform and cross-browser way to execute a binary is to compile it with emscripten : either in webassembly or in asm.js.

The second step is to bind the recompiled C code with javascript.
nbind looks like a possible alternative to embind. It seems that it has been discussed with the emscripten team : emscripten-core/emscripten#4770. It was rejected but it was 2 years ago...
In any case, nbind is not currently listed in their documentation : https://emscripten.org/docs/porting/connecting_cpp_and_javascript/index.html only mentions Embind and WebIDL
But, if it better suits our needs, why not

@mossroy
Contributor Author

mossroy commented Mar 3, 2019

I worked a bit on compiling kiwix-lib with emscripten (in branch https://github.com/mossroy/libzim_wasm/tree/kiwix-lib-compilation) : it's a long journey but I'm making progress (with many dirty hacks).
It forces us to compile every dependency with emscripten too, so it takes time to make them work.

@mossroy
Contributor Author

mossroy commented Mar 14, 2019

I managed to compile the kiwix-lib with emscripten, and make a simple C call from javascript.
But I still can't make more than one C call : I'm probably doing something wrong.

@mossroy
Contributor Author

mossroy commented Apr 18, 2019

When I make the second C++ call, it fails with "Assertion failed: you need to wait for the runtime to be ready (e.g. wait for main() to be called)" (even if main() has already been called).
I first thought it was because I was trying to keep the Reader instance in javascript, but it's not : I have the same issue if I re-create it each time (in C++).
This issue started when I used separate function calls (with embind). I tried to switch to WebIDL (in the more-WebIDL-binding-experiments branch) but did not manage to make it work because it tries to instantiate Reader through a constructor with no parameters (which does not exist in kiwix-lib). I doubt it is related to binding, in any case.

@mossroy
Contributor Author

mossroy commented Apr 18, 2019

Well, I have to admit I'm currently stuck.
I spent some time trying to make multiple C calls work, but did not find what's wrong.
I'm also not very efficient because I barely remember how to code in C++.
Some technical help would be appreciated on this issue.

@ISNIT0

ISNIT0 commented Apr 18, 2019

Would be happy to organise a call, it might be that what I've learned from node-libzim will be helpful?

@mossroy
Contributor Author

mossroy commented Apr 18, 2019

Thanks @ISNIT0 for the pointers you gave me during this call.

I'll try to sum up what we discussed :

  • you faced technical issues when working on https://github.com/openzim/node-libzim (with nbind), and managed to solve them. So you suggest trying to switch to nbind instead of embind, in order to join forces on the same APIs. There are a few differences though :
    • node-libzim uses the libzim, not kiwix-lib (but the principles should be the same)
    • node-libzim uses a binary version of the libzim. We need to use wasm instead (or asm.js). That's probably the main challenge
    • node-libzim uses typescript, but it should work in plain javascript too
  • regarding the multiple C calls, it might come from the fact that I instantiate a class inside the C function. I should try to separate that into 2 functions : a first one that creates the instance (and keeps it in a global variable), and a second one that uses it. I had tried that earlier today, but was unsuccessful because the C++ class does not have a default constructor with no parameters. In https://github.com/openzim/node-libzim/blob/master/src/ZimReader.cc, you worked around this issue by using a Wrapper class that only keeps a reference to the instance we need : I should try the same approach
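
The two-function approach with a wrapper can be sketched roughly as follows. This is only an illustration of the structure: `FakeReader` and `ReaderWrapper` are hypothetical stand-ins (the article count is a dummy value), and the real code in node-libzim may differ. Whether this structure has any effect on the runtime assertion is a separate question.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a class with no default constructor;
// NOT the real kiwix-lib Reader.
class FakeReader {
public:
    explicit FakeReader(std::string path) : path_(std::move(path)) {}

    // Dummy value for the sketch: the length of the path.
    unsigned int getArticleCount() const {
        return static_cast<unsigned int>(path_.size());
    }

private:
    std::string path_;
};

// Default-constructible wrapper that only holds a pointer to the reader,
// so it can exist before any file has been opened.
class ReaderWrapper {
public:
    ReaderWrapper() = default;

    void open(const std::string& path) { reader_.reset(new FakeReader(path)); }
    bool isOpen() const { return reader_ != nullptr; }
    const FakeReader& reader() const { return *reader_; }

private:
    std::unique_ptr<FakeReader> reader_;
};

// Single global instance shared by the two entry points below.
static ReaderWrapper g_reader;

// First call: create the reader and keep it alive in the global wrapper.
void initReader(const std::string& path) { g_reader.open(path); }

// Second call: reuse the stored reader. Returns -1 if initReader
// has not been called yet.
int getArticleCountFromReader() {
    if (!g_reader.isOpen()) {
        return -1;
    }
    return static_cast<int>(g_reader.reader().getArticleCount());
}
```

The wrapper is what makes a no-argument constructor available to the binding layer, while the real object is only constructed once a path is known.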

@mossroy
Contributor Author

mossroy commented Apr 18, 2019

I just tried separating the code into 2 functions (initReader and getArticleCountFromReader) in https://github.com/mossroy/libzim_wasm/tree/multiple-c-calls. With the wrapper workaround, it compiles and runs, but there's the same error in the second call.

@ISNIT0

ISNIT0 commented Apr 18, 2019

Unfortunately I can't get the build to run on my Mac, so I can't play around with it :(

@mossroy
Contributor Author

mossroy commented Apr 18, 2019

@ISNIT0 : it's possible to build with Docker if it helps (see the README.md)

@mossroy
Contributor Author

mossroy commented Jun 1, 2019

This github issue has too much history : it's probably complicated to get into it.
I'll close it and create other smaller issues for what still needs to be done

6 participants