Support reading ZIM archives with no namespace #597
Comments
I am keen on working on this issue, should I work straight away? |
@Rajat379 This is a rather specialized issue: unless you are very experienced with JS development and with the format of ZIM archives (and the changes happening to that format), I suggest you choose an easier issue to tackle as a first contribution. |
@kelson42 What is the new ZIM format specification we should work with here? I've checked https://wiki.openzim.org/wiki/ZIM_file_format#Namespaces and it doesn't have any updated information. Looking at the Ray Charles sample you provided, it seems the landing page (at least) is in Namespace |
@Jaifroid @mgautierfr Could you please talk together on this? |
@Jaifroid The current roadmap looks like this:
Hope in the meantime we will be able to fix all related bugs in Kiwix JS and other ports. |
Thanks, that's good to know. I don't anticipate big difficulties adapting the code. The main issue, from previous discussions, is likely to be slower binary search if we have to search through one huge namespace instead of one focused on articles. But we now have a block cache merged in 3.1 which should help with that. |
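To make the binary-search concern concrete, here is a minimal sketch of a lookup over directory entries sorted by namespace and then by URL, with a small cache standing in for the block cache mentioned above. The names `compareKeys`, `makeCachedReader` and `findDirEntry`, and the in-memory `entries` array, are hypothetical and are not the Kiwix JS API; the real reader fetches entries from the archive asynchronously.

```javascript
// Hypothetical sketch, not the actual Kiwix JS implementation.
// Entries are assumed to be sorted by namespace first, then by URL.
function compareKeys(nsA, urlA, nsB, urlB) {
    if (nsA !== nsB) return nsA < nsB ? -1 : 1;
    if (urlA === urlB) return 0;
    return urlA < urlB ? -1 : 1;
}

// Wraps an (assumed expensive) reader with a Map-based cache, so that
// entries probed repeatedly by successive searches are read only once.
function makeCachedReader(readDirEntry) {
    const cache = new Map();
    let reads = 0;
    function read(index) {
        if (!cache.has(index)) {
            reads++;
            cache.set(index, readDirEntry(index));
        }
        return cache.get(index);
    }
    read.readCount = () => reads;
    return read;
}

// Classic binary search: the number of probes grows with log2(count),
// so merging everything into one namespace adds probes per lookup.
function findDirEntry(read, count, namespace, url) {
    let lo = 0;
    let hi = count - 1;
    while (lo <= hi) {
        const mid = (lo + hi) >>> 1;
        const entry = read(mid);
        const c = compareKeys(namespace, url, entry.namespace, entry.url);
        if (c === 0) return entry;
        if (c < 0) hi = mid - 1; else lo = mid + 1;
    }
    return null;
}

// Illustrative data only; sorted with the same comparison used for searching.
const entries = [
    { namespace: 'W', url: 'mainPage' },
    { namespace: 'C', url: 'style.css' },
    { namespace: 'M', url: 'Title' },
    { namespace: 'C', url: 'Ray_Charles' }
].sort((a, b) => compareKeys(a.namespace, a.url, b.namespace, b.url));

const read = makeCachedReader(i => entries[i]);
console.log(findDirEntry(read, entries.length, 'C', 'style.css'));
```

The cache matters because the first probes of every search land on the same middle entries, so those reads are shared across lookups.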
@Jaifroid, I have updated the spec on the wiki (https://wiki.openzim.org/wiki/ZIM_file_format). libzim itself (in master) doesn't fully follow the spec yet: it doesn't handle the well-known entries (W namespace) or the listings (X namespace), so the ZIM files in http://tmp.kiwix.org/nons_zims don't contain those entries. We will not release libzim before it implements the spec correctly. The main changes to remember are:
If you have any questions, feel free to ask them here or directly to me on slack. |
Thank you, @mgautierfr , I'll study that and get back to you if I have any questions. |
@mgautierfr I do have an immediate observation: deprecation of |
@Jaifroid We are aware of the problem you raised. The titlePtrList will stay available as long as necessary for backward compatibility. That said, its deprecation does not mean we expect all readers to deal with a Xapian index to provide a simple title suggestion system. We are already working on a replacement with a better approach, and it should be easy for Kiwix JS to add support for it; see openzim/libzim#397. |
As @kelson42 said, we will not drop The The content of
@mgautierfr Thank you for the reassuring explanation. I'm no expert in the correct terminology, but what you describe sounds like a "deprecated" API, that is still available/supported but that may be removed at some future time. If I've understood correctly, then that may be a better term to use in the spec than "obsolete". |
It is a matter of terminology, it is not an exact science :) I didn't want to use "deprecated". "Deprecated" sounds like you should not use (read) it and should use another way/method/field. It also sounds as if the feature may disappear in the future (the last sentence of the paragraph in the spec said that, but it's wrong; I have already removed it). But it is not the case for
As a reminder, we are just a few PRs away from releasing libzim 7.0.0. |
OK, thanks for the reminder @kelson42. Can I just check whether the ZIM archives in this directory: http://tmp.kiwix.org/nons_zims/ still correspond to the new spec (at least enough to use as test ZIMs)? An immediate change we could make would be to change all references to the /A/ namespace to /C/ (in our code). We would then have to check for any code that has the /I/ or /J/ namespaces hard-coded. One complication is that we need to keep all the old code for backward compatibility. |
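One way to switch from /A/ to /C/ without losing backward compatibility is to derive the content namespace from the archive instead of hard-coding it. The sketch below is only an illustration: `contentNamespace` and `articlePath` are hypothetical names, and the check on the header's minor version (0 for the old namespace scheme, 1 or more for the new one) is an assumption to verify against the updated spec.

```javascript
// Hypothetical helper, not part of Kiwix JS.
// Assumption: the ZIM header's minor version is 0 for the old namespace
// scheme and >= 1 for the new single-content-namespace scheme.
function contentNamespace(minorVersion) {
    return minorVersion >= 1 ? 'C' : 'A';
}

// Builds "namespace/url" for an article so that callers never hard-code
// /A/ or /C/ themselves.
function articlePath(minorVersion, url) {
    return contentNamespace(minorVersion) + '/' + url;
}

console.log(articlePath(0, 'Ray_Charles'));
console.log(articlePath(1, 'Ray_Charles'));
```

Keeping the decision in one function means the old code paths stay intact, and any hard-coded /I/ or /J/ references can be routed through a similar helper later.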
@Jaifroid These ZIM files still have the title index at the old/current location. @mgautierfr Would you be able to refresh them, please? |
@kelson42 But, re-reading the discussion above, it seems that To summarize, I'm proposing we do two separate PRs:
Does this sound like a good way to proceed? |
I've added a hacky PR #698 which can read But I have a query for @mgautierfr: the landing article of this ZIM is in namespace EDIT: I think the answer to my query is above, i.e. that the test ZIMs do not yet handle the |
Sorry to bombard you with questions, @mgautierfr , but do you have any suggestion from your experience on how to deal with the title search issue shown in the screenshot below? This is from the Ray Charles test ZIM, which is small. Extrapolated to a large ZIM, this would be unmanageable. To elaborate: because namespace |
Yes, they are "old" ZIM files, from before there was a spec for the
Use |
But is that a Xapian compressed index, or is it a standard |
It is the same format as |
@mgautierfr Things are a bit complex already; could you please quickly refresh the ZIM files at http://tmp.kiwix.org/nons_zims? |
Done |
Thank you @mgautierfr! I'm running into some inconsistencies in For an example of the latter, see the page with ZIM URL
The assets listed here with namespace Assets on other pages for example ZIM URL
Relative to its page, this yields the correct asset URL:
Is this a local problem with the ZIM writer for this ZIM type? |
This is an issue with zim-recreate (https://github.com/openzim/zim-tools/blob/master/src/zimrecreate.cpp#L75-L89) |
This issue is partially implemented by 18c51f1. What remains is to support the version 1 of |
See comment here: #230 (comment) (and that issue more broadly) and the issue for this on Libzim: openzim/libzim#15. If we haven't achieved #514 by the time this becomes a reality, we will need to do a fair amount of adjustment to the back end. It should not be difficult, just a bit tedious... :-)