-
-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop putting article URL if no title is provided #208
Comments
Just say that I believe you have to be careful not to break binary search or the display of search results with any change to this. There was probably a reason why libzim added the url in place of title when titles were missing. I think it has to do with the way the binary search is implemented. This is something @mossroy and I discovered when we implemented kiwix/kiwix-js#443 (comment), and there is some discussion there of why we have to search on the url when the title is missing. We were getting crazy binary searches because of missing titles in TEDx ZIMs. For the display of search results (but obviously not for sorting), we decided to add square brackets around the "faked" title when it was derived from the url. |
@Jaifroid this line of code has been introduced at the really beginning of the libzim project, 10 years ago. At that time, there was no concept of title (and title index). This has been introduced at my request. I suspect this piece of code has been made that way to keep compatibility if no title index. In fact I think the main question is even more if we need this title index at all now that we can make suggestion based on the ft index... and the ft index is meanwhile always here. To conclude I don't see any case where the removal of this "hack" should create a problem now (for 10 years it was different). |
@kelson42 OK! (FYI, Kiwix JS still uses a title index, and has no way of accessing any other index. However, Kiwix JS doesn't use libzim, so changing things here won't have any effect, I believe.) |
@Jaifroid It is fundamental that Kiwix JS can access both URl and Title index. This should be a priority to fix. I hardly see how it could work without accessing the URL index. |
@kelson42 All I meant was that Kiwix JS cannot currently access the word index in pre-indexed ZIM files. I'm sure we access the directory pointer list ordered by url, or else we wouldn't be able to find directory entries for assets, which we clearly do. Hopefully this is a non-issue! |
The full-text index is not implemented in Kiwix-JS. But it does use both URL and Title indexes. I also think, like @Jaifroid, that this change might break compatibility with some older ZIM files, for Kiwix clients that use the libzim. For example, I would advise that you test an article search in tedxgeneva-2014_fr_all_2015-03.zim (which probably does not have a built-in full-text index). |
This is use to displayed "something" in kiwix-desktop if there is no title. The fact that we return the url if the title is empty is not a bug. It is part of the spec (See line If you umderstand "If an article has no title return the url" this can be a bug. |
@mgautierfr Thank you for enlightening the discussion. To me we should not have this kind of logic where respective duties of URL and titles are mixed. This is a fact that this is hard to understand and therefore confusing. This is a legacy thing (at the time we have introduced the title & title index on my demand), we do (should) set title properly if not empty (and whatever the value of the URL). As a consequence the best thing is to remove this fallback scenario. |
This is not a fallback scenario. This is THE scenario. If we remove the fact that "empty title -> use url", we will end with all zims with article not properly sorted by title. And then the binary search will fail.
|
@mgautierfr JFYI Kiwix JS does follow the spec: we use exactly the same algorithm (implemented in JS) as libzim for title and url indices, and binary search. |
@mgautierfr I understand now that the title index has not been implemented as I think it was (and as I requested 10 years ago). This is unfortunate as it is unnecessary complicated and confusing. To me, at the best, this is an optimization (which brings almost nothing). Lets keep it like it is then. Thank you @Jaifroid @mossroy @mgautierfr for patience to sort this request out. |
Following this comment kiwix/kiwix-js#475 (comment), I think we should stop faking the article.title properly if there is not title at all. The problematic code is here:
libzim/src/_dirent.h
Line 84 in 684c355
The text was updated successfully, but these errors were encountered: