Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clicking on footnote/endnote references reloads the page in latest Wikipedia English novid ZIM #428

Closed
Jaifroid opened this issue Oct 13, 2018 · 6 comments
Assignees
Labels
Milestone

Comments

@Jaifroid
Copy link
Member

Jaifroid commented Oct 13, 2018

I'm not sure if this is a bug with our code, or whether it is a problem with the format of the latest English Wikipedia file wikipedia_en_all_novid_2018-06.zim. In this file (I haven't tested the non-novid version), links to footnote references appear like this:

href="Ottoman_Empire.html#cite_note-:0-13"

In other words, the link contains the url of the page. This causes our anchor parsing routine to miss the fact that this is an anchor reference, and instead the page is re-loaded. The link is not followed, and the footnote is not shown.

The "problem" (if it isn't a bad encode) is in this regex:

var regexpLocalAnchorHref = /^#/;

It clearly does not match the above href. We can work around with relocating this to the anchor parsing routine like this (roughly):

var regexpLocalAnchorHref = new RegExp('^(?:#|' + dirEntry.url + '#)([^#]+$)');
var anchorRef = href.replace(regexpLocalAnchorHref, '$1');
document.getElementById('articleContent').contentWindow.location.hash = anchorRef;

Should we work around? I guess this ZIM file is a popular one that will be around for a long time.

@Jaifroid Jaifroid self-assigned this Oct 13, 2018
@mossroy
Copy link
Contributor

mossroy commented Oct 14, 2018

This probably affects jQuery mode, but not SW mode, I suppose?
In any case, it's probably worth working on it, as it does not seem very complicated to solve.
Be careful on the RegExp that dirEntry.url might contain special characters that would be interpreted by the RegExp (like #?*+ etc)

@mossroy mossroy added this to the v2.5 milestone Oct 14, 2018
@Jaifroid
Copy link
Member Author

I confirm that the issue only affects jQuery mode. In SW mode, the page is not reloaded and the browser jumps to the selected footnote. A regex like the one above should be fine with query strings, percent encodes, and other special characters, but I believe # is a reserved delimiter, so can only appear once (legally) in a URL. In any case this regex will only match either a URL starting with # or a URL starting with the current page's own URL immediately followed by #, so it should be OK.

@Jaifroid Jaifroid added the bug label Oct 14, 2018
Jaifroid added a commit that referenced this issue Oct 14, 2018
Jaifroid added a commit that referenced this issue Oct 14, 2018
@Jaifroid
Copy link
Member Author

Jaifroid commented Oct 15, 2018

Contrary to the above comment, I've discovered that SW mode is affected by this issue, but in a lesser way. To reproduce, load wikipedia_en_all_novid_2018-06.zim in master (SW mode), and search for "Presidential $1 Coin Program" (or any other page with a $ or other percent-encoded character in the URL). Click on a footnote reference, and the page reloads. However, in SW mode, the browser does jump to the footnote after page reload. It causes all images to reload, etc.

@Jaifroid
Copy link
Member Author

Jaifroid commented Oct 15, 2018

One fix that appears to work for SW mode is to set the source of the iframe like this in the readArticle function of app.js:

iframeArticleContent.src = dirEntry.namespace + "/" + encodeURIComponent(dirEntry.url);

It solves the issue with reloading the page, because the browser can now match the URL of the page with the URL of the footnote reference. We need to test that this fix doesn't have other unintended consequences, however. I've added it to the PR here for testing. Thoughts?

PS The reason this works is that the issue only appears to affect pages loaded via the app.js interface, not pages loaded by following a URL inside a page in SW mode. I believe it is because the latter are already percent-encoded, whereas we don't currently percent-encode the URL we send to SW for pages loaded from the UI interface.

@mossroy
Copy link
Contributor

mossroy commented Oct 15, 2018

You're certainly right about the encoding of URLs in SW mode (I did not test). It would be a good thing to fix it in the PR too.

@Jaifroid
Copy link
Member Author

Thanks, I have included the SW fix in the PR, but I'll finalize #427 first, because this branch would need to be rebased (edits to that section of code in SW mode in both PRs would be in conflict).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants