gfm parsing oddity with links in link text #156

Open
TripleCamera opened this issue Aug 21, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@TripleCamera

Explain the problem.

In GFM, a bare URL that appears inside link text should not be autolinked.

Input:

[https://bilibili.com/](https://bilibili.com/)

Actual output:

<p>[<a
href="https://bilibili.com/](https://bilibili.com/)">https://bilibili.com/](https://bilibili.com/)</a></p>

Expected output:

<p><a href="https://bilibili.com/">https://bilibili.com/</a></p>

Try pandoc!

The bug is in the autolink_bare_uris extension:

C:\Users\EricQiu>pandoc -f gfm-autolink_bare_uris
[https://bilibili.com/](https://bilibili.com/)
^Z
<p><a href="https://bilibili.com/">https://bilibili.com/</a></p>

C:\Users\EricQiu>pandoc -f gfm
[https://bilibili.com/](https://bilibili.com/)
^Z
<p>[<a
href="https://bilibili.com/](https://bilibili.com/)">https://bilibili.com/](https://bilibili.com/)</a></p>

Pandoc version?

pandoc 3.3 (the latest version)

TripleCamera added the bug label Aug 21, 2024
jgm transferred this issue from jgm/pandoc Aug 22, 2024
@jgm
Owner

jgm commented Aug 22, 2024

Test of gfm behavior:

[[hello](url)](there)

[hello](there)

Here the inner link takes precedence. So, with autolink_bare_uris, it's just the same. The inner link (now an automatically created one) takes precedence. It's a bit hard to see how to avoid this with our current modular structure, where the core (which handles regular links) doesn't know about the autolink_bare_uris extension. I suppose we could do something ad hoc, which is probably what GFM does...

jgm added a commit that referenced this issue Oct 25, 2024
This at least improves on #156.

We still get a link within a link, which isn't right, but at
least the link goes to the right place.

Cf. jgm/pandoc#10333.
@jgm
Owner

jgm commented Oct 25, 2024

There are two separate issues here:

  1. Ideally, the URL wouldn't be autolinked when it occurs inside the link label, since this leads to bad HTML (nested <a> inside <a>).
  2. The entire rest of the link is gobbled up as part of the URL, with the result that the outer link isn't recognized as a link at all.

With 7950d58 I have fixed (2). (1) remains, but it isn't as serious a problem; I believe browsers will still display this as a single link.
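
To make issue (2) concrete, here is a minimal Python sketch of how a greedy bare-URL scanner swallows the rest of the link. Both regular expressions are toy patterns invented for this illustration; they are not the tokenizer this project uses, and excluding `]` is only an assumed way to bound the match, not necessarily what commit 7950d58 does.

```python
import re

# Toy patterns for illustration only (assumptions, not the project's code).
# A greedy scanner: a URL runs until whitespace or '<'.
GREEDY_URL = re.compile(r"https?://[^\s<]+")
# Hypothetical bounded variant: also stop at ']' so a link label can close.
BOUNDED_URL = re.compile(r"https?://[^\s<\]]+")

text = "[https://bilibili.com/](https://bilibili.com/)"

# The greedy scan runs straight through "](" into the destination.
greedy = GREEDY_URL.search(text).group(0)
# The bounded scan stops where the link label ends.
bounded = BOUNDED_URL.search(text).group(0)

print(greedy)
print(bounded)
```

The greedy match is the same string that shows up as the `href` in the actual output reported above, which is why the outer link was never recognized.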

It is difficult to see how to fix (1), given the architecture of this project. We would need to disable the autolink parser when parsing a link label. But we don't actually know if it's a link label (as opposed to, say, an image label or a span) until after it has been parsed (when we get to the (...) part). We could, perhaps, retokenize and reparse the label in this case, with something in state that the autolink parser can look at to tell itself to do nothing.
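
The state-flag idea can be sketched in a few lines of Python. This is a toy renderer under loud assumptions: `render_inline`, `LINK`, and `BARE_URL` are hypothetical names, the grammar covers only `[label](dest)` and bare `http(s)` URLs, and nested brackets are not handled. It also sidesteps the hard part described above, because the regex already knows the bracketed span is a link before the label is parsed; it only shows how a flag in parser state could tell the autolink step to do nothing.

```python
import re
from html import escape

# Toy grammar (assumption): "[label](dest)" with no nested brackets.
LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]*)\)")
# Toy bare-URL pattern (assumption), bounded so it cannot cross ']'.
BARE_URL = re.compile(r"https?://[^\s<\]]+")


def render_inline(text: str, autolink: bool = True) -> str:
    """Render a tiny subset of inline Markdown to HTML."""
    out = []
    pos = 0
    while pos < len(text):
        link = LINK.match(text, pos)
        if link:
            # The span is known to be a link label, so (re)parse it with
            # autolinking switched off to avoid a nested <a>.
            label = render_inline(link.group(1), autolink=False)
            out.append(f'<a href="{escape(link.group(2))}">{label}</a>')
            pos = link.end()
            continue
        # The autolink step consults the flag and does nothing when it is off.
        url = BARE_URL.match(text, pos) if autolink else None
        if url:
            href = escape(url.group(0))
            out.append(f'<a href="{href}">{href}</a>')
            pos = url.end()
            continue
        out.append(escape(text[pos]))
        pos += 1
    return "".join(out)


inside_label = render_inline("[https://bilibili.com/](https://bilibili.com/)")
outside_label = render_inline("see https://bilibili.com/ now")
```

With the flag off inside the label, `inside_label` is a single link with no nested `<a>`, while the URL in `outside_label` is still autolinked as usual.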
