SVG <use> links are ignored when using a custom scheme #1650

Closed
brakhane opened this issue May 25, 2022 · 3 comments
Labels
bug Existing features not working as expected

brakhane commented May 25, 2022

By default, Matplotlib renders fonts as paths and references each letter's path with, e.g., <use xlink:href="#DejaVuSans-66"/>.

Those <use> links are silently ignored when the SVG was loaded by a custom fetcher with a scheme other than file:

from urllib.parse import urlparse

from weasyprint import HTML, default_url_fetcher


def my_fetcher(u):
    url = urlparse(u)
    # Serve the same bug.svg for "bug://" and "file://bug"; only the scheme differs.
    if url.scheme == "bug" or (url.scheme == "file" and url.netloc == "bug"):
        return dict(
            file_obj=open("bug.svg", "rb"),
            mime_type="image/svg+xml",
        )
    # Everything else goes through WeasyPrint's default fetcher.
    return default_url_fetcher(u)


HTML(
    string="""
        <h1>Broken</h1>
        <img src="bug://">
        <h1>Working</h1>
        <img src="file://bug">
    """,
    url_fetcher=my_fetcher
).write_pdf("bug.pdf")

This results in this buggy PDF, in which the text is missing from the first image.

Any Matplotlib file will do, but I used this one (I had to change the extension to txt so GitHub wouldn't try to be helpful and upload it to user-images instead).

The Cairo-based WeasyPrint 52 doesn't have this problem.

I analysed the problem, and this is what I could figure out:

svg.defs.use calls node.get_href, which calls get_url_attribute, which calls url_join. This in turn calls the stdlib's urljoin, which seems to distinguish between schemes that "use relative" and those that don't. "bug" is not in that list, so urljoin seems to return the URL unchanged instead of joining it with the base.
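The urljoin behaviour described above can be reproduced with the standard library alone. This is a minimal sketch, not WeasyPrint code; it relies on the module-level list urllib.parse.uses_relative, which is an implementation detail of CPython rather than documented API.

```python
from urllib.parse import urljoin, uses_relative

fragment = "#DejaVuSans-66"

# "file" is a scheme that urljoin treats as supporting relative references;
# a made-up scheme such as "bug" is not.
file_is_relative = "file" in uses_relative
bug_is_relative = "bug" in uses_relative

# With an unknown scheme, urljoin returns the reference untouched,
# so the base URL is lost.
joined_bug = urljoin("bug://", fragment)

# With "file", the fragment is attached to the base URL as expected.
joined_file = urljoin("file://bug", fragment)

print(file_is_relative, bug_is_relative)
print(repr(joined_bug))
print(repr(joined_file))
```

The first join yields the bare fragment, while the second yields a full URL that still names the document it came from.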

So, back in svg.defs.use, node.get_href returns #DejaVuSans-66 instead of bug://#DejaVuSans-66, which is what the next if statement (line 29 in WeasyPrint 54.3) expects. Since the first part doesn't match, it falls back to trying to fetch the URL and, when that fails, just silently returns. (NB: It would be nice if it could at least print a warning that a resource linked via <use> couldn't be found; that would have saved me a lot of time tracking this bug down.)

When the URLs are of the form file://netloc/xyz, it works; this is the workaround I currently use. But since it's a regression from how WeasyPrint 52 behaved, I think it should either be fixed or at least be documented as a gotcha when implementing custom URL fetchers.
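The kind of warning asked for here could look roughly like the sketch below. It is purely illustrative: fetch_use_target, failing_fetcher and the logger name are hypothetical and do not correspond to anything in WeasyPrint's code base.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
LOGGER = logging.getLogger("use_sketch")


def fetch_use_target(url, fetcher):
    """Fetch the resource a <use> element points to, warning on failure."""
    try:
        return fetcher(url)
    except Exception as exc:  # broad on purpose: custom fetchers may raise anything
        LOGGER.warning("Failed to load <use> target %r: %s", url, exc)
        return None


def failing_fetcher(url):
    # Stand-in for a custom fetcher that does not recognise the URL.
    raise ValueError(f"unknown URL: {url}")


logging.basicConfig(level=logging.WARNING)
result = fetch_use_target("#DejaVuSans-66", failing_fetcher)
```

The caller still gets None and can skip the element, but the log now names the reference that could not be resolved.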

liZe (Member) commented May 25, 2022

Hi!

Thanks for the bug report.

I see that the PDF is generated with version 54.3. Could you please try version 55.0? This bug may already be fixed by 50ad4d1.

brakhane (Author) commented

@liZe I just tested it with 55.0; unfortunately, the bug is still there.

@liZe liZe closed this as completed in 26d5d82 May 27, 2022
@liZe liZe added the bug Existing features not working as expected label May 27, 2022
@liZe liZe added this to the 56.0 milestone May 27, 2022
liZe (Member) commented May 27, 2022

The bug is now fixed. When a link only contains a fragment, we now always assume that it refers to the current document.
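For readers writing their own URL handling, the rule stated above can be sketched as follows. This is an assumption-laden illustration of the idea, not the code from commit 26d5d82; resolve_href is a hypothetical helper.

```python
from urllib.parse import urljoin


def resolve_href(href, base_url):
    """Resolve href against base_url, treating fragment-only links as local.

    Hypothetical helper: a reference that is only a fragment always points
    into the current document, whatever the scheme of base_url is.
    """
    href = href.strip()
    if href.startswith("#"):
        # Drop any fragment already on the base, then attach the new one.
        return base_url.split("#", 1)[0] + href
    # Anything else is left to the standard library.
    return urljoin(base_url, href)


local = resolve_href("#DejaVuSans-66", "bug://")
external = resolve_href("other.svg#a", "file://bug/dir/x.svg")
print(local)
print(external)
```

Handling the fragment-only case before calling urljoin means the result no longer depends on whether the scheme is one that urljoin knows about.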
