Strange text overflow #1817

Closed
arnKo opened this issue Feb 23, 2023 · 5 comments
Labels
bug Existing features not working as expected


arnKo commented Feb 23, 2023

We discovered the problem trying to render the following list of file extensions:
.png, .jpg, .jpeg, .jpe, .gif, .zip, .doc, .docx, .docm, .xls, .xlsx, .xlsm, .ppt, .pptx, .pptm, .pps, .ppsx, .odt, .ods, .odp, .odf, .rtf, .pdf, .psd, .csv, .msg, .mp4, .webm, .xlf, .xliff

WeasyPrint renders the line without breaking it, although it overflows the page. However, if we remove the dots from the extensions, the text is rendered correctly within the page's boundaries.

I added a minimal example below that generates the following PDF. The first line is not broken at all. In the second line, I removed the dot from the second extension .jpg and the line breaks. Only the third row without any dot renders correctly.

[Image: screenshot of the generated PDF]

from weasyprint import HTML

text = """
<html>
    <head>
        <style>
            p {
                border: 1px solid red;
            }
        </style>
    </head>
    <body>
        <p>
            .png, .jpg, .jpeg, .jpe, .gif, .zip, .doc, .docx, .docm, .xls, .xlsx, .xlsm, .ppt, .pptx, .pptm, .pps, .ppsx, .odt, .ods, .odp, .odf, .rtf, .pdf, .psd, .csv, .msg, .mp4, .webm, .xlf, .xliff
        </p>
        <p>
            .png, jpg, .jpeg, .jpe, .gif, .zip, .doc, .docx, .docm, .xls, .xlsx, .xlsm, .ppt, .pptx, .pptm, .pps, .ppsx, .odt, .ods, .odp, .odf, .rtf, .pdf, .psd, .csv, .msg, .mp4, .webm, .xlf, .xliff
        </p>
        <p>
            png, jpg, jpeg, jpe, gif, zip, doc, docx, docm, xls, xlsx, xlsm, ppt, pptx, pptm, pps, ppsx, odt, ods, odp, odf, rtf, pdf, psd, csv, msg, mp4, webm, xlf, xliff
        </p>
    </body>
</html>
"""

HTML(string=text).write_pdf("/tmp/weasyprint-test.pdf")

I first discovered this bug using version 57.2. I updated to 58.0 but the bug persists. I also tried different CSS values for white-space, hyphens and word-break, but nothing changed.

liZe added the bug label (Existing features not working as expected) on Feb 24, 2023
Member

liZe commented Feb 24, 2023

What a strange bug… 😢

Member

liZe commented Feb 25, 2023

Browsers break lines as we might expect, after the commas. But LibreOffice forces a line break anywhere, so it probably means that we can’t break after the comma according to the Unicode line break rules. I suppose that there’s an extra rule in the CSS specification to allow line breaks in this case. I’ll try to find references in Unicode and CSS to define exactly what we have to do.

Member

liZe commented Feb 25, 2023

According to Unicode, we can’t break lines before commas, spaces and dots, that’s why WeasyPrint doesn’t break the line.

I didn’t find anything about this case in the W3C Typography specification. The CSS specification says that "CSS does not fully define where soft wrap opportunities occur", so technically it may not be a bug in WeasyPrint, but I’d be interested to know why browsers decided to split such lines.
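The rule described above (no break before commas, spaces and dots) can be modelled with a toy function. This is only a simplified sketch, not the line breaker WeasyPrint actually uses: `break_opportunities` and `NO_BREAK_BEFORE` are hypothetical names, and the real Unicode line breaking algorithm has many more rules. It assumes a break is otherwise allowed after a space.

```python
# Characters before which this toy model never allows a break
# (an assumption taken from the description above, not a full rule set).
NO_BREAK_BEFORE = {",", ".", " "}


def break_opportunities(text):
    """Return each index i where a break is allowed just before text[i]."""
    return [
        i
        for i in range(1, len(text))
        # A break needs a preceding space and a following character
        # that is not in the forbidden set.
        if text[i - 1] == " " and text[i] not in NO_BREAK_BEFORE
    ]


print(break_opportunities(".png, .jpg, .jpeg"))  # []
print(break_opportunities(".png, jpg, .jpeg"))   # [6]
print(break_opportunities("png, jpg, jpeg"))     # [5, 10]
```

In this model the first list has no break opportunity at all, because every space is followed by a dot, while removing a single dot creates exactly one opportunity. That matches the three paragraphs in the report.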

liZe added the CSS label (Questions about how to do something with CSS) and removed the bug label (Existing features not working as expected) on Feb 25, 2023
Author

arnKo commented Feb 27, 2023

But shouldn't it work using overflow-wrap: anywhere?

At least according to the specs:

"The overflow-wrap property allows the UA to take a break anywhere in otherwise-unbreakable strings that would otherwise overflow."
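To check this against a given WeasyPrint version, the minimal example can be reduced to one paragraph with the declaration applied. This is a sketch only: `build_html` and `render` are hypothetical helper names, and the result depends on the installed version.

```python
EXTENSIONS = (
    ".png, .jpg, .jpeg, .jpe, .gif, .zip, .doc, .docx, .docm, .xls, "
    ".xlsx, .xlsm, .ppt, .pptx, .pptm, .pps, .ppsx, .odt, .ods, .odp, "
    ".odf, .rtf, .pdf, .psd, .csv, .msg, .mp4, .webm, .xlf, .xliff"
)

# Doubled braces are literal braces for str.format.
TEMPLATE = """<html>
    <head>
        <style>
            p {{ border: 1px solid red; {declarations} }}
        </style>
    </head>
    <body>
        <p>{content}</p>
    </body>
</html>"""


def build_html(declarations, content=EXTENSIONS):
    """Return the test page with extra CSS declarations applied to <p>."""
    return TEMPLATE.format(declarations=declarations, content=content)


def render(declarations, path):
    """Write the test page to a PDF file at the given path."""
    # Imported here so build_html can be used without WeasyPrint installed.
    from weasyprint import HTML

    HTML(string=build_html(declarations)).write_pdf(path)
```

Calling `render("overflow-wrap: anywhere;", "/tmp/weasyprint-test.pdf")` and then `render("", ...)` with a second path gives two PDFs to compare side by side.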

Member

liZe commented Mar 7, 2023

"But shouldn't it work using overflow-wrap: anywhere?"

It works … but not in this example. There’s something really strange…
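Until the cause is understood, one possible workaround is to insert a zero-width space (U+200B) before each dot that follows a space, since the Unicode line breaking rules allow a break after that character. This is an untested assumption as far as WeasyPrint is concerned, and `allow_breaks_before_dots` is a hypothetical helper name.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, invisible but a permitted break point


def allow_breaks_before_dots(text):
    """Insert a zero-width space between each space and a following dot."""
    return text.replace(" .", " " + ZWSP + ".")


sample = ".png, .jpg, .jpeg"
patched = allow_breaks_before_dots(sample)
print(patched.count(ZWSP))  # 2
```

The visible text is unchanged, but the copied or extracted text will contain the extra characters, so this suits display-only content.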

liZe added the bug label (Existing features not working as expected) and removed the CSS label (Questions about how to do something with CSS) on Mar 7, 2023
liZe added this to the 63.0 milestone on Jul 11, 2024
liZe closed this as completed in 071e733 on Jul 11, 2024