Soft hyphenation in Weasyprint #176
I believe that breaking on U+00AD (a.k.a. |
Hey Simon, thanks for your fast reply. So you can confirm this? Would it make sense to handle soft hyphens in text.py, where automatic hyphenation is handled as well? Greetings, |
I think it would make sense (though @liZe might have an opinion), it just needs someone to do the work. |
As far as I can remember, WeasyPrint is only concerned with automatic hyphenation and relies on Pango for the manual part. It would make sense for WeasyPrint to handle both, because 1) the rules for where text can be split differ between HTML+CSS and Pango, and 2) WeasyPrint already splits text between tags before letting Pango split it, so text can be broken inside a word when its letters are put in tags one by one without spaces between them (and that's really bad).
Handling soft hyphens is a really small part of the work. The rules for breaking lines depend, for example, on the language of the text and on some CSS properties. Pango does not handle these rules, so we have to do the whole work in WeasyPrint, without relying on Pango at all. The code added for automatic hyphenation was a first step towards solving this problem. It was quite hard to add and it is hard to understand how it works now, yet it is a really simple piece of code compared to the complex rules we need to respect if we want to handle line breaks correctly.
By the way, before adding support for more complex rules to this part of the code, we should really add support for right-to-left languages. The specification takes care of this feature, and we can't seriously break lines without handling text direction first. The code will be closer to the specification and thus easier to understand once we have RTL support (yes, that's my pessimistic point of view about text :p). |
As long as we don’t have someone working on RTL support I don’t think we should block any text-related work on it. |
Hello Simon, hello liZe, I understand your considerations here. I will take a look at WeasyPrint's automatic hyphenation code and try to add support for soft hyphenation. As far as I am concerned, this should all happen in text.py. Greetings, |
Yes, that sounds right. Thanks a lot for volunteering for this! |
Hey, concerning line breaks within tags, as liZe mentioned: is this also the reason why WeasyPrint does not respect or white-space: nowrap? It would already help a lot and improve the situation if it did. |
WeasyPrint respects |
I have finally found some time to look into this. As far as I understand, WeasyPrint retrieves the laid-out lines from Pango. Since Pango is not capable of automatic hyphenation, WeasyPrint tries to put the first word of the second line on the first line. If it does not fit and hyphenation is set to auto, it splits the word and tries again. So far so good. [EDIT: Reposted because I accidentally used an old account for posting this.] |
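The retry loop described in the comment above can be sketched roughly as follows. This is only an illustration, not WeasyPrint's actual code in text.py: the function name `fill_line`, its parameters, and the use of character counts in place of Pango's measured widths are all assumptions made for the example.

```python
def fill_line(first_line, next_word, max_width, break_points, hyphen="-"):
    """Try to move `next_word`, or a hyphenated prefix of it, onto `first_line`.

    Hypothetical helper: widths are approximated by character counts,
    whereas a real layout engine would measure rendered text.
    Returns (new_first_line, remainder_for_next_line).
    """
    # First attempt: does the whole word fit on the first line?
    candidate = f"{first_line} {next_word}"
    if len(candidate) <= max_width:
        return candidate, ""

    # Otherwise try the allowed break points, longest prefix first.
    for point in sorted(break_points, reverse=True):
        head, tail = next_word[:point], next_word[point:]
        candidate = f"{first_line} {head}{hyphen}"
        if len(candidate) <= max_width:
            return candidate, tail

    # Nothing fits: leave both lines as they were.
    return first_line, next_word


if __name__ == "__main__":
    # Break points after "hy" and after "hyphen" are assumed for this demo.
    print(fill_line("So far so", "hyphenation", 16, [2, 6]))
```

Trying the longest prefix first keeps as much of the word as possible on the first line, which matches the "split the word and try again" idea from the comment.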
Grouped in #301. |
For the record: fixed before closing #301 by a random WeasyPrint version. |
Hello,
I have been experimenting with hyphenation recently. Automatic hyphenation works like a charm. It's simply amazing.
However, while manual hyphenation works, WeasyPrint never appends the actual hyphen character "-". It breaks correctly at a soft hyphen (&shy;) when needed, but the trailing hyphen character is omitted.
I have tested it using this simple HTML file
Now, I have investigated and browsed the relevant parts in text.py, but it appears as if weasyprint is only concerned with automatic hyphenation and does not do the manual hyphenation part at all.
Could anybody confirm this behaviour? Or give me a hint what I am doing wrong? If this is indeed a bug maybe somebody could point out the relevant parts of the code?
Thanks for any comments :)
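To make the behaviour expected in this report concrete, here is a minimal sketch of breaking a word at a soft hyphen (U+00AD) and appending a visible hyphen. It is not taken from WeasyPrint; the function name `break_at_soft_hyphen` and the character-count notion of available width are assumptions for illustration only.

```python
SOFT_HYPHEN = "\u00ad"


def break_at_soft_hyphen(word, available, hyphen="-"):
    """Break `word` at the last soft hyphen whose prefix fits in `available`.

    Hypothetical helper: `available` is a character count standing in for
    the remaining line width. Returns (head, tail), where `head` ends with
    a visible hyphen, or None if no soft hyphen break fits.
    """
    parts = word.split(SOFT_HYPHEN)
    best = None
    for i in range(1, len(parts)):
        # Soft hyphens inside the head are dropped; only the break shows one.
        head = "".join(parts[:i]) + hyphen
        if len(head) <= available:
            # Keep the remaining soft hyphens so the tail can break again.
            best = (head, SOFT_HYPHEN.join(parts[i:]))
    return best


if __name__ == "__main__":
    word = "hy\u00adphen\u00adation"
    print(break_at_soft_hyphen(word, 7))
```

The visible hyphen only appears where a break is actually taken; unused soft hyphens stay invisible, which is the rendering the report asks for.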