[css-text] Render U+2028 LINE SEPARATOR as a forced line break #6992
Thanks, @tabatkins! I can't edit the issue description directly, but here it is with the markup fixed up to render correctly on GitHub: [Copied into OP] |
I tested the rendering of this character in various browsers and editors, for your reference:
- In Chromium, it is rendered as a box with a cross (font is Hiragino Kaku Gothic ProN).
- In Firefox, Safari, and iCab, it doesn't display at all.
- In Visual Studio Code, the editor emits a warning when it detects this character. See microsoft/vscode#96142
- In Atom, it is not rendered. See atom/atom#12157
- In Sublime Text 4, it is rendered as
- In TextEdit, it is rendered as a forced line break.
- In GNU Emacs (27.2), it is rendered as horizontal whitespace instead of a line break, even after enabling whitespace-mode.
- In Vim (8.2), it behaves the same way.
Of the applications I tested, only TextEdit renders this character as a newline. See also:
|
Thank you for doing this research, @xfq! |
I think this issue is filed on the basis of some misunderstandings.
CSS3 Text has, technically, required LS to be treated as a forced break for at least a decade. If browsers are not treating it as such, that should be considered a bug against them. Closing as invalid (not a spec issue). @zestyping Copied your fixed markup into the OP! Thanks for caring about this issue, I hope your concern can motivate the browsers to fix this longstanding problem. |
@xfq Tests for any behavior specced in css-text-3, even if indirectly, are welcome in WPT. :) Probably best to do it as a test for all BK/NL characters. |
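A minimal sketch of how one test per BK/NL character could be enumerated, in Python. It assumes a simple reftest shape (the character placed between two letters, compared against an explicit `<br>`); the `BK_NL_CHARS` table and `make_test_pair` helper are hypothetical, and this is not necessarily the structure of the tests added in wpt-pr 37696.

```python
# Hypothetical sketch: one (test, reference) markup pair per UAX #14
# BK or NL character. The markup shape is an illustrative assumption.
BK_NL_CHARS = {
    0x000B: "LINE TABULATION",      # BK (support optional per UAX #14)
    0x000C: "FORM FEED",            # BK
    0x0085: "NEXT LINE",            # NL (support optional per UAX #14)
    0x2028: "LINE SEPARATOR",       # BK
    0x2029: "PARAGRAPH SEPARATOR",  # BK
}


def make_test_pair(cp: int) -> tuple[str, str]:
    """Return (test_markup, reference_markup) for one code point.

    The character is embedded literally rather than as a numeric
    character reference: the HTML parser maps &#x85; to U+2026
    HORIZONTAL ELLIPSIS, so a reference would never put NEL into
    the DOM. The generated files must therefore be saved as UTF-8.
    """
    test = f"<div>A{chr(cp)}B</div>"
    reference = "<div>A<br>B</div>"
    return test, reference


if __name__ == "__main__":
    for cp, name in sorted(BK_NL_CHARS.items()):
        test, _ = make_test_pair(cp)
        print(f"U+{cp:04X} {name}: {test!r}")
```

Embedding the raw character is the main design point: it is the only way to exercise NEL through the HTML parser, and it keeps every case uniform.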
@fantasai Thank you for clarifying this! I do see now that Section 4.1 did not mean to refer to U+2028 when defining "other space separators".
Can this be taken as an official statement on the WG's intended interpretation of LS? I would be delighted to know that treating U+2028 as a forced line break is already the behaviour that CSS Text 3 intends to specify! I can imagine browser developers not finding this to be obvious from the spec. If this interpretation is not clear to them, would it be appropriate for me to point them at this comment thread as an authoritative ruling?
Here is why I suspect they might find it rather subtle. CSS Text 3 mentions many other relevant characters by code point (such as U+000A, U+0020, etc.) and name (CARRIAGE RETURN, IDEOGRAPHIC SPACE, etc.). Yet U+2028 is never mentioned anywhere in the entire spec. Neither LINE SEPARATOR nor its abbreviation LSEP is mentioned anywhere. Neither the "Line Separator" category nor its abbreviation "Zl" is mentioned anywhere. An ordinary person can wonder "I wonder why U+2028 doesn't render as a line break", search for the spec, arrive at CSS Text 3, search the entire document for every imaginable term related to U+2028, and find nothing; indeed, that was my experience, and what led me to file this issue. And, of course, we have the empirical evidence of a decade of browser development oblivious to this rule.
Would the CSS editors be willing to consider making this a little more explicit? I can think of one small change that would clear this all up. As you pointed out, Section 5.1, bullet point 2 says "lines always break at each preserved forced break character".
But there is no definition for the term "forced break character" in the spec. If you assume that a "forced break character" has something to do with a "forced line break", then the term "preserved forced break character" is nonsensical: "forced line break" is defined in terms of preserved characters, so there can be no such thing as a non-preserved forced break character. If you instead start by trying to understand the term "preserved", you find that it is defined only as part of the term "preserved white space", wherein the default meaning of "white space" is "document white space characters", which consists of U+0020, U+0009, and segment breaks; so "preserved" has no meaning when applied to other characters like U+2028. Fixing this is easy; delete the confusing term and simplify the bullet point to:
(I am omitting VT and NEL here because UAX#14 says "implementations are not required to support the VT character" and "implementations are not required to support the NEL character".) |
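For reference, the General_Category values at issue can be checked with Python's standard `unicodedata` module. This small sketch (the `describe` helper is hypothetical, written only for illustration) shows that U+2028 is in category Zl, not Zs like U+0020 and U+3000, which is consistent with it not being one of the "other space separators" of Section 4.1. Note that `unicodedata` does not expose the UAX #14 Line_Break property, so this confirms the category but not the BK class.

```python
import unicodedata


def describe(cp: int) -> str:
    """Return the Unicode name and General_Category of a code point."""
    ch = chr(cp)
    return f"U+{cp:04X} {unicodedata.name(ch)} (gc={unicodedata.category(ch)})"


for cp in (0x0020, 0x3000, 0x2028, 0x2029):
    print(describe(cp))
# U+0020 SPACE (gc=Zs)
# U+3000 IDEOGRAPHIC SPACE (gc=Zs)
# U+2028 LINE SEPARATOR (gc=Zl)
# U+2029 PARAGRAPH SEPARATOR (gc=Zp)
```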
@xfq Thank you for filing https://bugs.webkit.org/show_bug.cgi?id=235753 ! |
I'd agree with that interpretation. css-text-3 states that:
UAX14 states that 2028 has non-tailorable BK class, and that “The text after [it] starts at the beginning of the line”. There's a level of indirection, which may make it non-obvious on a casual read, but I think it's unambiguous that this is the expected behavior.
css-text-3 mentions those characters where special CSS-specific processing going beyond (or against) Unicode is needed. For the rest, as stated in 1.5, “CSS is built on Unicode. UAs […] must adhere to all normative requirements of the Unicode Core Standard, except where explicitly overridden by CSS.” So css-text-3 cannot be implemented correctly without referencing Unicode (and in particular UAX14), which in the case of U+2028 gives us a definitive normative answer. That said, if an editorial change can make this clearer, I'd be happy to take that on.
I don't think this quite works. That covers the BK class, but leaves out preserved segment breaks (U+000A). Also
I am interpreting css-text-3 to be going beyond Unicode here, removing the optionality, and adding a requirement that this be supported for the sake of interoperability, so I'd rather keep it. How about
|
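The UAX #14 rules referenced in this exchange (LB4: always break after BK; LB5: always break after CR, LF, and NL, but never between CR and a following LF) can be sketched in Python. This is a hand-transcribed subset for illustration, not a conforming line breaker: the class table covers only the mandatory-break characters (a real implementation would load LineBreak.txt), the function names are hypothetical, and the `include_optional` flag models the choice between UAX #14's optional support for VT and NEL and the stricter reading that requires them. It also ignores CSS white-space processing, which decides whether a segment break such as LF survives to this point at all.

```python
# Line_Break classes for the characters covered by UAX #14 LB4/LB5.
# Hand-transcribed subset for illustration only.
MANDATORY_CLASSES = {
    "\u000A": "LF",
    "\u000B": "BK",  # LINE TABULATION: support optional per UAX #14
    "\u000C": "BK",
    "\u000D": "CR",
    "\u0085": "NL",  # NEXT LINE: support optional per UAX #14
    "\u2028": "BK",
    "\u2029": "BK",
}
OPTIONAL_SUPPORT = {"\u000B", "\u0085"}


def mandatory_break_offsets(text: str, *, include_optional: bool = True) -> list[int]:
    """Return offsets i where a mandatory break falls before text[i]."""
    offsets = []
    for i, ch in enumerate(text):
        cls = MANDATORY_CLASSES.get(ch)
        if cls is None:
            continue
        if ch in OPTIONAL_SUPPORT and not include_optional:
            continue
        # LB5: CR x LF, so no break between CR and a following LF.
        if cls == "CR" and i + 1 < len(text) and text[i + 1] == "\n":
            continue
        # LB4/LB5: the break opportunity is *after* the character.
        offsets.append(i + 1)
    return offsets


def split_lines(text: str, *, include_optional: bool = True) -> list[str]:
    """Split text at mandatory breaks, keeping each break char on its line."""
    lines, start = [], 0
    for end in mandatory_break_offsets(text, include_optional=include_optional):
        lines.append(text[start:end])
        start = end
    lines.append(text[start:])
    return lines


if __name__ == "__main__":
    print(split_lines("first\u2028second"))  # ['first\u2028', 'second']
```

Keeping the break character at the end of its line mirrors UAX #14, where the break opportunity lies after the BK, CR, LF, or NL character.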
@frivoal That looks great! I agree with your reasoning. Thank you for the careful review and clarification. |
@fantasai does the proposal at the bottom of #6992 (comment) look reasonable to you, or do you think I missed something? |
@fantasai I see that the first sentence of @frivoal's suggestion made it into https://www.w3.org/TR/css-text-4/:
but not the second sentence:
Any particular reason why this should not be included? I realize these code points are implied by reference to UAX14, but it seems nice to be explicit, especially given that plenty of other code points are mentioned by number in this draft. |
@zestyping As noted in #6992 (comment), that sentence was always there: https://www.w3.org/TR/css-text-3/#line-break-details |
Updated the specs to use Florian's rephrasing. As for a note listing all the individual code points... I think it's better to just make sure there are testcases in WPT. |
…rs creating line breaks, a=testonly Automatic update from web-platform-tests Add tests for BK and NL Unicode characters creating line breaks See w3c/csswg-drafts#6992 -- wpt-commits: a8ee96901b9eabf3876d38d3328bf1320b115ca6 wpt-pr: 37696
Originally posted by Ka-Ping Yee
I'd like to propose that U+2028 be rendered as a forced line break.
The changes to the CSS Text Module Level 3 draft would be minimal; for example:
The rationale is straightforward:
For reference, the Unicode Standard 14.0 defines U+2028 LINE SEPARATOR as an "unambiguous separator character". By my reading, it could hardly be clearer what U+2028 is intended to represent and what the most sensible rendering should be:
[...]
[...]
I'd appreciate hearing your thoughts and suggested next steps on this.
Thanks very much!