reverse_adoc: Clean Unicode whitespace in headers and paragraphs #80
7438c2c
to
99cdd4d
Compare
@hmdne I understand your concern with regard to the full-width space, but the question is really about the compatibility of "AsciiDoc" (which uses ASCII sequences as control/markup sequences) with CJK in general. AsciiDoc syntax depends heavily on control symbols that are not easy to type on CJK keyboards:
Retracing our steps, notice that AsciiDoc was designed for ASCII encoding, which is meant to allow easy and predictable entry on an English keyboard and, to an extent, on Latin-based keyboards. CJK cannot be written in ASCII, so the consequences of an "easy-to-enter textual semantic syntax" for CJK differ from those for AsciiDoc. We need defined rules on what "AsciiDoc" means for "non-ASCII CJK", with the principle that it should be easy to type on a CJK keyboard. The comments that "a full-width space means something" are unintended consequences of AsciiDoc's compatibility with CJK:
It should not be the case. This is simply a CJK compatibility issue with AsciiDoc.
In CJK, the initial full-width spaces (one or more) are a formatting concern. They should be determined by the rendering template as part of "paragraph initial line indenting"; they play no part in the textual meaning.
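As a rough illustration of the point above, leading full-width spaces could be dropped from a paragraph before conversion. This is only a sketch: `strip_leading_fullwidth` and `IDEOGRAPHIC_SPACE` are hypothetical names chosen for this example, not part of reverse_adoc's actual API.

```ruby
# Hypothetical helper for illustration; not reverse_adoc's actual API.
IDEOGRAPHIC_SPACE = "\u3000" # full-width (ideographic) space, U+3000

# Removes one or more leading full-width spaces from a paragraph,
# leaving full-width spaces elsewhere in the text untouched.
def strip_leading_fullwidth(paragraph)
  paragraph.sub(/\A#{IDEOGRAPHIC_SPACE}+/, "")
end

puts strip_leading_fullwidth("\u3000\u3000これは段落です。")
```

Only the start of the string is matched, so a full-width space used between words is preserved.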
They should be stripped from the table cells.
If I use the Japanese keyboard and retain the semantics of the equals sign, hyphens, spaces, open/close brackets, and comma, I get this. This means I won't need to switch between Japanese and English input when typing. I wonder whether this is something we should support: "(Ascii)Doc for CJK".
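One way to picture the "(Ascii)Doc for CJK" idea is a pre-processing pass that maps full-width markup characters to their ASCII equivalents. The sketch below is an assumption throughout: the character set, `FULLWIDTH_TO_ASCII`, and `normalize_markup` are hypothetical and were not proposed in this thread. A blanket replacement like this would also rewrite punctuation in body text, so a real implementation would likely need to be context-aware.

```ruby
# Illustrative mapping only; the choice of characters is an assumption.
FULLWIDTH_TO_ASCII = {
  "\uFF1D" => "=", # full-width equals sign
  "\uFF0D" => "-", # full-width hyphen-minus
  "\u3000" => " ", # ideographic (full-width) space
  "\uFF3B" => "[", # full-width left square bracket
  "\uFF3D" => "]", # full-width right square bracket
  "\uFF0C" => ","  # full-width comma
}.freeze

# Regexp.union escapes each key and builds a single alternation pattern.
FULLWIDTH_PATTERN = Regexp.union(FULLWIDTH_TO_ASCII.keys)

# Replaces every mapped full-width character in the line with its ASCII form.
def normalize_markup(line)
  line.gsub(FULLWIDTH_PATTERN, FULLWIDTH_TO_ASCII)
end

puts normalize_markup("\uFF1D\uFF1D\u3000見出し")
```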
Is there any status on these updates? Would a workaround be to fill any empty cells in a table with a single character?
I have pushed an updated version that handles almost all of the leading CJK whitespace in the document while trying to preserve compatibility. The only remaining issue is with sections: as mentioned above, empty paragraphs are collapsed. This is specific to this particular document and may not be a real problem; if it is, please let me know. I have also found another problem, with generation, and will try to amend that shortly.
5fa95dc
to
875dfb4
Compare
This is ready for merge now.
Thanks @hmdne!
This fixes #65 and fixes #67.
I don't necessarily agree with this. A full-width space is semantically similar to an NBSP, i.e. it's not trimmed by web browsers. If anything, I think this should not be a generic feature: for this particular use case the full-width space has no meaning other than formatting, but in other documents it may be crucial.
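To make the concern concrete: Ruby's `String#strip` also leaves U+3000 in place, so removing it has to be a deliberate step. A minimal sketch of making that step opt-in follows; `clean_text` and its `strip_fullwidth` option are hypothetical names for this example and do not describe reverse_adoc's real interface.

```ruby
# Hypothetical opt-in cleaner; the method and option names are assumptions.
def clean_text(text, strip_fullwidth: false)
  # Default: String#strip removes only ASCII whitespace and null characters,
  # so full-width spaces survive.
  return text.strip unless strip_fullwidth

  # Opt-in: also trim U+3000 at both ends; interior occurrences are kept.
  text.gsub(/\A[\s\u3000]+|[\s\u3000]+\z/, "")
end

puts clean_text("\u3000本文 ", strip_fullwidth: true)
```

Keeping the default conservative means documents in which the full-width space is significant are not altered unless the caller asks for it.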
The character still persists in table cells, lists and sections (which are mapped from DIVs):
" "
.Metanorma PR checklist