Fix zwsp classification #81

Entomy · 2020-04-01T22:47:29Z

As proposed in L2/08-344, the classification of U+200B (ZERO WIDTH SPACE, or zwsp) was changed in a way that seriously broke several of the world's languages. This is definitely a mistake, and SIL agrees. Microsoft doesn't seem to agree, however, and I won't have anything I write and maintain be exclusionary to any group, especially on a strictly ethnic basis.

IsWhitespace(Char) and IsWhitespace(Rune) are affected by this in this repo.
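For illustration only, here is a minimal sketch of the classification question, written in Python rather than this repo's language so it can be run standalone. `is_whitespace` and its `zwsp_is_space` flag are hypothetical names and are not this repo's `IsWhitespace` API. The sketch only shows that zwsp carries general category Cf in the Unicode data Python ships, and how a predicate could special-case it either way.

```python
import unicodedata

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def is_whitespace(ch: str, zwsp_is_space: bool = False) -> bool:
    """Hypothetical predicate: whitespace test with an explicit zwsp policy.

    zwsp is not covered by str.isspace(), so the caller decides whether
    it should count as whitespace.
    """
    if ch == ZWSP:
        return zwsp_is_space
    return ch.isspace()


if __name__ == "__main__":
    # General category of zwsp according to Python's bundled Unicode data.
    print(unicodedata.category(ZWSP))
    # An ordinary space, then zwsp under both policies.
    print(is_whitespace(" "))
    print(is_whitespace(ZWSP))
    print(is_whitespace(ZWSP, zwsp_is_space=True))
```

Making the policy an explicit parameter is just one way to keep the decision visible at the call site. Whether the real methods should take such a flag is outside what this issue decides.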

Entomy · 2020-04-03T11:37:51Z

So I've looked into this more, running various tests on some random Thai and Myanmar samples I found and asking some native speakers about correctness (god bless the internet).

I don't actually agree with any of the above parties.

The Unicode Consortium is correct in that classifying zwsp as Zs (space separator) can cause combining marks to behave incorrectly. In practice this shouldn't happen often, if at all, but as plenty of people who review my code or comments can tell, I refuse to operate on the assumption of "shouldn't ever happen". Such a sequence would be valid Unicode and must be handled.

However, L2/08-334 is correct in that UAX #29 section 4 (word boundaries) was broken by the change.

What I'm going to propose is that zwsp remains classified as Cf and not Zs, but that word boundary detection (https://github.com/Stringier/Literary/issues/13) accurately detects and handles zwsp as a word boundary for Thai, Myanmar, Khmer, and others.
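A rough sketch of what that proposal could look like, again in Python for illustration. `is_word_break` and `split_words` are hypothetical names, and this is a deliberate simplification rather than the UAX #29 word boundary algorithm: it only shows that zwsp can stay in category Cf while still being honored as a word break.

```python
import unicodedata

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE, general category Cf


def is_word_break(ch: str) -> bool:
    """Hypothetical helper: treat whitespace and zwsp as word breaks.

    zwsp is checked explicitly because it is not classified as a space.
    """
    return ch == ZWSP or ch.isspace()


def split_words(text: str) -> list:
    """Split text at every break character, dropping empty segments."""
    words, current = [], []
    for ch in text:
        if is_word_break(ch):
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    # Two Thai words separated only by zwsp, as the scripts above rely on.
    print(split_words("ภาษา" + ZWSP + "ไทย"))
    # The classification itself is left untouched.
    print(unicodedata.category(ZWSP))
```

The point of the design is that boundary handling lives in the word-break logic, so the whitespace predicates do not need to treat zwsp as a space.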

Entomy added the 🐝 Bug, 🛠 Enhancement, 🆘 Help Wanted, and 👨🏻‍🎓 Good First Issue labels Apr 1, 2020

Entomy closed this as completed Apr 3, 2020
