unicode: add CategoryAliases, LC, Cn #70780
I implemented this, and there are a few additions. The proposal is now:
Change https://go.dev/cl/641395 mentions this issue:
Change https://go.dev/cl/641376 mentions this issue:
Change https://go.dev/cl/641377 mentions this issue:
This proposal has been added to the active column of the proposals project.
Could there be any compatibility issues with new Unicode versions, such as aliases being dropped, renamed, or changed? Will regexp then use the map? Edit: The changes to regexp are at #70781.
In general, Unicode data is subject to change as Unicode changes. That said, I don't expect aliases to be deleted from the list. (We've seen them change the category of an individual code point in the past, but even that is rare.)
Have all remaining concerns about this proposal been addressed? The proposal is to add:
Based on the discussion above, this proposal seems like a likely accept. The proposal is to add:
The C table is expanded to include unassigned code points (as it should have had from the start).
No change in consensus, so accepted. 🎉 The proposal is to add:
The C table is expanded to include unassigned code points (as it should have had from the start).
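To illustrate what the expanded C table covers, here is a minimal sketch that computes Cn ("Unassigned") from the existing two-letter category tables: a valid code point is unassigned if it belongs to none of them. The helper names isUnassigned and isOther are hypothetical and not part of package unicode; the sketch skips any "Cn" entry in unicode.Categories so it behaves the same whether or not the running Go release already ships a Cn table.

```go
package main

import (
	"fmt"
	"unicode"
)

// isUnassigned (hypothetical helper) reports whether r is a valid code point
// that belongs to no assigned two-letter general category, i.e. category Cn.
// A "Cn" entry in unicode.Categories, if present, is skipped so the result
// does not depend on whether this Go release already has a Cn table.
func isUnassigned(r rune) bool {
	if r < 0 || r > unicode.MaxRune {
		return false
	}
	for name, table := range unicode.Categories {
		if len(name) != 2 || name == "Cn" {
			continue
		}
		if unicode.Is(table, r) {
			return false
		}
	}
	return true
}

// isOther (hypothetical helper) reports membership in the expanded C
// category: Cc, Cf, Co and Cs, plus the unassigned code points (Cn).
func isOther(r rune) bool {
	return unicode.In(r, unicode.Cc, unicode.Cf, unicode.Co, unicode.Cs) || isUnassigned(r)
}

func main() {
	// U+0378 is a gap in the Greek block, U+E000 is private use (Co),
	// and U+10FFFF is a noncharacter (Cn).
	for _, r := range []rune{'A', 0x0378, 0xE000, 0x10FFFF} {
		fmt.Printf("U+%04X Cn=%v C=%v\n", r, isUnassigned(r), isOther(r))
	}
}
```

Because the result is derived from the tables compiled into the Go release, a code point assigned in a newer Unicode version will move out of Cn once the tables are regenerated.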
CategoryAliases is for regexp to use, for things like \p{Letter} as an alias for \p{L}. Cn and LC are special-case categories that were never implemented but should have been.

For golang/go#70780.

Change-Id: I1401c1be42106a0ebecabb085c25e97485c363cf
Reviewed-on: https://go-review.googlesource.com/c/text/+/641395
Auto-Submit: Russ Cox
Reviewed-by: Marcel van Lohuizen
LUCI-TryBot-Result: Go LUCI
Reviewed-by: Ian Lance Taylor
The Unicode specification defines aliases for some of the general category names. For example, the category "L" has the alias "Letter".
The regexp package supports \p{L} but not \p{Letter}, because nothing in the Unicode tables tells regexp that Letter is another name for L.
To support \p{Letter}, I propose adding a new, small table to package unicode.
This would be auto-generated from the Unicode database like all our other tables. For Unicode 15, the table would have only 38 entries, listed below.