
Old Emoji defaults are misparsed #1025

Closed
eggrobin opened this issue Jan 30, 2025 · 2 comments · Fixed by #1029

@eggrobin

eggrobin commented Jan 30, 2025

A look at the character.jsp page for A with history=full shows the following:
[Image: character.jsp output for LATIN CAPITAL LETTER A with history=full]

LATIN CAPITAL LETTER A was not, in fact, an emoji between Unicode 8 and Unicode 14.

However, it had this @missing line:
# @missing: 0000..10FFFF ; Emoji ; No

Because we parse that line as a binary property assignment, we conclude that the entire codespace has the Emoji property; the trailing "No" is treated as an extra field and ignored.
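To illustrate the misparse, here is a minimal Python sketch. It is not the unicodetools parser; the function names (`parse_binary_naive`, `parse_binary_with_value`) are hypothetical, and the value-aware variant only assumes the usual binary value aliases (Yes/Y/True/T).

```python
import re

# Matches "# @missing: <fields>" and captures the semicolon-separated fields.
MISSING_RE = re.compile(r"^#\s*@missing:\s*(.+)$")


def split_fields(line):
    """Return the trimmed fields of an @missing line."""
    match = MISSING_RE.match(line.strip())
    if match is None:
        raise ValueError("not an @missing line: " + line)
    return [field.strip() for field in match.group(1).split(";")]


def parse_range(text):
    """Parse 'XXXX' or 'XXXX..YYYY' into an inclusive (start, end) pair."""
    first, _, last = text.partition("..")
    return int(first, 16), int(last or first, 16)


def parse_binary_naive(line):
    """Hypothetical parser that reads the line as `range ; property`.

    Being listed at all is taken to mean "has the property", so the
    third field ("No") is silently dropped.
    """
    fields = split_fields(line)
    start, end = parse_range(fields[0])
    return start, end, fields[1], True


def parse_binary_with_value(line):
    """Hypothetical parser that also reads an optional value field."""
    fields = split_fields(line)
    start, end = parse_range(fields[0])
    value = fields[2] if len(fields) > 2 else "Yes"
    return start, end, fields[1], value in ("Yes", "Y", "True", "T")


line = "# @missing: 0000..10FFFF ; Emoji ; No"
print(parse_binary_naive(line))
print(parse_binary_with_value(line))
```

The naive variant reports Emoji=True for 0000..10FFFF, which matches the symptom above; the value-aware variant reports False for the same line.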

@eggrobin eggrobin self-assigned this Jan 30, 2025
@markusicu

Right. @macchiati came up with this extended syntax, and at a glance it seemed reasonable, but it wasn't documented and didn't work with existing parsers because they don't read the value field there.

I noticed this when I tried to add @missing lines with "Yes" defaults for certain Extended_Pictographic ranges: we have no documented syntax for overriding a binary Yes with a No in order to poke holes into such ranges.

Context:

@markusicu

See L2/22-124 item UCD17: “@missing lines do not work for binary properties”

[172-C16] Consensus: Remove @missing lines for binary properties where UCD syntax does not support them, and make other adjustments in UAX #44 and elsewhere for consistency; for Unicode Version 15.0.

[172-A73] Action Item for Ned Holbrook, Markus Scherer, PAG: In emoji-data.txt, remove the @missing lines; for Unicode Version 15.0.

[172-A74] Action Item for Ken Whistler, PAG: In UAX #44 (a) revert the changes to the paragraph that used to say that an @missing line is never provided for a binary property (so that it continues to say that for Unicode Version 15.0), and (b) change the example for multiple @missing lines from using Extended_Pictographic to using one of Bidi_Class, East_Asian_Width, Line_Break.

[172-A75] Action Item for Ken Whistler, PAG: In UAX #44 change text about @missing lines as per UCD17 in document L2/22-124, Section UCD17, item 3. For Unicode Version 15.0.

[172-A76] Action Item for Markus Scherer, PAG: Propose a UCD file syntax for explicit “No” values for binary properties, and use it for multiple @missing lines for Extended_Pictographic in emoji-data.txt; for a future version of the Unicode Standard. See L2/22-124 item UCD17.

A73..A75 got done.
We closed A76 after some discussion, without adding new syntax.
