Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code point ranges are not parsed in UnicodeData.txt #39

Closed
wismill opened this issue Nov 10, 2021 · 0 comments · Fixed by #40
Closed

Code point ranges are not parsed in UnicodeData.txt #39

wismill opened this issue Nov 10, 2021 · 0 comments · Fixed by #40

Comments

@wismill
Copy link
Collaborator

wismill commented Nov 10, 2021

See: https://www.unicode.org/reports/tr44/#Code_Point_Ranges

For backward compatibility, ranges in the file UnicodeData.txt are specified by entries for the start and end characters of the range, rather than by the form "X..Y". The start character is indicated by a range identifier, followed by a comma and the string "First", in angle brackets. This entry takes the place of a regular character name in field 1 for that line. The end character is indicated on the next line with the same range identifier, followed by a comma and the string "Last", in angle brackets:

4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FEF;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;

For character ranges using this convention, the names of all characters in the range are algorithmically derivable. See Section 4.8, Name in [Unicode] for more information on derivation of character names for such ranges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant