Code point ranges are not parsed in UnicodeData.txt #39

wismill · 2021-11-10T14:55:49Z

See: https://www.unicode.org/reports/tr44/#Code_Point_Ranges

For backward compatibility, ranges in the file UnicodeData.txt are specified by entries for the start and end characters of the range, rather than by the form "X..Y". The start character is indicated by a range identifier, followed by a comma and the string "First", in angle brackets. This entry takes the place of a regular character name in field 1 for that line. The end character is indicated on the next line with the same range identifier, followed by a comma and the string "Last", in angle brackets:

4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FEF;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;

For character ranges using this convention, the names of all characters in the range are algorithmically derivable. See Section 4.8, Name in [Unicode] for more information on derivation of character names for such ranges.
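
For illustration, here is a minimal Python sketch of how a parser could collapse the First/Last pairs described above into a single range entry. It is not this library's implementation: the names `Entry`, `RANGE_NAME`, `parse_unicode_data` and `SAMPLE` are hypothetical, and the fields after the name are deliberately left unparsed.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Optional, Tuple

# Matches the name field of a range boundary, e.g. "<CJK Ideograph, First>".
# Names such as "<control>" do not match and are treated as ordinary entries.
RANGE_NAME = re.compile(r"^<(?P<label>.+), (?P<kind>First|Last)>$")


class Entry(NamedTuple):
    first: int         # first code point covered by the entry
    last: int          # last code point covered (same as `first` for one character)
    name: str          # character name, or the range identifier for a range
    fields: List[str]  # remaining fields (General_Category onwards), unparsed


def parse_unicode_data(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per character or per First/Last range."""
    pending: Optional[Tuple[int, str]] = None  # open range: (start, identifier)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        code_hex, name, *rest = line.split(";")
        code = int(code_hex, 16)
        match = RANGE_NAME.match(name)
        if match is None:
            # Regular character line; a range must not be left open before it.
            if pending is not None:
                raise ValueError(f"range {pending[1]!r} has no 'Last' line")
            yield Entry(code, code, name, rest)
        elif match["kind"] == "First":
            if pending is not None:
                raise ValueError(f"range {pending[1]!r} has no 'Last' line")
            pending = (code, match["label"])
        else:
            # "Last" line: must close a range with the same identifier.
            if pending is None or pending[1] != match["label"]:
                raise ValueError(f"unexpected 'Last' line for {match['label']!r}")
            yield Entry(pending[0], code, match["label"], rest)
            pending = None
    if pending is not None:
        raise ValueError(f"range {pending[1]!r} has no 'Last' line")


# One regular line plus the two range lines quoted above.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FEF;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
"""

entries = list(parse_unicode_data(SAMPLE.splitlines()))
```

With this sketch, `entries[1]` covers U+4E00..U+9FEF under the identifier `CJK Ideograph`, and every code point in that interval shares the properties given on the boundary lines. Deriving the individual character names for such a range is a separate step, covered by Section 4.8 of the Unicode Standard.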

wismill mentioned this issue Nov 10, 2021

Add General_Category and further predicates #40

Merged

wismill closed this as completed in #40 Dec 21, 2021
