Undefined character listed as UTF8PROC_CATEGORY_LO #194

Closed
maartenbreddels opened this issue Jul 7, 2020 · 3 comments

Comments

@maartenbreddels

Hi all,

While working on apache/arrow#7656 I found that undefined characters (such as U+08BE, https://www.compart.com/en/unicode/U+08BE) are reported as UTF8PROC_CATEGORY_LO by utf8proc_category. Could this be a bug?

Regards,

Maarten

@stevengj
Member

stevengj commented Jul 8, 2020

U+08BE was defined in Unicode 13, and Lo is correct.

stevengj closed this as completed Jul 8, 2020
@stevengj
Member

stevengj commented Jul 8, 2020

(It correctly returns UTF8PROC_CATEGORY_CN for currently unassigned codepoints like U+0378.)
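
The version dependence described here can be reproduced without utf8proc. The following is a minimal sketch that uses Python's standard `unicodedata` module as an analogue, not utf8proc's own API; the helper names `general_category` and `unicode_version` are hypothetical and only for illustration. It assumes that a Unicode database older than 13.0 reports U+08BE as unassigned (`Cn`), while 13.0 or newer reports `Lo`.

```python
import unicodedata


def general_category(codepoint: int) -> str:
    """Return the two-letter Unicode general category for a codepoint."""
    return unicodedata.category(chr(codepoint))


def unicode_version() -> tuple:
    """Return the Unicode database version bundled with this Python as a tuple of ints."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    # U+08BE was added in Unicode 13; U+0378 is the unassigned example from this thread.
    for cp in (0x08BE, 0x0378):
        print(f"U+{cp:04X}: {general_category(cp)}")
```

The result for U+08BE depends on which Unicode version the library was built against, so a mismatch between a lookup table and a website usually means one of the two is on an older Unicode release.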

@maartenbreddels
Author

You are correct. It didn't even cross my mind that Unicode changes that fast (apart from emoji). Thanks!
