-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
StringInterner: add support for UCN identifiers #838
base: master
Are you sure you want to change the base?
Conversation
22c237c
to
cd82381
Compare
Should these be unescaped in the preprocessor? Commenting out the $ gcc a.c
a.c: In function ‘bar’:
a.c:251:6: error: ‘struct S’ has no member named ‘你好’
251 | s.\u4F60\u597D = x;
| ^
a.c:252:13: error: ‘struct S’ has no member named ‘你好’
252 | return s.FOO;
| ^
$ clang a.c
a.c:251:7: error: no member named '你好' in 'struct S'
251 | s.\u4F60\u597D = x;
| ~ ^
a.c:252:14: error: no member named '\u4F60\u597D' in 'struct S'
252 | return s.FOO;
| ~ ^
a.c:247:13: note: expanded from macro 'FOO'
247 | #define FOO \u4F60 ## \u597D
| ^
<scratch space>:4:1: note: expanded from here
4 | \u4F60\u597D
| ^
2 errors generated.
$ arocc a.c -Wno-return-type
a.c:251:7: error: no member named '\u4F60\u597D' in 'struct S'
s.\u4F60\u597D = x;
^
a.c:252:14: error: no member named '\u4F60\u597D' in 'struct S'
return s.FOO;
^
a.c:247:13: note: expanded from here
#define FOO \u4F60 ## \u597D
^
<scratch space>:1:1: note: expanded from here
\u4F60\u597D
^
2 errors generated. gcc definitely handles this the best.
Yeah :) |
Good point about the preprocessor; the current implementation won't handle them correctly in preprocessor expressions. I'll try setting it up to create unescaped generated tokens |
Updated with a proof-of-concept of unescaping in the preprocessor. Still need to review expansion locations in the generated tokens and fix pasted tokens but it seems like this approach will work. |
This is in better shape now but I still think it could be improved - right now I'm validating UCN identifiers in the preprocessor but it should probably be done in the tokenizer, to prevent pasting from creating UCN tokens (e.g. |
Why couldn't that be done in the preprocessor? Merge at will but maybe leave the issue open (or open a new one) if this doesn't solve it adequately in your opinion. |
Closes #823
IMO C should not have added this feature :)
I renamed
StringInterner
toIdentifierInterner
to better reflect how it's meant to be used. I switched toXxHash3
instead ofWyhash
because it seems likeWyhash
requires more care / special handling around short vs long keys in a way that seemed tricky to handle without completely decoding the identifier first (which would require allocations).Overall I'm very happy with the design of the Zig stdlib hashmap API in that adding this was very straightforward and didn't add any complexity beyond the inherent complexity of the problem itself.
TODO: