-
Notifications
You must be signed in to change notification settings - Fork 795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix lexing for unicode escape sequences #348
Conversation
fixes dotnet#338 Changes lexing of unicode escape sequences to match the F# spec (which says things should work the same as C#). - For short escape sequences, directly encode the hex value into a char - For long escape sequences, validate that the total codepoint is <= 0x0010FFFF - If it is, follow same logic as before (which was correct) - If it isn't, issue an error (same as C#)
@@ -124,7 +124,7 @@ let startString args (lexbuf: UnicodeLexing.Lexbuf) = | |||
BYTEARRAY (Lexhelp.stringBufferAsBytes buf) | |||
) | |||
else | |||
STRING (System.Text.Encoding.Unicode.GetString(s,0,s.Length))) | |||
STRING (Lexhelp.stringBufferAsString s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In initial testing, I computed the string both ways here and compared, to make sure there were no unexpected inconsistencies.
After running all of the tests, the only differences were cases where the old method was doing the wrong thing. So regression risk should be low.
@@ -51,7 +52,7 @@ val internal digit : char -> int32 | |||
val internal hexdigit : char -> int32 | |||
val internal unicodeGraphShort : string -> uint16 | |||
val internal hexGraphShort : string -> uint16 | |||
val internal unicodeGraphLong : string -> uint16 option * uint16 | |||
val internal unicodeGraphLong : string -> (uint16 option * uint16) option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of option of option we might want to create a new union type which represents the semantics better.
@forki thanks for the feedback, good suggestions. I'd like to keep the tests where they are, to avoid fragmentation. Using a dedicated DU type makes sense. I'll look into broader change to char buffer form byte buffer, if it's not too invasive I think that also makes sense. |
Looks good ... ship it :-) |
@forki I've updated to use a dedicated type, but it looked like too much effort to change the buffer design just for this bug |
/cc @agocke per request |
fixes #338
Changes lexing of unicode escape sequences to match the F# spec (which says things should work the same as C#).