The server roomlist, JSON, nlohmann/json, UTF-8, and UTF-16 #134

Closed
kevinlul opened this issue Apr 6, 2020 · 0 comments

Earlier today there were client crashes from refreshing the roomlist. The cause was a parsing exception, which I have fixed in cbe8ed5. However, that fix will cause the roomlist to go blank on JSON that is otherwise perfectly valid.

After some trimming, the minimal case that reproduces the parsing exception is this:

{
    "name": "\udc49"
}

which throws

[json.exception.parse_error.101] parse error at line 2, column 19: syntax error while parsing value - invalid string: surrogate U+DC00..U+DFFF must follow U+D800..U+DBFF; last read: '"\udc49'

This is because U+DC49 is a lone low surrogate: a value that is only meaningful as the second half of a UTF-16 surrogate pair. It is not a Unicode scalar value, so it has no valid UTF-8 encoding, and nlohmann/json uses UTF-8 exclusively. If you parse that JSON in JavaScript or Python (or possibly other runtimes that understand UTF-16), you will simply get the escape in ASCII or a Unicode invalid character symbol.
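To make the difference concrete, here is a minimal Python sketch using only the standard library. It parses the trimmed payload from above and then checks whether the result can be encoded as strict UTF-8. The helper name `is_utf8_encodable` is made up for illustration; it is not part of the client or of SRVPro.

```python
import json

# The trimmed payload from this issue.
raw = '{"name": "\\udc49"}'

# Python's json module accepts the unpaired low surrogate escape
# and returns it as a single code point.
name = json.loads(raw)["name"]


def is_utf8_encodable(text: str) -> bool:
    """Return True if text can be encoded as strict UTF-8 (hypothetical helper)."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


print(hex(ord(name)))                   # prints 0xdc49
print(is_utf8_encodable(name))          # prints False: a lone surrogate has no UTF-8 form
print(is_utf8_encodable("\U0001F600"))  # prints True: stored as a surrogate pair in UTF-16
```

The lenient parse succeeds, but the value it produces cannot be represented in UTF-8, which is consistent with a UTF-8-only parser rejecting the input.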

Looking into the nlohmann/json repo, I found these issues:
nlohmann/json#587
nlohmann/json#1198

In particular, from the library's creator:

I think it is out of scope of the library to fix invalid UTF-8. I further do not think that silently fixing invalid inputs makes the world a better place, but rather rejecting such inputs with as much noise and as early as possible.

so it appears that this behaviour is intentional. This means that we have no way around this within the client without switching to another JSON library.

Short-term though, since this output is produced by SRVPro, we could sanitize all user-editable fields to strip unpaired surrogates (which have no valid UTF-8 encoding) so that the roomlist does not break the client.
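A rough sketch of what that sanitizing could look like, written in Python purely for illustration. Nothing here is SRVPro's actual code: the function names `sanitize_field` and `sanitize_roomlist_json` are hypothetical, and replacing each unpaired surrogate with U+FFFD (rather than dropping it) is an assumption.

```python
import json
import re

# Any surrogate code point left in a Python str is unpaired, because
# json.loads already merges valid high/low escape pairs into one code point.
_LONE_SURROGATE = re.compile(r"[\ud800-\udfff]")


def sanitize_field(value: str, replacement: str = "\ufffd") -> str:
    """Replace every unpaired surrogate in value (hypothetical helper)."""
    return _LONE_SURROGATE.sub(replacement, value)


def sanitize_roomlist_json(raw: str) -> str:
    """Parse leniently, clean every string, and re-serialize (hypothetical helper)."""
    def clean(node):
        if isinstance(node, str):
            return sanitize_field(node)
        if isinstance(node, list):
            return [clean(item) for item in node]
        if isinstance(node, dict):
            return {clean(key): clean(val) for key, val in node.items()}
        return node  # numbers, booleans, and null pass through unchanged

    return json.dumps(clean(json.loads(raw)))


print(sanitize_roomlist_json('{"name": "\\udc49"}'))  # prints {"name": "\ufffd"}
```

Valid surrogate pairs, such as those used for emoji, are left alone, so only the inputs that a strict UTF-8 parser would reject are changed.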
