Subtle bug opportunity with utf-8 converter #1

Open
Omnifarious opened this issue May 27, 2018 · 0 comments

I would like to point out that it is very easy to end up with a buffer overrun when using the SSE-optimized UTF-8 converter. I'm raising it because it's subtle and easy to miss. Given your current interface guarantees, it isn't really a bug yet... :-)

So, if the tail of the string you're converting contains an ASCII character followed by a bunch of multibyte characters, the SSE code will kick in, convert every byte of those multibyte characters as if it were ASCII, and write the results into the destination buffer. Only then does the code notice that it has written too many, walk things back, and begin multibyte conversion at the first multibyte character.

But those extra writes have already happened. Someone may have sized the output buffer from prior knowledge of how many code points will be generated. A buffer sized that way is too small for this case, and the result is a buffer overrun. Even worse, the overrun is silent: the final reported output size will be correct, and everything will appear to have worked.
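To make the failure mode concrete, here is a minimal sketch in portable C that models the behavior described above without using SSE intrinsics. It is not the library's code: `model_convert`, `model_result`, and the 16-slot fast path are hypothetical names and assumptions made for illustration. The model only does bookkeeping, recording how many code points are reported versus how many output slots a vector store would have touched.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 16 /* assumed number of bytes/slots handled per vector step */

typedef struct {
    size_t produced; /* code points the converter would report */
    size_t touched;  /* highest output slot written, plus one */
} model_result;

/* Length of a UTF-8 sequence from its lead byte (well-formed input assumed). */
static size_t seq_len(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Hypothetical model: the fast path stores BLOCK slots unconditionally,
 * then keeps only the leading run of ASCII bytes and falls back to the
 * scalar path for the rest. */
static model_result model_convert(const unsigned char *in, size_t n) {
    model_result r = {0, 0};
    size_t i = 0;
    assert(in != NULL);
    while (i < n) {
        if (in[i] < 0x80 && n - i >= BLOCK) {
            /* The vector store touches BLOCK slots before any check. */
            if (r.produced + BLOCK > r.touched) r.touched = r.produced + BLOCK;
            /* Walk back: only the leading ASCII bytes are kept. */
            size_t k = 0;
            while (k < BLOCK && in[i + k] < 0x80) k++;
            r.produced += k;
            i += k;
        } else {
            /* Scalar path: one slot per decoded code point. */
            i += seq_len(in[i]);
            if (r.produced + 1 > r.touched) r.touched = r.produced + 1;
            r.produced++;
        }
    }
    return r;
}
```

Feeding this model one ASCII byte followed by 14 three-byte sequences (43 bytes) reports 15 code points while 16 slots have been touched, which is the mismatch a caller cannot see from the return value alone.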

So, when implementing proper error handling, or when using this code, people really need to keep this in mind and make sure the output buffer is large enough to handle the last 43 bytes of the input buffer being converted as ASCII, even if they aren't.

The number is larger than 16 because of the following case: an ASCII byte followed by 14 three-byte sequences is 1 + 14 × 3 = 43 input bytes and 15 code points. A caller might provide only 15 output slots for it, but the SSE code will write to 16 output slots.
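One way a caller might act on this is sketched below. These helpers are hypothetical and not part of the library; they assume the fast path writes at most 16 output slots per step and never produces more slots than it has consumed input bytes. Under those assumptions, sizing by input byte count is the conservative choice, and padding a known code point count by one vector width is a tighter alternative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAST_PATH_WIDTH 16u /* assumed slots written per vector store */

/* Conservative: one output slot per input byte. */
static size_t slots_for_bytes(size_t input_bytes) {
    return input_bytes;
}

/* When only the code point count is known, pad by one vector width
 * so a trailing fast-path store stays inside the buffer. */
static size_t slots_for_code_points(size_t code_points) {
    if (code_points > SIZE_MAX - FAST_PATH_WIDTH) return SIZE_MAX; /* saturate */
    return code_points + FAST_PATH_WIDTH;
}

/* Returns nonzero when a buffer of `capacity` slots covers the padded size. */
static int capacity_is_safe(size_t capacity, size_t code_points) {
    return capacity >= slots_for_code_points(code_points);
}
```

For the 43-byte example, `slots_for_bytes(43)` gives 43 slots and `slots_for_code_points(15)` gives 31, both of which cover the 16 slots the fast path writes. A caller could guard its allocation with `assert(capacity_is_safe(capacity, expected))` during development.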
