WriteConsoleOutputCharacterA doesn't merge UTF-8 partials in successive calls #1851
Comments
@miniksa I was able to include the |
The discussion in #386 seemed to suggest that this problem had been around since Windows 7, but what I'm seeing definitely looks like a recent regression, so I'm not sure if it's the same issue. In my current Windows version (10.0.18362.535), the following sequences both output a smiling face when executed in a bash shell.
However, when I start a conhost shell built from the latest source (commit 39d3c65), the second sequence fails to decode the UTF-8 correctly, and three error glyphs are output instead. Screenshots of the old and new consoles... |
Not sure what functions are invoked under the hood if you call However, issue #4086 has been fixed with PR #4422 which indicates that |
I'm afraid it does look like PR #4422 is to blame, at least for my particular test case. It works in commit 0d92f71, but fails after #4422 is applied in commit 06b3931. It seems that the old |
I updated the unit test with these sequences and it failed. So it's definitely reproducible. I'll file an issue referring to your comment here. And I'll fix it as soon as I can. Thanks for letting me know! |
@zadjii-msft
The docs also read
Consider using the answers to these questions as reasons to close this one out. Steffen |
In the context of
It's probably fine honestly to fix the UTF-8 handling as a separate path. We did that in a few places... there's 3 states... W, A, and A when 65001 (UTF-8) is set. That way we can maintain the nice and broken A state for compatibility reasons and fix up the UTF-8 that we actually care about independently.
Oh probably.
I'll let it hang around for now, but don't feel obligated to further bump it. If it's on 22H2 we're going to keep looking at it ourselves. |
Environment
Steps to reproduce
If a UTF-8 stream gets buffered in a loop, characters that consume more than one byte may get split at the buffer boundaries. Passing the buffer to WriteConsoleOutputCharacterA will corrupt the text, because the conversion to UTF-16 treats these partials as invalid UTF-8 sequences and replaces them with U+FFFD characters.
Expected behavior
WriteConsoleOutputCharacterA should cache the partials and prepend them to the characters passed at the next call of this function, similar to the behavior of WriteConsoleA.
Actual behavior
UTF-8 partials result in corrupted text.
A discussion about this already began in #386 but was rather out of scope in this issue.
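To illustrate the expected behavior, here is a minimal, portable sketch of caching a trailing UTF-8 partial between calls. It is not the conhost implementation; the names `utf8_trailing_partial`, `utf8_carry`, and `utf8_feed` are hypothetical, and the caller is assumed to provide an output buffer of at least `len + 3` bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns how many bytes at the end of buf[0..len)
   form an incomplete UTF-8 sequence (a lead byte still waiting for
   continuation bytes). Returns 0 if the buffer ends on a complete
   character or on bytes that cannot start a sequence. */
static size_t utf8_trailing_partial(const unsigned char *buf, size_t len)
{
    size_t back = 0;
    size_t need;
    unsigned char lead;

    /* Walk back over at most 3 continuation bytes (10xxxxxx). */
    while (back < len && back < 3 && (buf[len - 1 - back] & 0xC0) == 0x80)
        back++;
    if (back == len)
        return 0; /* no lead byte in reach, nothing sensible to cache */

    lead = buf[len - 1 - back];
    if ((lead & 0xE0) == 0xC0)      need = 2; /* 110xxxxx */
    else if ((lead & 0xF0) == 0xE0) need = 3; /* 1110xxxx */
    else if ((lead & 0xF8) == 0xF0) need = 4; /* 11110xxx */
    else return 0;                            /* ASCII or invalid lead */

    /* Incomplete only if fewer bytes are present than the lead announces. */
    return (back + 1 < need) ? back + 1 : 0;
}

/* Hypothetical per-handle state: at most 3 bytes can be pending. */
typedef struct {
    unsigned char pending[3];
    size_t pending_len;
} utf8_carry;

/* Prepends the cached partial to the new chunk, copies the result to out,
   caches the new trailing partial (if any), and returns the number of
   bytes in out that are safe to convert now.
   Assumption: out has room for at least len + 3 bytes. */
static size_t utf8_feed(utf8_carry *c, const unsigned char *chunk,
                        size_t len, unsigned char *out)
{
    size_t total = c->pending_len;
    size_t partial;

    memcpy(out, c->pending, c->pending_len);
    memcpy(out + total, chunk, len);
    total += len;

    partial = utf8_trailing_partial(out, total);
    memcpy(c->pending, out + total - partial, partial);
    c->pending_len = partial;
    return total - partial;
}
```

In a Windows caller, the bytes returned as safe could then be converted (for example with `MultiByteToWideChar` and `CP_UTF8`) while the cached bytes wait for the next call. Whether that state should live per handle or per process is a design question this sketch does not answer.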