avoid encoding errors with unicode content piped through stdio on Windows #997
Consider this trivial file (with a trailing LF):
This command worked in cmd.exe or an MSYS terminal, and printed ≠ correctly:
This crashed with an encoding error:
Since bytes are read from `stdin.buffer` and decoded as UTF-8 when the input file is '-', it makes sense to write UTF-8 bytes to `stdout.buffer` and avoid using the default codepage. The use case here is wiring this up to the `hg fix` extension, which writes content to the tool's stdin and reads it back from its stdout to reformat files. That round trip shouldn't change the encoding.

I conditionalized it to play it safe, since the characters showed up in the terminal correctly without the redirect. It also seems to display fine if unconditionally written as bytes, though.
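For reviewers who want to see the idea in isolation, here is a minimal sketch of writing UTF-8 bytes to the underlying binary buffer instead of going through the text layer. The helper name `write_text` and the `as_utf8_bytes` flag are hypothetical; the actual condition used in this change is not reproduced here.

```python
import io


def write_text(text, stream, as_utf8_bytes):
    """Write text to a TextIO-like stream (hypothetical helper).

    When as_utf8_bytes is true, bypass the stream's own encoding and
    newline handling by writing UTF-8 bytes to stream.buffer.
    """
    if as_utf8_bytes:
        # Flush pending text first so output ordering is preserved.
        stream.flush()
        stream.buffer.write(text.encode("utf-8"))
        stream.buffer.flush()
    else:
        # Goes through the stream's codec (e.g. the Windows codepage).
        stream.write(text)
        stream.flush()


def make_windows_like_stream(raw):
    """Simulate a redirected Windows stdout: cp1252 with CRLF translation."""
    return io.TextIOWrapper(raw, encoding="cp1252", newline="\r\n")
```

In real use the stream would be `sys.stdout`, for example `write_text(formatted, sys.stdout, as_utf8_bytes=True)`. The simulated stream only exists so the behavior can be checked without a Windows machine.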
A workaround using the existing code is to set `PYTHONUTF8=1` in the environment, but that's not obvious or always easily done. This change also has the nice side effect of no longer changing LF input to CRLF output. (You'd think that `print(..., end='')` would avoid printing the EOL, but the newline translation is apparently baked into the `TextIO` object that is `sys.stdout`, and not something the print function can override.)
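The newline behavior can be reproduced off Windows with a small sketch. It assumes a `TextIOWrapper` created with `newline="\r\n"` is a fair stand-in for a redirected `sys.stdout` on Windows; the `render` helper is hypothetical and only exists for the demonstration.

```python
import io


def render(text, newline):
    """Print text with end='' into a text wrapper and return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    # end='' only stops print from appending its own terminator; any "\n"
    # already inside text is still translated by the wrapper on write.
    print(text, end="", file=stream)
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    print(render("x\n", "\r\n"))  # LF inside the text becomes CRLF
    print(render("x\n", "\n"))    # no translation
```

Writing bytes to `stdout.buffer` sidesteps this translation entirely. Another option on Python 3.7+ might be `sys.stdout.reconfigure(encoding="utf-8", newline="\n")`, though that is not what this change does.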