Improve PumpReader surrogate char handling #720
Merged
Addresses the additional notes in #658 and #659
Improves PumpReader surrogate char handling by making sure that the reader blocks when an incomplete surrogate pair (i.e. only a high surrogate char) is encountered.
Additionally changes the reading logic to consume pending data even if the reader has been closed. Otherwise it would be impossible to read a trailing high surrogate char, because CharsetEncoder.encode was previously always called with endOfInput=false and therefore would never have encoded a trailing high surrogate char at the end of the stream (maybe that was intended?).
The added AssertionErrors should not be reachable, even for invalid / incomplete input, also due to #716.
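
The endOfInput behaviour described above can be reproduced with the JDK alone. The following is a minimal sketch, not the PumpReader code: the class name TrailingHighSurrogateDemo and the helper encodeOnce are hypothetical, and the choice of UTF-8 with CodingErrorAction.REPLACE is an assumption made for illustration. It encodes 'a' followed by a lone high surrogate, once with endOfInput=false and once with endOfInput=true, and reports how many bytes were written and how many chars were left unconsumed.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class TrailingHighSurrogateDemo {

    /**
     * Hypothetical helper: runs a single CharsetEncoder.encode call on the text
     * and returns { bytesWritten, charsLeftUnconsumed }.
     */
    public static int[] encodeOnce(String text, boolean endOfInput) {
        // Assumption for this sketch: malformed input is replaced, not reported.
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(text.toCharArray());
        ByteBuffer out = ByteBuffer.allocate(16);
        encoder.encode(in, out, endOfInput);
        return new int[] { out.position(), in.remaining() };
    }

    public static void main(String[] args) {
        // 'a' followed by a high surrogate that has no matching low surrogate.
        String text = "a\uD83D";

        // endOfInput=false: the encoder waits for a possible low surrogate,
        // so the trailing high surrogate stays in the input buffer.
        int[] pending = encodeOnce(text, false);
        System.out.println("endOfInput=false: bytes=" + pending[0]
                + ", chars left=" + pending[1]);

        // endOfInput=true: the encoder treats the lone high surrogate as
        // malformed input and, with REPLACE, emits its replacement bytes.
        int[] flushed = encodeOnce(text, true);
        System.out.println("endOfInput=true:  bytes=" + flushed[0]
                + ", chars left=" + flushed[1]);
    }
}
```

With endOfInput=false the high surrogate is never consumed, which is why a reader that never passes endOfInput=true cannot deliver a trailing high surrogate at the end of the stream.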