SocketsHttpHandler.ResponseHeaderEncodingSelector may be ignored for Location headers #95799
`SocketsHttpHandler` will generally assume headers were encoded as Latin1. The Location header is an exception: we'll also check whether the bytes look like UTF-8 and, if they do, decode them as UTF-8.
runtime/src/libraries/System.Net.Http/src/System/Net/Http/Headers/HeaderDescriptor.cs, lines 161 to 168 in 31933a3
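
To make the described behavior concrete, here is a minimal sketch of a "decode as UTF-8 if the bytes are valid UTF-8, otherwise fall back to Latin1" heuristic. It is not the runtime's code from the lines referenced above; the helper name `DecodeLocationLikeValue` is hypothetical and the strict-decoder approach is an assumption chosen for illustration.

```csharp
using System;
using System.Text;

// Both inputs represent "/é": once as UTF-8 bytes, once as Latin1 bytes.
string fromUtf8 = DecodeLocationLikeValue(new byte[] { 0x2F, 0xC3, 0xA9 });
string fromLatin1 = DecodeLocationLikeValue(new byte[] { 0x2F, 0xE9 });
Console.WriteLine(fromUtf8 == fromLatin1);

// Hypothetical helper (assumption): try strict UTF-8 first, then Latin1.
static string DecodeLocationLikeValue(ReadOnlySpan<byte> value)
{
    // throwOnInvalidBytes makes the decoder throw instead of emitting U+FFFD.
    var strictUtf8 = new UTF8Encoding(
        encoderShouldEmitUTF8Identifier: false,
        throwOnInvalidBytes: true);
    try
    {
        return strictUtf8.GetString(value);
    }
    catch (DecoderFallbackException)
    {
        // Not valid UTF-8, so treat every byte as one Latin1 character.
        return Encoding.Latin1.GetString(value);
    }
}
```

Note that a heuristic of this shape never consults a user-supplied encoding, which is the gap this issue is about.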
The handler also has the option for users to specify the encoding to use for each header via `ResponseHeaderEncodingSelector`, but in the case of the Location header, it may still use UTF-8, overriding the user's wishes to use a different encoding.
This is mostly harmless for most uses of HttpClient but does impact scenarios where the user is expecting a different encoding to be used, even if the bytes do look like UTF-8.
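
For reference, a user opting into a specific encoding might configure the handler roughly like this. This is a sketch that selects Latin1 for every response header; a real selector could inspect the header name or the request and return `null` to keep the default behavior.

```csharp
using System.Net.Http;
using System.Text;

var handler = new SocketsHttpHandler
{
    // Called per response header; the returned encoding is what the user
    // expects the handler to decode that header's value with.
    ResponseHeaderEncodingSelector = (headerName, request) => Encoding.Latin1
};

using var client = new HttpClient(handler);
```

With this configuration, the expectation in this issue is that a Location value is decoded as Latin1 even when its bytes happen to be valid UTF-8.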
For example with YARP, the user may decide to set Latin1 as the encoding to use for all headers in all directions. No matter what encoding the values are actually using, we'll use Latin1. While the string representations of those headers may look like garbage, Latin1 decoding and then Latin1 encoding some bytes will result in the same set of bytes, thus passing values through the proxy without data loss. That is, two sets of data corruption are canceling each other out.
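
The lossless pass-through property can be checked directly. The sketch below uses arbitrary sample bytes (an assumption for illustration) and shows that decoding and re-encoding with Latin1 returns the original bytes, whatever encoding they were really in.

```csharp
using System;
using System.Text;

// Arbitrary header bytes; they need not be meaningful Latin1 text.
byte[] original = { 0xC3, 0xA9, 0xFF, 0x80 };

// Latin1 maps each byte 0x00-0xFF to the code point with the same value,
// so decoding never loses information...
string asLatin1 = Encoding.Latin1.GetString(original);

// ...and encoding the resulting string restores the exact same bytes.
byte[] roundTripped = Encoding.Latin1.GetBytes(asLatin1);

Console.WriteLine(original.AsSpan().SequenceEqual(roundTripped)); // True
```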
But if `SocketsHttpHandler` decides to use a different encoding, this no longer holds. While the string representation is now likely going to be "correct", the value will be modified when it is encoded again using Latin1.
Given that `ResponseHeaderEncodingSelector` gives the user the option to specify the encoding to use for each header, I believe we should be honoring that first if set, before trying to guess what encoding was used.
Side note: Even if the bytes are valid UTF-8, that doesn't mean they actually are UTF-8. There are valid sequences of Latin1 encoded text that will produce byte sequences that also happen to be valid UTF-8. For such cases, `SocketsHttpHandler` is silently corrupting the data (the string header value will be nonsense).
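
A small sketch of both problems, using a sample value chosen for illustration: the Latin1 text "Ã©" encodes to bytes that are also valid UTF-8, so a UTF-8 guess yields a different string, and re-encoding that string as Latin1 changes the bytes on the wire.

```csharp
using System;
using System.Text;

// Latin1 text "Ã©" (U+00C3 U+00A9) encodes to the bytes C3 A9.
byte[] wire = Encoding.Latin1.GetBytes("\u00C3\u00A9");

// The same bytes are valid UTF-8 for "é" (U+00E9), so a
// "looks like UTF-8" guess decodes them to a different string.
string guessedUtf8 = Encoding.UTF8.GetString(wire);

// Re-encoding with the user's chosen Latin1 no longer restores the input.
byte[] reencoded = Encoding.Latin1.GetBytes(guessedUtf8);

Console.WriteLine(Convert.ToHexString(wire));      // C3A9
Console.WriteLine(Convert.ToHexString(reencoded)); // E9
```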