`SocketsHttpHandler.ResponseHeaderEncodingSelector` may be ignored for Location headers #95799

MihaZupan · 2023-12-08T18:50:40Z

SocketsHttpHandler generally assumes that header values were encoded as Latin1.
The Location header is an exception: we also check whether the bytes look like UTF-8 and, if they do, decode them as UTF-8.

runtime/src/libraries/System.Net.Http/src/System/Net/Http/Headers/HeaderDescriptor.cs

Lines 161 to 168 in 31933a3

    
    else if (knownHeader == KnownHeaders.Location)
    {
        // Normally Location should be in ISO-8859-1 but occasionally some servers respond with UTF-8.
        if (TryDecodeUtf8(headerValue, out string? decoded))
        {
            return decoded;
        }
    }

The handler also has the option for users to specify the encoding to use for each header via ResponseHeaderEncodingSelector, but in the case of the Location header, it may still use UTF-8, overriding the user's wishes to use a different encoding.
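For context, a minimal sketch of how a user might opt in to a single encoding for all response headers. The lambda and variable names are illustrative; the only assumption is that the selector returns the same encoding regardless of header name or request.

```csharp
using System.Net.Http;
using System.Text;

// Ask the handler to decode every response header value as Latin1.
// The selector receives the header name and the request being processed.
var handler = new SocketsHttpHandler
{
    ResponseHeaderEncodingSelector = (headerName, request) => Encoding.Latin1
};

using var client = new HttpClient(handler);
```

With this configuration the user would expect Location to be decoded as Latin1 as well, which is the expectation the UTF-8 check can violate.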

This is harmless for most uses of HttpClient, but it does impact scenarios where the user expects a different encoding to be used, even if the bytes do look like UTF-8.
For example with YARP, the user may decide to set Latin1 as the encoding to use for all headers in all directions. No matter what encoding the values are actually using, we'll use Latin1. While the string representations of those headers may look like garbage, Latin1 decoding and then Latin1 encoding some bytes will result in the same set of bytes, thus passing values through the proxy without data loss. That is, two sets of data corruption are canceling each other out.
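The round-trip property can be illustrated with a small sketch. The sample bytes (the UTF-8 encoding of "/café") are an assumption chosen for illustration, not taken from a real response.

```csharp
using System;
using System.Linq;
using System.Text;

// "/café" as UTF-8 bytes: 'é' is the two-byte sequence 0xC3 0xA9.
byte[] original = { 0x2F, 0x63, 0x61, 0x66, 0xC3, 0xA9 };

// Latin1 maps every byte to exactly one char, so the string looks wrong ("/cafÃ©") ...
string asLatin1 = Encoding.Latin1.GetString(original);

// ... but encoding that string back as Latin1 restores the exact same bytes.
byte[] roundTripped = Encoding.Latin1.GetBytes(asLatin1);

Console.WriteLine(roundTripped.SequenceEqual(original)); // True
```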

But if SocketsHttpHandler decides to use a different encoding, this no longer holds. While the string representation is now likely going to be "correct", the value will now be modified when encoded using Latin1.
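A sketch of the mismatch, using the same illustrative bytes (the UTF-8 encoding of "/café"): decoding as UTF-8 and then encoding as Latin1 produces different bytes than the ones that arrived.

```csharp
using System;
using System.Text;

// "/café" as UTF-8 bytes, as received from the server (6 bytes).
byte[] original = { 0x2F, 0x63, 0x61, 0x66, 0xC3, 0xA9 };

// The handler guesses UTF-8, so the string is the "correct" "/café" ...
string asUtf8 = Encoding.UTF8.GetString(original);

// ... but the proxy then encodes it as Latin1, where 'é' is the single byte 0xE9.
byte[] reEncoded = Encoding.Latin1.GetBytes(asUtf8);

Console.WriteLine(Convert.ToHexString(original));  // 2F636166C3A9
Console.WriteLine(Convert.ToHexString(reEncoded)); // 2F636166E9
```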

Given that ResponseHeaderEncodingSelector gives the user the option to specify the encoding to use for each header, I believe we should be honoring that first if set, before trying to guess what encoding was used.
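The proposed ordering could look roughly like the sketch below. `DecodeLocation` is a hypothetical helper written for illustration; it is not the actual HeaderDescriptor code, and the strict `UTF8Encoding` check stands in for the internal `TryDecodeUtf8`.

```csharp
using System;
using System.Text;

// Hypothetical helper showing the proposed order of checks.
static string DecodeLocation(byte[] headerValue, Encoding? userSelected)
{
    // 1. An explicit choice made through the selector wins.
    if (userSelected is not null)
    {
        return userSelected.GetString(headerValue);
    }

    // 2. Otherwise guess: accept UTF-8 only if the bytes are strictly valid.
    try
    {
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        return strictUtf8.GetString(headerValue);
    }
    catch (DecoderFallbackException)
    {
        // 3. Not valid UTF-8, so fall back to the Latin1 default.
        return Encoding.Latin1.GetString(headerValue);
    }
}

byte[] bytes = { 0x2F, 0x63, 0x61, 0x66, 0xC3, 0xA9 };

// With a selector result, the user's encoding is used even though the bytes are valid UTF-8.
string withSelector = DecodeLocation(bytes, Encoding.Latin1);

// Without one, the UTF-8 guess still applies.
string withoutSelector = DecodeLocation(bytes, null);

Console.WriteLine(withSelector == withoutSelector); // False
```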

Side note: Even if the bytes are valid UTF-8, that doesn't mean they actually are UTF-8. There are valid sequences of Latin1 encoded text that will produce byte sequences that also happen to be valid UTF-8. For such cases, SocketsHttpHandler is silently corrupting the data (the string header value will be nonsense).
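A small sketch of such an ambiguous value. The two-character Latin1 text "Â©" is an illustrative choice; its bytes happen to be the valid UTF-8 encoding of "©".

```csharp
using System;
using System.Text;

// The sender intended the two Latin1 characters U+00C2 U+00A9 ("Â©").
string intended = "\u00C2\u00A9";
byte[] wire = Encoding.Latin1.GetBytes(intended); // C2 A9

// A strict UTF-8 decoder accepts these bytes without throwing ...
var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
string guessed = strictUtf8.GetString(wire);

// ... but yields the single character U+00A9, not what the sender wrote.
Console.WriteLine(guessed == intended); // False
Console.WriteLine(guessed.Length);      // 1
```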

ghost · 2023-12-08T18:50:44Z

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	MihaZupan
Assignees:	-
Labels:	`bug`, `area-System.Net.Http`
Milestone:	-

MihaZupan added bug area-System.Net.Http labels Dec 8, 2023

ghost added the untriaged label Dec 8, 2023

MihaZupan mentioned this issue Dec 8, 2023

Respect ResponseHeaderEncodingSelector for the Location header #95810

Merged

ghost added the in-pr label Dec 8, 2023

MihaZupan added this to the 9.0.0 milestone Dec 8, 2023

ghost removed the untriaged label Dec 8, 2023

MihaZupan closed this as completed in #95810 Dec 10, 2023

ghost removed the in-pr label Dec 10, 2023

github-actions bot locked and limited conversation to collaborators Jan 10, 2024
