Add an option to interpret request headers as Latin1 encoded #17399

analogrelay · 2019-11-25T23:35:40Z

We have customers interested in treating incoming request headers as Latin1-encoded, but our default logic only supports UTF-8.

In a 3.1 patch we should provide:

- A config setting to change the default encoding to Latin1.
  - When enabled, Kestrel will widen each incoming byte to a UTF-16 character and build a string from that sequence of characters, instead of interpreting the bytes as UTF-8.
  - We will continue to reject control characters from the ASCII set (0x00-0x1F and 0x7F), but will not reject the control characters that widening produces from 0x80-0x9F (U+0080-U+009F), because those byte values also occur in other interpretations of this data, such as UTF-8.

Because the widening is lossless, this would allow a consumer to recover the original bytes from the string and reinterpret them as a UTF-8 sequence if desired.
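A minimal Python sketch of the widening described above. This is an illustration only, not Kestrel's C# implementation; `widen_latin1` and `reinterpret_as_utf8` are hypothetical names. Each byte becomes the code point with the same numeric value (which is exactly what Python's `latin-1` codec does), so the original bytes can be recovered and then decoded as UTF-8:

```python
def widen_latin1(raw: bytes) -> str:
    # Hypothetical helper: map each byte 0x00-0xFF to the code point U+0000-U+00FF
    # with the same numeric value (equivalent to raw.decode("latin-1")).
    return "".join(chr(b) for b in raw)


def reinterpret_as_utf8(value: str) -> str:
    # Hypothetical helper: every char is <= U+00FF, so encoding back to Latin1
    # recovers the original bytes exactly; those bytes are then decoded as UTF-8.
    return value.encode("latin-1").decode("utf-8")


raw = "café".encode("utf-8")          # b'caf\xc3\xa9' as sent on the wire
header_value = widen_latin1(raw)      # one char per byte: U+00C3, U+00A9 for the é
original = reinterpret_as_utf8(header_value)  # 'café'
```

Widening itself never fails; only the optional reinterpretation step can raise `UnicodeDecodeError`, and only if the bytes were not valid UTF-8 to begin with.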

In 5.0 we'll look at broader work to improve this experience. The 3.1 goals are scoped down to specific requests we've received.


lodejard · 2019-11-25T23:47:02Z

Sounds great, thank you! If I understand what you've written, this means:

When 3.1 config flag enabled for latin1 request headers:
- visible USASCII header-value octets 0x20-0x7E are mapped to char codes U+0020-U+007E
- opaque header-value octets 0x80-0xFF are mapped to char codes U+0080-U+00FF
- header-value octets 0x00-0x1F and 0x7F are unprintable USASCII and are not expected to map to a Unicode char in any request header value string
  - some unprintable ASCII is already consumed as HTTP protocol delimiters
  - other unprintable ASCII will cause the request to be rejected as malformed
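A sketch of those request-side rules as a single decode step, again in Python. It assumes the parser has already split the value out of the request, so CR LF and other delimiters are not modeled, and it applies the rules literally as listed. `decode_header_value_latin1` and `BadRequest` are hypothetical names:

```python
class BadRequest(Exception):
    """Hypothetical stand-in for rejecting the request as malformed."""


def decode_header_value_latin1(raw: bytes) -> str:
    for offset, octet in enumerate(raw):
        # Unprintable USASCII (0x00-0x1F) and DEL (0x7F) reject the request.
        if octet < 0x20 or octet == 0x7F:
            raise BadRequest(f"invalid header octet 0x{octet:02X} at offset {offset}")
    # Every remaining octet maps 1:1 to a code point:
    # 0x20-0x7E -> U+0020-U+007E and 0x80-0xFF -> U+0080-U+00FF.
    return raw.decode("latin-1")
```

Note that 0x80-0x9F pass through unchanged, matching the decision not to reject the upper control range.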

analogrelay · 2019-11-25T23:49:39Z

Yes, that is correct.

lodejard · 2019-11-25T23:55:08Z

Will there be a corresponding response header flag, to round-trip opaque data? Something like:

When 3.1 config flag enabled for latin1 response headers:
- char codes U+0020-U+007E are mapped to visible USASCII header-value octets 0x20-0x7E
- char codes U+0080-U+00FF are mapped to opaque header-value octets 0x80-0xFF
- all other char codes (U+0000-U+001F, U+007F, U+0100 and higher) are not expected to map to header-value octets
  - if present, they may cause the response to be rejected server-side
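The proposed response-side mapping (only a proposal at this point in the thread) would be the inverse of the request-side one. A hypothetical Python sketch, where `encode_header_value_latin1` is an illustrative name and raising `ValueError` stands in for "rejected server-side":

```python
def encode_header_value_latin1(value: str) -> bytes:
    for index, ch in enumerate(value):
        code = ord(ch)
        # U+0000-U+001F, U+007F and anything above U+00FF have no header-value octet.
        if code < 0x20 or code == 0x7F or code > 0xFF:
            raise ValueError(f"U+{code:04X} at index {index} cannot be sent as a header octet")
    # U+0020-U+007E -> 0x20-0x7E and U+0080-U+00FF -> 0x80-0xFF.
    return value.encode("latin-1")


# Round trip: opaque bytes widened on the request side narrow back to the same bytes.
opaque = b"caf\xc3\xa9"
assert encode_header_value_latin1(opaque.decode("latin-1")) == opaque
```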

halter73 · 2019-11-26T00:22:38Z

> Will there be a corresponding response header flag, to round-trip opaque data?

We are considering this for 5.0, but so far it seems like this doesn't meet the bar for a 3.1 patch.

analogrelay · 2020-01-07T22:23:08Z

@Tratcher @halter73 between the two of you, could you take a look at getting this prepped on release/3.1? I'd like to try and take this in Feb, which means we need a ready-to-merge PR next week.

analogrelay · 2020-01-07T22:23:30Z

De-milestoned to make this come up in triage again for awareness.

Tratcher · 2020-01-08T01:08:42Z

@halter73 when could you have bandwidth to start this? I'm busy for a few days.

halter73 · 2020-01-08T02:52:43Z

I can start looking into this tomorrow between my build rotation responsibilities.

analogrelay · 2020-01-08T21:19:30Z

@davidfowl what was the reasoning for not using an AppContext switch? It seems more straightforward and I'd like to keep this super simple to avoid risk.

Tratcher · 2020-01-08T22:53:31Z

AppContext switches are static (process-wide) and therefore untestable.

Tratcher · 2020-01-08T23:02:25Z

Kestrel already reads config for binding information, reading more does not require any new infrastructure.
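To illustrate the testability point, a hypothetical Python sketch in which the setting is read from configuration into a per-server options object instead of a process-wide static switch, so two differently configured instances can coexist in one test run. The key name `Latin1RequestHeaders`, `ServerOptions`, and `decode_request_header` are assumptions for illustration, not Kestrel's actual configuration API:

```python
from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical configuration key, for illustration only.
LATIN1_KEY = "Latin1RequestHeaders"


@dataclass(frozen=True)
class ServerOptions:
    latin1_request_headers: bool = False

    @classmethod
    def from_config(cls, config: Mapping[str, str]) -> "ServerOptions":
        # Missing key means the default (UTF-8) behavior.
        raw = config.get(LATIN1_KEY, "false")
        return cls(latin1_request_headers=raw.strip().lower() == "true")


def decode_request_header(raw: bytes, options: ServerOptions) -> str:
    if options.latin1_request_headers:
        return raw.decode("latin-1")
    # Strict UTF-8: raises UnicodeDecodeError on invalid input.
    return raw.decode("utf-8")


# Each server instance gets its own options, unlike a static switch.
latin1_options = ServerOptions.from_config({LATIN1_KEY: "true"})
default_options = ServerOptions.from_config({})
```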

analogrelay · 2020-02-14T17:41:13Z

This was implemented in #18255 and will be released in 3.1.3.

analogrelay added the area-servers label Nov 25, 2019

analogrelay added this to the 3.1.x milestone Nov 25, 2019

analogrelay mentioned this issue Nov 25, 2019

[Kestrel] Support for custom decoder in headers #17400 (Closed)

analogrelay removed this from the 3.1.x milestone Jan 7, 2020

analogrelay added the triage-focus label Jan 7, 2020

analogrelay assigned halter73 Jan 8, 2020

analogrelay added this to the 3.1.x milestone Jan 8, 2020

analogrelay removed the triage-focus label Jan 8, 2020

halter73 mentioned this issue Jan 10, 2020

Add option to interpret request headers as Latin1 #18255 (Merged)

analogrelay closed this as completed Feb 14, 2020

analogrelay added the ✔️ Resolution: Fixed and enhancement labels Feb 14, 2020

ghost locked as resolved and limited conversation to collaborators Mar 24, 2020

amcasey added the area-networking label and removed the area-runtime label Aug 24, 2023
