Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an option to interpret request headers as Latin1 encoded #17399

Closed
analogrelay opened this issue Nov 25, 2019 · 12 comments
Closed

Add an option to interpret request headers as Latin1 encoded #17399

analogrelay opened this issue Nov 25, 2019 · 12 comments
Assignees
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions enhancement This issue represents an ask for new feature or an enhancement to an existing one ✔️ Resolution: Fixed The bug or enhancement requested in this issue has been checked-in!
Milestone

Comments

@analogrelay
Copy link
Contributor

We have customers interested in treating incoming request headers as Latin1-encoded but our default logic only supports UTF-8.

In a 3.1 patch we should provide:

  • A config setting to change the default encoding to Latin1
  • When enabled, Kestrel will widen each incoming byte to a UTF-16 character and build a string from that set of characters, instead of interpreting bytes as UTF-8.
  • We will continue to reject control characters from the ASCII set (0x00-0x1F and 0x7F), but not reject control characters from the widened Latin1 set (because they would collide with other interpretations of this data, like UTF-8).

This would allow a consumer to reinterpret the data in this string as a UTF-8 sequence if desired.

In 5.0 we'll look at broader work to improve this experience. The 3.1 goals are scoped down to specific requests we've received.

@lodejard
Copy link
Contributor

Sounds great, thank you! If I understand what you've written this it means:

  • When 3.1 config flag enabled for latin1 request headers:
    • visible USASCII header-value octets 0x20-0x7E are mapped to char codes U+0020-U+007E
    • opaque header-value octets 0x80-0xFF are mapped to char codes U+0080-U+00FF
    • header-value octets 0x00-0x1F and 0x7F are unprintable USASCII and not expected to map to unicode char in any request header value strings
      • some unprintable ASCII is already consumed as http protocol delimiters
      • other unprintable ASCII will cause the request to be rejected as malformed

@analogrelay
Copy link
Contributor Author

Yes, that is correct.

@lodejard
Copy link
Contributor

Will there be a corresponding response header flag, to round-trip opaque data? Like

  • When 3.1 config flag enabled for latin1 response headers:
    • char codes U+0020-U+007E are mapped to visible USASCII header-value octets 0x20-0x7E
    • char codes U+0080-U+00FF are mapped to opaque header-value octets 0x80-0xFF
    • all other char codes (U+0000-U+001F, U+007F, U+0100 and higher) are not expected to map to header-value octets
      • if present may cause the response to be rejected server-side

@halter73
Copy link
Member

Will there be a corresponding response header flag, to round-trip opaque data?

We are considering this for 5.0, but so far it seems like this doesn't meet the bar for a 3.1 patch.

@analogrelay analogrelay removed this from the 3.1.x milestone Jan 7, 2020
@analogrelay analogrelay added the triage-focus Add this label to flag the issue for focus at triage label Jan 7, 2020
@analogrelay
Copy link
Contributor Author

@Tratcher @halter73 between the two of you, could you take a look at getting this prepped on release/3.1? I'd like to try and take this in Feb, which means we need a ready-to-merge PR next week.

@analogrelay
Copy link
Contributor Author

De-milestoned to make this come up in triage again for awareness.

@Tratcher
Copy link
Member

Tratcher commented Jan 8, 2020

@halter73 when could you have bandwidth to start this? I'm busy for a few days.

@halter73
Copy link
Member

halter73 commented Jan 8, 2020

I can start looking into this tomorrow between my build rotation responsibilities.

@analogrelay analogrelay added this to the 3.1.x milestone Jan 8, 2020
@analogrelay analogrelay removed the triage-focus Add this label to flag the issue for focus at triage label Jan 8, 2020
@analogrelay
Copy link
Contributor Author

@davidfowl what was the reasoning for not using an AppContext switch? It seems more straightforward and I'd like to keep this super simple to avoid risk.

@Tratcher
Copy link
Member

Tratcher commented Jan 8, 2020

AppContext switches are static and therefor untestable.

@Tratcher
Copy link
Member

Tratcher commented Jan 8, 2020

Kestrel already reads config for binding information, reading more does not require any new infrastructure.

@analogrelay
Copy link
Contributor Author

This was implemented in #18255 and will be released in 3.1.3

@analogrelay analogrelay added ✔️ Resolution: Fixed The bug or enhancement requested in this issue has been checked-in! enhancement This issue represents an ask for new feature or an enhancement to an existing one labels Feb 14, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2020
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions enhancement This issue represents an ask for new feature or an enhancement to an existing one ✔️ Resolution: Fixed The bug or enhancement requested in this issue has been checked-in!
Projects
None yet
Development

No branches or pull requests

5 participants