Skip to content
This repository has been archived by the owner on Dec 18, 2018. It is now read-only.

Configure header parsing to allow non-compliant headers #1144

Closed
blowdart opened this issue Oct 3, 2016 · 64 comments
Closed

Configure header parsing to allow non-compliant headers #1144

blowdart opened this issue Oct 3, 2016 · 64 comments

Comments

@blowdart
Copy link
Member

blowdart commented Oct 3, 2016

Rather than rejecting requests, allow configuration to either ignore UTF8 headers, or, parse them, even if they're illegal.

I suggest an enum Reject, Ignore, Parse.

However on output we should still be strict, UrlEncoding cookie values etc.

See #1076 and #1125

edit by @muratg: We should also consider request line when we get to this. #2647

@blowdart blowdart added this to the 1.1.0 milestone Oct 3, 2016
@Tratcher
Copy link
Member

Tratcher commented Oct 3, 2016

In practice I expect this to become an FAQ where most people turn on UTF-8, so we should consider making that the default.

Have we tested how these headers forward through IIS? WebListener?

@blowdart
Copy link
Member Author

blowdart commented Oct 3, 2016

I like how we say "We" Nope, please try it :p

@muratg
Copy link
Contributor

muratg commented Oct 6, 2016

@halter73 please work with @CesarBS, based on timing this may be load balanced to you.

cc @davidfowl

@muratg
Copy link
Contributor

muratg commented Oct 31, 2016

A big change (essentially a rewrite in header processing) so moving to 1.2.0.

@Tratcher could you file a corresponding bug on WebListener repo?

@muratg
Copy link
Contributor

muratg commented Oct 31, 2016

Should also fix: #1125

@cesarblum
Copy link
Contributor

Investigating this. Will check how those headers forward through IIS, and also what other servers do.

@cesarblum
Copy link
Contributor

cesarblum commented Jan 18, 2017

Here's what I found so far:

Server Behavior
IIS accepts UTF-8 in header value, don't know if immediatelly decoded
IIS running ASP.NET 4 app accepts UTF-8, decodes it as such
IIS with ANCM rejects, not yet sure why, but the request is not forwarded at all
WebListener accepts UTF-8, decodes it as such
nginx accepts non-ASCII, haven't checked yet what it does with it
node.js accepts non-ASCII, it's up to the app to decode it
Apache accepts non-ASCII, it's up to the app to decode it

A relevant bit from the RFC:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding.  In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets.  A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

I think the most relevant part here is:

A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

So it's not forbidding chars above 0x7F. obs-text is actually defined in the next section as

obs-text       = %x80-FF

The most correct behavior seems to be to accept characters in the 0x80 - 0xFF range (which we reject at present), but to let the app decide the encoding. Http.sys appears to deviate from this though, by decoding as UTF-8.

@Tratcher
Copy link
Member

For reference, which header did you test? You may get different results for a common header vs a custom header. E.g. Host and Location are often special cased.

@Tratcher
Copy link
Member

How does Apache leave it up to the app to decode it? Does it expose the raw header bytes?

@cesarblum
Copy link
Contributor

@Tratcher I was testing with Referer, as in #1125.

I tested Apache with a PHP app, which saw the header as raw bytes. You get a "string" for it, but I'd have to manually decode it as UTF-8 to get the right chars.

@JunTaoLuo
Copy link
Contributor

JunTaoLuo commented Aug 22, 2018

I was hoping we could address response headers separately. Although related, apps have more control over the response headers whereas they cannot control what request headers are sent by clients.

Scenario Name: François (Encode-able in UTF-8 and Extended ASCII) Message: 你好 (Encode-able in UTF-8 only)
Kestrel (currently) ❌Failed request ❌Failed request
IIS + Managed Handler (ASP.NET 4) ✔️ Encoded as UTF-8 ✔️Encoded as UTF-8
IIS + ANCM In-Proc ✔️ Encoded as UTF-8 ✔️Encoded as UTF-8
IIS + ANCM Out-of-proc ⚠️ Extended ASCII encoding becomes re-encoded as UTF-8, UTF-8 encoding becomes re-encoded as A UTF-8 representation of the Extended ASCII of the original UTF-8 Encoding (double encoded) ⚠️UTF-8 encoding becomes re-encoded as A UTF-8 representation of the Extended ASCII of the original UTF-8 Encoding (double encoded)
Apache ✔️ Raw bytes sent to client ✔️ Raw bytes sent to client
Apache + PHP ✔️ Encoded as UTF-8 ✔️Encoded as UTF-8
nginx ✔️ Raw bytes sent to client ✔️ Raw bytes sent to client
node.js ✔️ Encoded as Extended ASCII ❌Failed request

@effyteva
Copy link

effyteva commented Aug 25, 2018

We ran into this issue as well.
Is there any temporary fix to ignore invalid request headers? (our issue is in the Referrer header)

@JunTaoLuo
Copy link
Contributor

Parsing of request headers with UTF-8 encoded values has been merged and will be available in 2.2.0-preview2.

@JunTaoLuo
Copy link
Contributor

I've continued to do some additional investigation in how servers handle header names, path and query strings with non-ascii characters and this is what I found:

Header name Extended ASCII UTF-8
Kestrel (currently) ❌400 Bad Request ❌400 Bad request
IIS + Managed Handler (ASP.NET 4) ❌400 Bad Request ❌400 Bad Request
IIS + ANCM In-Proc ❌400 Bad Request ❌400 Bad Request
IIS + ANCM Out-of-proc ❌400 Bad Request ❌400 Bad Request
Apache ❌400 Bad Request ❌400 Bad Request
Apache + PHP ❌400 Bad Request ❌400 Bad Request
nginx ⚠️Request forwarded with non-ascii header removed ⚠️Request forwarded with non-ascii header removed
node.js ❌400 Bad Request ❌400 Bad Request
Path Extended ASCII UTF-8
Kestrel (currently) ❌Failed Request ❌Failed request
IIS + Managed Handler (ASP.NET 4) ✔️ Decoded as Extended ASCII ⚠️Decoded as Extended ASCII
IIS + ANCM In-Proc ❌ Request rejected but an empty 200 response is sent ✔️ Decoded as UTF-8
IIS + ANCM Out-of-proc ✔️ Decoded as Extended ASCII ⚠️Decoded as Extended ASCII
Apache ✔️ Raw bytes sent to client ✔️ Raw bytes sent to client
Apache + PHP ❌404 Not Found ✔️Decoded as UTF-8
nginx ✔️ Raw bytes sent to client ✔️ Raw bytes sent to client
node.js ✔️ Decoded as Extended ASCII ⚠️Decoded as Extended ASCII
Query String Extended ASCII UTF-8
Kestrel (currently) ❌Failed Request ❌Failed request
IIS + Managed Handler (ASP.NET 4) ⚠️ Decoded as UTF-8 ✔️Decoded as UTF-8
IIS + ANCM In-Proc ✔️ Decoded as Extended ASCII ⚠️Decoded as Extended ASCII
IIS + ANCM Out-of-proc ❌400 Bad Request ❌400 Bad Request
Apache ✔️ Raw bytes sent to client ✔️ Raw bytes sent to client
Apache + PHP ⚠️Decoded as UTF-8 ✔️Decoded as UTF-8
nginx ✔️ Raw bytes sent to client ✔️ Raw bytes sent to client
node.js ✔️ Decoded as Extended ASCII ⚠️Decoded as Extended ASCII

Header names with non-ascii characters are almost universally rejected, other than nginx. We should follow the same pattern and continue to reject these requests. There's less consensus in how to treat requests with non-ascii characters in path and query string but these should be URL encoded. I think we can continue to reject these requests too, unless we have a compelling reason to do otherwise.

@effyteva
Copy link

effyteva commented Sep 5, 2018

I think it's highly critical to address this issue., at least in term of "removing" non-ascii headers.
Currently requests fail in case any header has non-ascii value, which on many cases can occur due to an external factor, such as referrer pointing to a UTF-8 URL.

@JunTaoLuo
Copy link
Contributor

@effyteva I have already made the changes to accept UTF-8 encoded characters in header values, which means UTF-8 encoded urls in the Referer header will now be accepted. However, we are still planning to reject requests containing non-ascii characters in the header names as well as path and query string values.

@effyteva
Copy link

effyteva commented Sep 5, 2018

@JunTaoLuo sorry, I didn't understand you already committed those changes for the 2.2 release.
Thanks, and keep up the great work!

@JunTaoLuo
Copy link
Contributor

We decided to not pursue accepting non-ASCII characters in header names, path and query string at this time. If there is compelling reason to enable these scenarios, please file another issue and we will re-prioritize. A follow up issue to address UTF-8 values in response header values has been filed at https://github.com/aspnet/KestrelHttpServer/issues/2884.

@sandeepchauhan
Copy link

sandeepchauhan commented Oct 11, 2018

Thanks for this fix. Is there an ETA for 2.2.0 release? @JunTaoLuo

@Tratcher
Copy link
Member

See the roadmap.

@turgayozgur
Copy link

turgayozgur commented Nov 23, 2018

We can encode. It is okay but what if we are migrating an old application to aspnetcore that already have too many customers who has the cookie contains non-ascii characters?

For now, there is a workaround if you are using nginx at the front of your aspnetcore application.

To remove all of the non ascii characters from the request header:

server {
    set_by_lua_block $cookie_ascii {
        local cookie = ngx.var.http_cookie
        if cookie == nil or cookie == '' then return cookie end
        local cookie_ascii, n, err = ngx.re.gsub(cookie, "[^\\x00-\\x7F]", "")
        return cookie_ascii
    }

    listen 80;
    server_name example.com;
    
    location / {
        proxy_pass          http://localhost:5000;
        ...
        proxy_set_header    Cookie $cookie_ascii;
        ...
    }    
}

@effyteva
Copy link

effyteva commented Dec 8, 2018

We still suffer from the issue, after upgrading to ASP.NET Core 2.2.
Is there any setting required to enable this?

@Tratcher
Copy link
Member

Tratcher commented Dec 8, 2018

Comments on closed issues are not tracked, please open a new issue with the details for your scenario.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests